Exploring NoSQL: MongoDB

MongoDB is one of the most popular open source NoSQL databases. Backed by 10gen and boasting a long list of deployments, including Disney, Craigslist, and SAP, MongoDB offers a remarkably simple application programming interface (API) and all the tools necessary for massive scalability.

MongoDB is a document-oriented NoSQL data store, but the word document might mislead. Really, MongoDB stores objects, much like an object database. More technically, MongoDB stores BSON documents, where BSON stands for Binary JSON. JSON, in turn, stands for JavaScript Object Notation, a text-based format for storing structured data, much like XML but less verbose, with more colons and curly braces than angle brackets.

MongoDB stores each object as a document, which might be thought of as akin to a record in the relational world. While there is no schema in MongoDB (that is, no defined set of columns), developers would normally store like objects (objects created from the same class) together in a collection, which might be thought of as a table. A database would typically consist of several such collections. And while there are no relationships between collections, a JSON object can itself contain a hierarchy of other JSON objects or fields that link to objects in other collections, so it is possible to model a variety of relationships.
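
To make the data model concrete, here is a hypothetical customer document, written as the Python dictionary a driver would hand to MongoDB. The field names, the embedded orders, and the reference to an accounts collection are all invented for illustration.

```python
# A hypothetical document for a "customers" collection (all fields invented).
customer = {
    "name": "Alice Chen",
    "address": {"city": "New York", "zip": "10001"},  # embedded sub-document
    "orders": [                                       # embedded hierarchy of objects
        {"sku": "A-100", "qty": 2},
        {"sku": "B-205", "qty": 1},
    ],
    "account_id": "507f1f77bcf86cd799439011",         # manual link to an "accounts" collection
}
```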

Developers access MongoDB through a driver, which maps a MongoDB document to a familiar construct in the developer’s language of choice. JSON objects, as the name implies, map easily to JavaScript objects. In Python, a document maps to a dictionary. In C#, a document maps to a specially defined class called a BsonDocument. Regardless of the language, the API is quite straightforward.
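
Here is a minimal sketch of what that looks like in Python with the pymongo driver, assuming a MongoDB instance is running locally on the default port; the database and collection names are made up.

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (hypothetical deployment).
client = MongoClient("localhost", 27017)
db = client["crm"]  # databases and collections are created lazily on first write

# Insert a plain dictionary; the driver converts it to a BSON document.
db.customers.insert_one({"name": "Alice Chen", "zip": "10001"})

# Read it back; the result comes out as a dictionary again.
doc = db.customers.find_one({"name": "Alice Chen"})
print(doc["zip"])
```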

Object databases remove the grunt work of mapping classes to relational data models. But object databases never caught on, perhaps because of the difficulty of ad hoc queries. MongoDB does provide a variety of query mechanisms. It allows for queries by object properties, queries based on regular expressions, and complex queries using JavaScript methods. Whether any of these approaches satisfies requirements for ad hoc queries depends on the specific application scenario.
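
A few hedged examples of these query styles, continuing the hypothetical customers collection from the sketch above:

```python
# Query by object properties, including fields of embedded sub-documents.
new_yorkers = db.customers.find({"address.city": "New York"})

# Query with a regular expression (names starting with "Smi").
smiths = db.customers.find({"name": {"$regex": "^Smi"}})

# Complex query expressed as a JavaScript predicate, evaluated on the server.
frequent_buyers = db.customers.find({"$where": "this.orders.length > 3"})
```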

MongoDB achieves scalability through sharding, which divides objects in a collection between different servers based on a key. If the developer defines zip code as the shard key, for example, a customer object with a New York City zip code might be stored on a different server than a customer object with a San Francisco zip code. MongoDB handles the work of distributing the data.
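
In rough terms, sharding the hypothetical customers collection on zip code would look like the commands below. They must be sent to a mongos query router of an already configured sharded cluster, so treat this as a sketch of the API rather than a setup recipe; the hostname is invented.

```python
from pymongo import MongoClient

# Connect to a mongos query router (hypothetical hostname).
mongos = MongoClient("mongos.example.com", 27017)

# Enable sharding for the database, then shard the collection on the zip field.
mongos.admin.command("enableSharding", "crm")
mongos.admin.command("shardCollection", "crm.customers", key={"zip": 1})
```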

Sharding is distinct from replication. Each MongoDB shard can be configured as a replica set, which provides asynchronous master-slave replication with automatic failover and recovery of member nodes. In production, a replica set generally consists of at least three nodes running on separate machines, one of which serves as an arbiter. The arbiter breaks ties when the cluster needs to elect a new primary node. Drivers automatically detect when a replica set’s primary node changes and begin to send writes to the new primary. The replication process uses logs in much the same way as relational databases. And while non-primary replicas could be used for reads to speed performance, the primary purpose of replication is reliability. (Paragraph updated 3 Feb 2012.)
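
From the driver’s point of view, connecting to a replica set is a matter of listing members and naming the set; the driver then keeps track of which member is primary. The hostnames and set name below are invented.

```python
from pymongo import MongoClient

# The driver discovers the current primary and re-routes writes after a failover.
client = MongoClient(
    "mongodb://db1.example.com,db2.example.com,db3.example.com/?replicaSet=rs0"
)
client.crm.customers.insert_one({"name": "Bob"})  # sent to whichever node is primary
```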

Reliability brings to mind transactions. Relational databases support transactions across multiple records and tables; MongoDB restricts transactions to single documents. While this might appear overly restrictive, it could be made to work in many scenarios. Recall that a document can include a hierarchy of objects. If an application’s data is modeled such that all of the data requiring all-or-nothing modification resides within one document, the MongoDB approach would suffice.
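
For example, if orders are embedded within the customer document, adding an order and adjusting a counter can happen in one atomic update to that single document. A sketch with invented field names:

```python
# Both modifications apply atomically because they touch a single document.
db.customers.update_one(
    {"name": "Alice Chen"},
    {
        "$push": {"orders": {"sku": "C-310", "qty": 1}},  # append to the embedded array
        "$inc": {"order_count": 1},                       # bump a counter in the same document
    },
)
```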

However, this restriction on transactions calls attention to the design objectives of MongoDB: speed, simplicity, and scalability. Expanding transactions to encompass multiple objects stored across shards would significantly impact performance and introduce complex deadlock problems. Indeed, by default, a save call to MongoDB returns immediately to the application without waiting for confirmation that the write was successfully persisted. If a networking or disk failure prevents the write, the application continues without awareness of the error. However, the MongoDB API does provide options for safe writes that wait for a success response. There are even options to specify how many replicas must be updated before the save is considered a success. So while speed is the default, reliability is an option.
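
In the driver, these knobs surface as a write concern. A sketch, with the caveat that newer drivers acknowledge writes by default, so in practice you mostly dial the guarantee up:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("localhost", 27017)

# Require acknowledgement from the primary plus at least one secondary,
# and wait for the journal, trading some speed for durability.
customers = client.crm.get_collection(
    "customers", write_concern=WriteConcern(w=2, j=True)
)
customers.insert_one({"name": "Carol"})
```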

While I mentioned earlier that MongoDB provides drivers for a variety of languages, it holds particular appeal for JavaScript devotees. JSON documents were designed to represent JavaScript objects. JavaScript is the MongoDB language for complex queries. And the command-line tool for managing MongoDB is built on top of the JavaScript shell. So if you have mastered JavaScript for coding dynamic web pages, MongoDB provides an opportunity to expand its use.

I recommend visiting MongoDB.org and trying out the online shell. In a few minutes, you’ll get a sense of the API. Then take a look at the tutorial. And to experiment further, download and install MongoDB for yourself. (I managed to install it on Windows 7 in a few minutes, but somehow got stuck installing the package on Ubuntu.)

Related Posts:
Exploring NoSQL: Memcached
Exploring NoSQL: Redis
Exploring NoSQL: Riak
Exploring NoSQL: Couchbase

Posted in MongoDB, NoSQL

NoSQL: The Joy is in the Details

Whenever my wife returns excitedly from the mall having bought something new, I respond on reflex: Why do we need that? To which my wife retorts that if it were up to me, humans would still live in caves. Maybe not caves, but we’d still program in C and all applications would run on relational databases. Fortunately, there are geeks out there with greater imagination.

When I first began reading about NoSQL, I ran into the CAP Theorem, according to which a database system can provide only two of three key characteristics: consistency, availability, or partition tolerance. Relational databases offer consistency and availability, but not partition tolerance, namely, the capability of a database system to survive network partitions. This notion of partition tolerance ties into the ability of a system to scale horizontally across many servers, achieving on commodity hardware the massive scalability necessary for Internet giants. In certain scenarios, the gain in scalability makes worthwhile the abandonment of consistency. (For a simplified explanation, see this visual guide. For a heavy computer science treatment, see this proof.)

This initially led me to assume that NoSQL makes sense only for the likes of Facebook and Twitter. The rest of us who seek something less than world domination and who associate consistency with job security may as well stay within the safe and comfortable realm of relational databases, which have certainly passed the test of time.

However, I’m starting to question that assumption. Clearly, relational databases still make sense for many applications, especially those requiring strict transactions and complex ad hoc queries. Relational databases will certainly remain the backbone of financial and ERP systems. But I’m now wondering whether NoSQL might fit quite well for many other applications.

When I say NoSQL, however, I’m not really saying anything. Once computer scientists freed themselves from the principles of relational databases, an astounding creativity burst forth. The only thing that NoSQL databases have in common is that they are not relational. So it is not a choice between SQL and NoSQL, but rather a choice between SQL and a wide diversity of other options.

Wikipedia does a good job categorizing the many NoSQL databases now available. But that should just be taken as a starting point. The only way to appreciate the range of choices is to explore each one, looking over its documentation, playing with code, and experimenting. The value of NoSQL is not in the theory, but in the specific character of each NoSQL database.

And so I plan to spend time this year exploring and posting about some of the many NoSQL options out there. I’ve already started a post on MongoDB. Stay tuned for more. And if you have any suggestions for which database I should look into next, please make a comment.

Related Posts:

Exploring NoSQL: MongoDB

Exploring NoSQL: Memcached

Exploring NoSQL: Redis

Posted in NoSQL

Cloud Foundry Evolves

Cloud Foundry, the open-source Platform-as-a-Service (PaaS) solution from VMware, continues to gain community support and evolve toward a more diverse, enterprise-ready platform. Last night at the Silicon Valley Cloud Computing Group meet up in Palo Alto, VMware engineers and representatives from several community partners spoke of recent progress and future plans.

Building on Cloud Foundry’s extension framework for languages and services, Uhuru Software has added .Net support to Cloud Foundry, enabling .Net developers to create applications in Visual Studio and deploy them directly to a Cloud Foundry-based private or public cloud. AppFog has added PHP support, and ActiveState has added Perl and Python support. Jaspersoft has extended Cloud Foundry with BI support, including user-friendly wizards for building reports and dashboards and direct support for the document-oriented MongoDB as a data source.

Scalr, whose founder, Sebastian Stadil, organizes the cloud computing group, demonstrated tooling for deploying a Cloud Foundry cluster. The graphical, web-based tool builds up a configuration for each server and calls web services to spin up the instances.

And VMware itself continues to make major contributions to Cloud Foundry. Patrick Chanezon and Ramnivas Laddad from VMware demonstrated Micro Cloud Foundry, a Cloud Foundry cluster with all components running on one virtual machine. This capability makes it possible for developers to spin up a PaaS instance on their laptop, deploy an application, and debug. Using an Eclipse plugin, Laddad gave a demo of debugging a cloud application that mirrored the experience of debugging traditional applications.

During a closing panel, VMware and partner representatives clarified the distinction between CloudFoundry.org and CloudFoundry.com. CloudFoundry.org hosts the open-source software development project, which enables organizations to run their own private or public PaaS cloud. This codebase will grow to support multiple languages and services. CloudFoundry.com is a public instance of Cloud Foundry run by VMware. Like other hosted instances of Cloud Foundry, CloudFoundry.com supports only a subset of the languages and services provided by the open-source Cloud Foundry code. Despite the expansion of the code base, hosting providers must limit their offerings to services that they have the operational expertise to support.

In light of the rapid growth and expanding ecosystem, Jeremy Voorhis, a senior engineer at AppFog, suggested that VMware create an independent governance body to direct the future development of Cloud Foundry and to mediate potential conflicts between contributors. A few meet-up participants supported the suggestion. While all agreed that VMware has done a fabulous job of starting the project and building an ecosystem, those who raised the suggestion were concerned that conflicts were inevitable and that it would be better to build up a governance system in preparation. Representatives from VMware responded that they did not oppose the idea but did not consider governance a priority given the platform’s early stage of development.

The meet up made one thing clear: extensibility (see my post from last May) has made Cloud Foundry into a dynamic platform that has caught the attention of the open-source community.

Posted in Cloud Computing, Cloud Foundry

Extreme Transaction Processing in SOA

On the LinkedIn Mountain View campus, Anirudh Pandit, a senior architect at Oracle, spoke last night at the SVForum’s Software Architecture and Platform SIG on the topic of Extreme Transaction Processing in SOA. Pandit proposed an architectural pattern for improving SOA performance and reviewed several case studies in which the pattern dramatically improved performance in enterprise SOA implementations.

While no longer generating the buzz that it once enjoyed, SOA (Service-Oriented Architecture) remains an important integration strategy at large enterprises, perhaps now enjoying more real success than it did at the peak of its hype cycle. Though implementations often prove costly and difficult, ultimately SOA provides the most conceptually compelling means to cut through the complexity and diversity of enterprise systems—systems encumbered by a mix of legacy and web-based applications, of behind-the-firewall systems with the external systems of business partners, and of applications accumulated through long histories of mergers and acquisitions.

Despite these conceptual advantages, Pandit pointed out, many SOA implementations suffer from poor performance and a lack of scalability. As messages move from service to service through an orchestration, the work of serializing and deserializing XML messages creates processing bottlenecks, and maintaining transactional state in a relational database creates disk I/O bottlenecks.

Pandit proposed a solution based on two changes to the traditional SOA architecture. First, rather than passing XML messages with repeated serializations and deserializations, pass a token that each service may use to retrieve the message. Second, instead of persisting the message as it moves from service to service, cache the message to memory and assure message integrity by synchronizing the cache across multiple machines, preferably across machines in different data centers.
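
A toy sketch of the pattern in Python, with a plain dictionary standing in for what would in practice be a distributed cache synchronized across machines and data centers; all names are invented.

```python
import uuid

# Stand-in for a replicated, cross-data-center in-memory cache.
message_cache = {}

def publish_message(xml_payload: str) -> str:
    """Store the full message once and return a small token to pass around."""
    token = str(uuid.uuid4())
    message_cache[token] = xml_payload
    return token

def service_step(token: str) -> None:
    """Each service receives only the token and fetches the message from the
    cache, avoiding repeated XML serialization and per-hop persistence to disk."""
    payload = message_cache[token]
    # ... process the payload, then write any updated version back ...
    message_cache[token] = payload

# The first service stores the message; downstream services pass the token along.
token = publish_message("<order><id>42</id></order>")
service_step(token)
```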

The solution, Pandit explained, does not work in all circumstances. By avoiding persistence to disk, the solution might fail to meet compliance requirements. By synchronizing the cache across many machines and data centers, the solution might introduce network bottlenecks and latency issues. But where it does fit, Pandit concluded, the solution overcomes performance problems in a plug-and-play manner without a need for the costly redesign of services.

While most recent conversations on scalability revolve around NoSQL, Pandit’s presentation was a reminder that the intelligent use of caching remains a viable option.

Stay tuned to the Software Architecture and Platform SIG for future events and the slide deck of Pandit’s presentation (not yet available at time of writing). The next meeting will be on the use of Hadoop at LinkedIn.

Posted in Architecture, SOA, SVForum

Five Bad Reasons Not to Adopt Agile

While the advantages of agile appear obvious, I observe that many IT departments stick stubbornly to waterfall. Despite failure after failure, IT managers blame the people—staff members, vendors, users—rather than the process.

Waterfall organizes software projects into distinct sequential stages: requirements, design, coding, integration, testing. Despite its common-sense appeal, the waterfall approach almost always leads to unproductive conflicts. Most users cannot visualize a system specified in a requirements document. Rather, users discover the true requirements only late in the project, usually during user acceptance testing, at which point accommodating their requests means revising the design, reworking the implementation, and re-testing.

In fact, users should change their minds. We all do as we learn. As users begin using a system, they begin to see more clearly how it could improve their jobs. We should welcome this learning. But under the waterfall approach, such discovery gets labeled as scope creep; it undermines the process, causes cost overruns, and leads to a frenzy of finger pointing.

Agile methods throw out the assumption that a complex system can be fully understood, designed, and planned for up front. While there is a great diversity of agile methods, they almost all tackle complexity by breaking projects into small iterations of one week to a month, each of which delivers working software of value to a customer. Users get to work with and provide feedback on the system sooner rather than later.

So why do so many IT organizations insist on crashing over waterfalls again and again? Here are the most frequent refrains I hear from waterfall proponents:

“We need a fixed-fee contract to avoid risk.”

Fixed-fee contracts force projects into a waterfall approach that makes change cost prohibitive. While fixed-fee contracts in theory transfer risk from client to vendor, in reality they increase the risk of a final product that meets specification but fails to achieve the business objectives. A far simpler alternative would be a time and materials contract that requires renewal at the start of each iteration. The customer reduces risk by committing to less cost at each signing and by receiving working software sooner rather than later, which both achieves a quicker return on investment and keeps the project in line with expectations.

“What’s this going to cost?”

To make the business case for a project, it is essential to nail down costs, which the waterfall method promises for an entire project as early as the requirements stage. And while I’m all for business cases, making decisions based on bogus data makes little sense. What we need is an agile business case, one that covers a single iteration and rests on meaningful cost and revenue data.

“I’m a PM and I need a schedule to manage to.”

Wedded to the illusion of predictability, many PMs identify their discipline with large, complex schedules, full of interdependent sequences of tasks stretching out over months, schedules so easy to create in MS Project but so rarely borne out in reality. These schedules, together with requirements documents the size of a New York City phone book, end up as just more roadkill of the waterfall process.

“Should we buy or build? We need a decision.”

Agile eschews big designs up front in favor of designs that evolve organically over time. But IT departments must often make certain big design decisions up front, such as whether to buy or build a system or component.

And, unfortunately, these sorts of decisions fall outside of the well-known agile frameworks such as Scrum or XP, which focus exclusively on software development. So it might be assumed that once a big up-front decision is required, agile no longer applies. Indeed, it does make sense that more requirements gathering and more design are needed up front to avoid committing dollars to the wrong vendor.

Regardless, I’d argue that there are ways to keep a project agile even if certain decisions must be made up front.

1) Keep in mind that there is no need to collect every requirement, just those needed for the decision.

2) Take advantage of demo versions and commit a few early iterations to build out rapid prototypes on different products.

3) Consider open source. Low cost means low commitment.

And once a decision to buy or build has been made, implementation can proceed according to agile principles. Most enterprise software requires substantial configuration and customization that can benefit as much as pure software development from agile approaches.

“Let’s just get on with the next project.”

Perhaps because of the pain and strained relationships that accompany failed waterfall projects, few organizations analyze the process that led them to failure. Everybody flees the sinking ship and moves on to the next project so fast that nobody reflects on the root causes of the disaster. I would think that if enough time were spent studying these root causes, it would become clear that the problem was the process, not the people.

Thoughts?

If you know of other objections to agile, I’d like to hear them. And if you have had success in introducing agile into IT organizations, please share your experiences.

Posted in Agile

A Cloud Curriculum

Erik Bansleben, Program Development Director at the University of Washington, invited me and several other bloggers to participate in a discussion last Friday on UW’s new certificate program in cloud computing. Aimed at students with programming experience and a basic knowledge of networking, the program covers a range of topics from the economics of cloud computing to big data in the cloud.

I had a mixed impression of the curriculum. It looked a bit heavy on big data, an important cloud use case that would perhaps be better suited to a certificate program of its own. And the curriculum looked a bit light on platform-as-a-service and open source platforms for managing clouds such as OpenStack. Since the program is aimed at programmers, perhaps I’d organize it around the challenges of architecting applications to benefit from cloud computing.

But that is nitpicking. It is certainly important for universities to update course offerings based on current trends. And UW has put together an impressive panel of industry experts to guide the program.

Clearly, it’s difficult to put together a program around cloud computing, since there is so much debate around what constitutes cloud computing. And given that just about every tech firm out there has repositioned its products as cloud offerings, it would seem very difficult to achieve sufficient breadth without giving up the opportunity for rigorous depth.

But any interesting new direction in the tech industry would pose such challenges. Indeed, perhaps it is these difficulties that make it worth tackling the topic.

For more information, see the UW website at http://www.pce.uw.edu/certificates/cloud-computing.html.

Posted in Cloud Computing

The Elements of Scrum: A Book Review

Let me say up front that I’m a biased reviewer. I attend the SF Bay Area Agile meet-up organized by Agile Learning Labs, where both authors work. In fact, I won a copy of this book at the last meet-up event. And I’m a big fan of Chris Sims. Chris has a knack for engaging a group in interactive exercises that not only make clear the practicality of agile principles but are also just plain fun.

The Elements of Scrum is a short and pleasant read, a wonderful metaphor for what it conveys about Scrum: that Scrum makes software development a joy. Chris and his co-author, Hillary Louise Johnson, introduce Scrum in a way that is clear and practical. The book covers the principles of agile, the Scrum framework, and a host of supporting practices.

Chris and Hillary make a great point of rejecting the common myth that agile means no planning: “On an agile software project, the plan is all around you, in the form of user stories, backlogs, acceptance tests, and big, visible charts: these are all part of your communication-rich environment.”

I particularly liked the metaphor comparing software development to sailing: the only way to reach a destination is to adjust frequently for wind and current. While I don’t want to spoil the experience by quoting this section, suffice it to say that it vividly describes the need for agile.

Any organization embarking on Scrum should hand this book out to all stakeholders. Given its brevity and practicality, it might actually get read. And it is bound to generate enthusiasm.

And by the way, if you live in Silicon Valley, check out the agile meet-up at http://www.meetup.com/AgileManagers/.

Posted in Agile

Big Data and Hadoop: A Cloud Use Case

Tuesday night I drove to Santa Clara for the Big Data Camp, learned more about Hadoop, and even ran into a few Dell colleagues. Thanks to Dave Nielson for organizing the camp and to Ken Krugler for his great overview of Hadoop.

While the phrase big data lacks precision, it is of growing importance to ever more enterprises. With data flooding in from web sites, mobile devices, and ever-present sensors, processing and deriving business value from it all becomes ever more difficult. Data becomes big when it exceeds the storage and processing capabilities of a single computer. While an enterprise could spend lots of money on high-end, specialized hardware finely tuned for a specific database, at some point that hardware will become either inadequate or too expensive.

Once data exceeds the capabilities of a single computer, we face the problem of distributing its storage and processing. To solve this problem, big data innovators are using Hadoop, an open source platform capable of scaling across thousands of nodes of commodity hardware. Hadoop includes both a distributed file system (HDFS) and a mechanism for distributing batch processes. It is scalable, reliable, fault tolerant, and simple in its design.
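
To give a flavor of how batch work gets expressed on top of HDFS, here is a minimal word-count job written for Hadoop Streaming, which runs mapper and reducer scripts over data stored in HDFS. The invocation details and paths depend on the cluster, so treat this as a sketch rather than a tuned job.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: submit with -mapper "wordcount.py map"
and -reducer "wordcount.py reduce" (exact job wiring depends on your cluster)."""
import sys

def map_phase():
    # Emit "word<TAB>1" for every word on stdin; Hadoop shuffles and sorts by key.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reduce_phase():
    # Input arrives sorted by word, so all counts for a word are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    map_phase() if sys.argv[1:] == ["map"] else reduce_phase()
```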

Many enterprises, however, would only need to process big data periodically, perhaps daily or weekly. When running a big data batch job, an organization might want to distribute the load over hundreds or thousands of nodes to assure completion within a time window. But when the batch job completes, those nodes may not be needed for Hadoop until the time comes for the next batch job.

This use case—the dynamic expansion and contraction in the need for computing resources—fits well with the capabilities of cloud computing, whether private, public, or hybrid. (In this case, I mean cloud as in Infrastructure-as-a-Service.) A cloud infrastructure could spin up the nodes when needed and free the resources when that need goes away.  In public cloud computing, that freeing of resources would directly impact cost.

During his presentation, Ken Krugler suggested that we think of Hadoop as a distributed operating system in that Hadoop enables many computers to store and process data as if they formed a single machine. I’d add that cloud computing—virtualization, automation, and processes that enable an agile infrastructure—may be needed to complete this operating system analogy so that this distributed operating system not only manages distributed resources but does so for maximum efficiency across a multitude of use cases.

Extra Credit Reading (Dell Resources on Hadoop):

Philippe Julio, Hadoop Architecture, http://www.slideshare.net/PhilippeJulio/hadoop-architecture

Joey Jablonski, Hadoop in the Enterprise, http://www.slideshare.net/jrjablo/hadoop-in-the-enterprise

Aurelian Dumitru, Hadoop Chief Architect, Dell’s Big Data Solutions including Hadoop Overview, http://www.youtube.com/watch?v=OTjX4FZ8u2s

That’s me on the left; Aurelian Dumitru, Dell’s Hadoop Chief Architect, is in the center; and Barton George, Director of Marketing for Dell’s Web & Tech vertical, is on the right. Thanks to DJ Cline for the photo. See http://bit.ly/iHJJIl for more photos of the Big Data Camp.

Posted in Cloud Computing, Hadoop

Pathways to an Open Source Cloud

Last night at the Silicon Valley Cloud Computing Group hosted by Box.net, David Nalley of CloudStack told an overflow crowd that open source ideally fits the requirements of cloud computing, allowing users to solve their own problems and avoid cloud lock-in. He outlined the comprehensive set of options now available for building an open source cloud.

Nalley emphasized in particular the need for a tool chain, a set of tools for provisioning, configuration, monitoring, and automation. Without these tools working together, system admins could not possibly manage all of the virtual machines deployed in a cloud.

Here’s a list of the open source projects Nalley mentioned in the talk, broken down by function.

Hypervisors: Xen, KVM, Virtual Box, OpenVZ, LXC
(Xen and KVM are the most widespread)

Management: CloudStack, Eucalyptus, OpenStack, Ubuntu Enterprise Cloud, Abiquo

PaaS: Cloud Foundry, OpenShift, Stratos

Storage: GlusterFS, CloudFS, CEPH, OpenStack Object Storage (SWIFT), Sheepdog

Cloud APIs: jclouds, libcloud, deltacloud

Provisioning Tools: Cobbler, Kickstart, Spacewalk, Crowbar

Configuration Management: Chef, Puppet

Monitoring: Cacti, Nagios, OpenNMS, Zabbix, Zenoss

Automation/Orchestration: AutomateID, Capistrano, RunDeck, Func, MCollective

For details, see the slide deck at http://www.slideshare.net/socializedsoftware/crash-course-in-open-source-cloud-computing-8279535.

Posted in Cloud Computing, Open Source

Connecting Clouds: L2 or L3?

We speak of the cloud, but we have no cloud. We have clouds: private clouds, a cloud at Amazon, a cloud at Rackspace; more and more clouds, distinct and separate. Enterprises looking to cloud computing may well decide on an all-of-the-above strategy, picking and choosing clouds based on cost and quality. Or they may go hybrid, keeping their data center but extending it into the cloud when and where that makes sense.

Making this choice possible requires both connecting and isolating clouds: connecting so that computers in the enterprise and computers across multiple cloud service providers can communicate; isolating so that this communication stays within the bounds of the enterprise.

On the technical side, this boils down to a networking problem, and networks are built in layers. Layer 1, the physical layer, moves bits across physical communication media, dealing with matters such as voltage and current. Layer 2, the data link layer, packages these bits into frames, dealing with problems such as congestion and collisions. These two layers work closely together; each L2 protocol works with specific L1 protocols. Computers connected through a set of L1 and L2 protocols constitute a physical network.

Layer 3, the network layer, provides for the routing of packets across networks. The L3 Internet Protocol (IP) makes possible the connection of multiple physical networks into a single virtual network, or inter-network. IP forms the foundation of the internet, without which it would be hard to imagine cloud computing.

Now we want to take computers within an enterprise and computers at various cloud service providers and connect them together so they form one isolated cloud—one big dynamic pool of resources, controlled and isolated as if the enterprise owned it all. How should it be done? Should we use L2 or L3 protocols? Of course, public clouds must be accessed over the internet, which means all traffic to these clouds must go through IP. Making the connection based on L2 means tunneling L2 within L3, which trades off efficiency for the sake of making the connection mimic a physical network. Does this make sense?

The question matters to businesses seeking to benefit from cloud computing. Such fundamental architectural decisions are not easily reversed. Getting it wrong means wasted time, wasted money, and lost opportunity.

Using two fairly standard technologies, an IPSec Virtual Private Network (VPN) and a Virtual Local Area Network (vLAN), the Amazon Virtual Private Cloud solves the problem of connecting clouds at L3. The vLAN provides isolation. Even though the virtual machines belonging to a particular enterprise might be spread out on multiple network segments at Amazon, the vLAN allows them to communicate as if they and they alone were on one physical network.

The virtual machines on this isolated vLAN connect to computers in an enterprise’s data center over the internet using an IPSec-based VPN. The IPSec protocol provides a tunneling capability: encrypted IP packets travel within unencrypted IP packets. Amazon decrypts the inner packets and routes them to the correct computer within its Virtual Private Cloud.

Other vendors propose L2 solutions to the problem of connecting clouds. Citrix recently announced a new product, NetScaler® Cloud Bridge, to provide this functionality. Another vendor, CloudSwitch, focuses specifically on this L2 approach to connecting clouds.

The CloudSwitch architecture includes three components: a CloudSwitch Appliance, a CloudSwitch Instance, and a Virtual Network Interface Card (vNIC). The CloudSwitch Appliance works as a switch; it takes L2 frames destined for machines in the cloud, encrypts them, and sends them over the internet through a VPN to a CloudSwitch Instance running in the cloud. The CloudSwitch Instance takes the frames and sends them to the vNIC of the appropriate virtual machine in the cloud, which knows how to decrypt the frames. The system keeps the L2 frames encrypted as they pass over the internet and through the network of the cloud service provider, thereby establishing secure communication and isolation.

Proponents of the L2 approach argue that enterprise applications depend on communicating over a single network with services such as databases and LDAP directories. Moreover, communicating over separate networks in different clouds complicates addressing, since most enterprises internally use private IP addresses that might clash with the addresses used by the cloud service provider.

Unless an organization has specific requirements that demand connecting clouds using L2 protocols, I would advocate the L3 approach. I don’t agree that enterprise applications depend on specific L2 protocols. Application developers write code that uses TCP, UDP, or application-level protocols such as HTTP and FTP that all run over IP, which can travel over any L2 protocol. And I don’t agree that the addressing issue poses an insurmountable problem. Amazon solves the problem through IPSec tunneling and a vLAN. The Amazon Virtual Private Cloud enables enterprises to assign any IP address, private or public, to their virtual machines running in the cloud.

The L3 approach uses well-established technologies such as IPSec VPNs and vLANs that most enterprises already deploy and understand. The L2 approach requires additional proprietary software, adding cost and complexity.

While the L2 approach could benefit applications that rely on L2 multicasting and broadcasting, most notably clustered applications, extending such applications transparently between enterprise data centers and public clouds could cause issues due to differences in latency and reliability. (Thanks to Dell colleague, Andi Abes, for this observation.)

Most importantly, the L2 approach of connecting clouds violates a basic architectural principle of the highly successful TCP/IP protocol suite: the L3 Internet Protocol (IP) connects together separate physical networks so that the physical networks themselves can be implemented using the L1 and L2 protocols that best meet specific requirements. By encapsulating L2 frames in L3 datagrams, the L2 approach creates an unnecessary coupling between the physical network in the enterprise datacenter and the physical networks of the cloud providers.

I look forward to seeing how this debate unfolds and which approach wins out in practice. Please share your opinions.

Additional Resources:

Why Cloud Federation Requires Layer-2 Connectivity

Networking in Federated Clouds – The L2/L3 Debate

Citrix Unveils End-to-End Cloud Networking Lineup

vCider Virtual Networks Screen/Podcast

Introducing Amazon Virtual Private Cloud (VPC)

Extend Your IT Infrastructure with Amazon Virtual Private Cloud

Intercloud (Wikipedia)

Posted in Cloud Computing