We speak of the cloud, but we have no cloud. We have clouds: private clouds, a cloud at Amazon, a cloud at Rackspace; more and more clouds, distinct and separate. Enterprises looking to cloud computing may well decide on an all-of-the-above strategy, picking and choosing clouds based on cost and quality. Or they may go hybrid, keeping their data center but extending it into the cloud when and where that makes sense.
Making this choice possible requires both connecting and isolating clouds: connecting so that computers in the enterprise and computers across multiple cloud service providers can communicate; isolating so that this communication stays within the bounds of the enterprise.
On the technical side, this boils down to a networking problem, and networks are built in layers. Layer 1, the physical layer, moves bits across physical communication media, dealing with matters such as voltage and current. Layer 2, the data link layer, organizes these bits into frames, dealing with problems such as media access and collisions. These two layers work closely together; each L2 protocol works with specific L1 protocols. Computers connected through a set of L1 and L2 protocols constitute a physical network.
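The layering just described can be made concrete in a few lines of Python. The frame below is hand-built for illustration (zeroed MAC addresses, no checksums), but the field layouts follow the standard Ethernet II and IPv4 header formats:

```python
import struct

# Sketch of the L1/L2/L3 layering: an L2 Ethernet frame carries an
# L3 IP packet as its payload. Parsing peels off one layer at a time.

def parse_ethernet(frame: bytes):
    """Split an Ethernet II frame (L2) into header fields and payload."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {"dst": dst.hex(":"), "src": src.hex(":"),
            "ethertype": ethertype, "payload": frame[14:]}

def parse_ipv4(packet: bytes):
    """Extract version and addresses from an IPv4 header (L3)."""
    version_ihl = packet[0]
    src = ".".join(str(b) for b in packet[12:16])
    dst = ".".join(str(b) for b in packet[16:20])
    return {"version": version_ihl >> 4, "src": src, "dst": dst}

# A toy 20-byte IPv4 header: version/IHL 0x45, then length, id/flags,
# TTL 64, protocol 6 (TCP), zero checksum, and 10.0.0.1 -> 10.0.0.2.
ip_header = (bytes([0x45, 0, 0, 20]) + bytes(4) + bytes([64, 6, 0, 0])
             + bytes([10, 0, 0, 1]) + bytes([10, 0, 0, 2]))
# Wrap it in an Ethernet frame with ethertype 0x0800 (IPv4).
frame = bytes(6) + bytes(6) + struct.pack("!H", 0x0800) + ip_header

eth = parse_ethernet(frame)
ip = parse_ipv4(eth["payload"])
print(ip["src"], "->", ip["dst"])  # 10.0.0.1 -> 10.0.0.2
```

The point of the layering is visible in the code: the Ethernet parser never looks inside its payload, and the IP parser never cares what L2 carried it.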
Layer 3, the network layer, provides for the routing of packets across networks. The L3 Internet Protocol (IP) makes possible the connection of multiple physical networks into a single virtual network, or inter-network. IP forms the foundation of the internet; without it, cloud computing would be hard to imagine.
Now we want to take computers within an enterprise and computers at various cloud service providers and connect them together so they form one isolated cloud—one big dynamic pool of resources, controlled and isolated as if the enterprise owned it all. How should it be done? Should we use L2 or L3 protocols? Of course, public clouds must be accessed over the internet, which means all traffic to these clouds must go through IP. Making the connection based on L2 means tunneling L2 within L3, which trades off efficiency for the sake of making the connection mimic a physical network. Does this make sense?
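The efficiency trade-off can be made concrete with a back-of-the-envelope count of the extra header bytes that tunneling L2 within L3 adds to every packet. The header sizes below are generic protocol sizes plus an assumed 8-byte tunnel shim; they are illustrative, not measurements of any particular product:

```python
# Overhead of carrying a full L2 frame inside L3, versus plain routed IP.
PAYLOAD = 1400       # application bytes per packet
IP_HDR = 20          # IPv4 header
ETH_HDR = 14         # Ethernet II header
TUNNEL_SHIM = 8      # assumed GRE-style encapsulation header

l3_packet = IP_HDR + PAYLOAD                                   # plain routed IP
l2_in_l3 = IP_HDR + TUNNEL_SHIM + ETH_HDR + IP_HDR + PAYLOAD   # tunneled frame

extra = l2_in_l3 - l3_packet
overhead = extra / l3_packet
print(f"extra bytes per packet: {extra}")        # 42
print(f"relative overhead: {overhead:.1%}")      # 3.0%
```

A few percent may or may not matter for a given workload, but the inner Ethernet and IP headers ride along on every single packet, which is the cost of making the connection mimic a physical network.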
The question matters to businesses seeking to benefit from cloud computing. Such fundamental architectural decisions are not easily reversed. Getting it wrong means wasted time, wasted money, and lost opportunity.
Using two fairly standard technologies, an IPSec Virtual Private Network (VPN) and a Virtual Local Area Network (vLAN), the Amazon Virtual Private Cloud solves the problem of connecting clouds at L3. The vLAN provides isolation. Even though the virtual machines belonging to a particular enterprise might be spread out on multiple network segments at Amazon, the vLAN allows them to communicate as if they and they alone were on one physical network.
The virtual machines on this isolated vLAN connect to computers in an enterprise’s data center over the internet using an IPSec-based VPN. The IPSec protocol provides a tunneling capability: encrypted IP packets travel within unencrypted IP packets. Amazon decrypts the inner packets and routes them to the correct computer within its Virtual Private Cloud.
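The tunnel-mode idea is simple enough to sketch: the entire inner IP packet is encrypted and wrapped in a new outer header addressed to the tunnel endpoint, which decrypts it on arrival. In the sketch below, a SHA-256-based XOR keystream stands in for real ESP encryption (AES and friends); it is not secure, just small enough to show the encapsulation, and the key and header bytes are hypothetical:

```python
from hashlib import sha256

def keystream(key: bytes, length: int) -> bytes:
    """Derive a deterministic byte stream from the shared key (toy cipher)."""
    out, counter = b"", 0
    while len(out) < length:
        out += sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encapsulate(inner_packet: bytes, key: bytes, outer_header: bytes) -> bytes:
    """Encrypt the whole inner packet; prepend the unencrypted outer header."""
    ks = keystream(key, len(inner_packet))
    ciphertext = bytes(a ^ b for a, b in zip(inner_packet, ks))
    return outer_header + ciphertext

def decapsulate(outer_packet: bytes, key: bytes, outer_header_len: int) -> bytes:
    """Strip the outer header and decrypt the inner packet (XOR is symmetric)."""
    ciphertext = outer_packet[outer_header_len:]
    ks = keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))

key = b"shared-secret"            # negotiated out of band (hypothetical)
inner = b"inner-ip-packet-bytes"  # stands in for a full inner IP packet
outer_hdr = b"OUTER-IPV4-HEADER"  # stands in for the 20-byte outer IPv4 header

wire = encapsulate(inner, key, outer_hdr)
recovered = decapsulate(wire, key, len(outer_hdr))
print(recovered == inner)  # True: the far end recovers the original packet
```

On the wire, a middlebox sees only the outer header; the inner addresses, ports, and payload stay opaque until the tunnel endpoint decrypts them.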
Other vendors propose L2 solutions to the problem of connecting clouds. Citrix recently announced a new product, NetScaler® Cloud Bridge, to provide this functionality. Another vendor, CloudSwitch, focuses specifically on this L2 approach to connecting clouds.
The CloudSwitch architecture includes three components: a CloudSwitch Appliance, a CloudSwitch Instance, and a Virtual Network Interface Card (vNIC). The CloudSwitch Appliance works as a switch; it takes L2 frames destined for machines in the cloud, encrypts them, and sends them over the internet through a VPN to a CloudSwitch Instance running in the cloud. The CloudSwitch Instance passes the frames to the vNIC of the appropriate virtual machine in the cloud, and the vNIC decrypts them. The system keeps the L2 frames encrypted as they pass over the internet and through the network of the cloud service provider, thereby establishing secure communication and isolation.
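The L2-in-L3 pattern behind this architecture can be sketched in miniature: the appliance side wraps a whole Ethernet frame under a virtual-network identifier so it can ride over IP to the cloud, and the instance side unwraps it and dispatches on the inner destination MAC. The 8-byte shim layout below is an illustrative assumption (similar in spirit to VXLAN-style encapsulation), not CloudSwitch's actual wire format, and encryption is omitted for brevity:

```python
import struct

SHIM_FMT = "!II"  # assumed shim: 4-byte virtual-network id + 4 reserved bytes

def wrap_frame(frame: bytes, vnet_id: int) -> bytes:
    """Appliance side: prepend the shim so the frame can travel inside IP."""
    return struct.pack(SHIM_FMT, vnet_id, 0) + frame

def unwrap_frame(datagram: bytes):
    """Instance side: strip the shim, recover the frame, read the dest MAC."""
    vnet_id, _reserved = struct.unpack(SHIM_FMT, datagram[:8])
    frame = datagram[8:]
    dst_mac = frame[:6]
    return vnet_id, dst_mac, frame

# A toy Ethernet frame: dst MAC, src MAC, ethertype 0x0800 (IPv4), payload.
frame = (bytes.fromhex("0a0000000002") + bytes.fromhex("0a0000000001")
         + struct.pack("!H", 0x0800) + b"inner-ip-packet")

vnet_id, dst_mac, inner = unwrap_frame(wrap_frame(frame, vnet_id=42))
print(vnet_id, dst_mac.hex(":"))  # 42 0a:00:00:00:00:02
```

Note what the cloud side must track that an L3 design would not: per-tenant virtual-network IDs and MAC addresses, the very L2 state that plain IP routing abstracts away.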
Proponents of the L2 approach argue that enterprise applications depend on communicating over a single network with services such as databases and LDAP directories. Moreover, communicating over separate networks in different clouds complicates addressing, since most enterprises internally use private IP addresses that might clash with the addresses used by the cloud service provider.
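The address-clash half of that argument is easy to see concretely with Python's standard `ipaddress` module; the two subnets below are hypothetical:

```python
import ipaddress

# Both the enterprise and the cloud provider carve ranges out of the
# RFC 1918 private block 10.0.0.0/8 -- and the ranges overlap.
enterprise = ipaddress.ip_network("10.1.0.0/16")   # enterprise internal range
provider = ipaddress.ip_network("10.1.128.0/20")   # provider-assigned range

print(enterprise.overlaps(provider))  # True: the same address means two things
```

When the ranges overlap, an address such as 10.1.130.5 is ambiguous: it could name a machine in the data center or one in the cloud, which is the clash the L2 proponents point to.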
Unless an organization has specific requirements that demand connecting clouds using L2 protocols, I would advocate the L3 approach. I don’t agree that enterprise applications depend on specific L2 protocols. Application developers write code that uses TCP, UDP, or application-level protocols such as HTTP and FTP that all run over IP, which can travel over any L2 protocol. And I don’t agree that the addressing issue poses an insurmountable problem. Amazon solves the problem through IPSec tunneling and a vLAN. The Amazon Virtual Private Cloud enables enterprises to assign any IP address, private or public, to their virtual machines running in the cloud.
The L3 approach uses well-established technologies such as IPSec VPNs and vLANs that most enterprises already deploy and understand. The L2 approach requires additional proprietary software, adding cost and complexity.
While the L2 approach could benefit applications that rely on L2 multicasting and broadcasting, most notably clustered applications, extending such applications transparently between enterprise data centers and public clouds could cause issues due to differences in latency and reliability. (Thanks to Dell colleague Andi Abes for this observation.)
Most importantly, the L2 approach to connecting clouds violates a basic architectural principle of the highly successful TCP/IP protocol suite: the L3 Internet Protocol (IP) connects separate physical networks so that each physical network can be implemented using the L1 and L2 protocols that best meet its specific requirements. By encapsulating L2 frames in L3 datagrams, the L2 approach creates an unnecessary coupling between the physical network in the enterprise data center and the physical networks of the cloud providers.
I look forward to seeing how this debate unfolds and which approach wins out in practice. Please share your opinions.