In February 2011, Membase, Inc. and CouchOne merged to combine the strengths of their two open-source NoSQL projects, Membase and CouchDB. The joint team released Couchbase 1.8 in January 2012 as an upgrade to Membase 1.7. Version 2.0 is now available as a developer preview. Meanwhile, CouchDB lives on as an independent project under Apache.
Membase, developed by several leaders of the Memcached project, maintains protocol compatibility with Memcached, making it possible to plug in Membase as a replacement for Memcached without rewriting application code. Like Memcached, Membase is a key-value store supporting set and get operations. But Membase adds persistence and replication. While Memcached removes items from cache when it runs out of memory, Membase makes room by moving items to disk. And while Memcached stores each item on a single server, Membase replicates items to additional servers, first to memory, and then to disk as needed.
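To make the contrast concrete, here is a minimal toy sketch of the two behaviors. It is not how either product is implemented: the class names are hypothetical, the "disk" is a plain dictionary, and least-recently-used ordering is assumed as the rule for choosing which item to move out of memory.

```python
from collections import OrderedDict


class EvictingCache:
    """Toy Memcached-style cache: when memory is full, the oldest item is dropped."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()  # ordered from least to most recently used

    def set(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # the evicted item is lost

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        return None


class SpillingStore(EvictingCache):
    """Toy Membase-style store: when memory is full, the oldest item moves to disk."""

    def __init__(self, capacity):
        super().__init__(capacity)
        self.disk = {}  # stand-in for persistent storage (assumption)

    def set(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        self.disk.pop(key, None)  # discard any stale copy on disk
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self.disk[old_key] = old_value  # kept, just slower to reach

    def get(self, key):
        value = super().get(key)
        if value is None and key in self.disk:
            value = self.disk.pop(key)
            self.set(key, value)  # bring the item back into memory
        return value
```

With a capacity of two, setting three keys in a row leaves the first key unreadable in `EvictingCache` but still readable in `SpillingStore`, which is the difference the paragraph above describes.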
CouchDB is a document-oriented database that stores values as JSON documents. Like Riak, CouchDB is written in Erlang, a language designed for distributed, concurrent, fault-tolerant, soft real-time systems.
The merged result, Couchbase, combines the capabilities of both products, serving as either a key-value store or a document-oriented database, depending on the use case and the values being stored. Like Membase, Couchbase can replace Memcached without a code rewrite, so it can function as a key-value store in the manner of Memcached. The Couchbase API supports set and get operations on arbitrary binary values without forcing them to conform to JSON. Internally, all items are stored as JSON, with arbitrary values kept as attachments, but this need not matter to a developer. In many use cases it makes sense to use Couchbase purely as a key-value store and ignore its document-oriented capabilities.
For values that do conform to JSON, however, Couchbase provides document-oriented capabilities, and it shares many features with MongoDB. Couchbase organizes data into buckets, a concept akin to a MongoDB collection (and comparable to a relational table minus the schema). Both databases support sharding, replication, automatic failover, and the ability to add capacity without downtime. Both provide rich monitoring capabilities. And both emphasize consistency over high availability: all writes for a given segment of the key space go to a single master responsible for that segment, so that every read consistently returns the latest value.
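The single-master idea can be sketched in a few lines. This is only an illustration under stated assumptions: the hash function (CRC32), the partition count, the node names, and the `Cluster` class are all chosen for the example and are not a description of Couchbase or MongoDB internals.

```python
import zlib

NUM_PARTITIONS = 8                        # hypothetical; kept small for readability
NODES = ["node-a", "node-b", "node-c"]    # hypothetical node names

# Each partition (segment of the key space) has exactly one master node.
PARTITION_MASTERS = {p: NODES[p % len(NODES)] for p in range(NUM_PARTITIONS)}


def partition_for(key):
    """Hash a key to the partition that owns it."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def master_for(key):
    """Return the one node that accepts reads and writes for this key."""
    return PARTITION_MASTERS[partition_for(key)]


class Cluster:
    """Toy cluster in which every operation on a key goes to that key's master."""

    def __init__(self):
        self.storage = {node: {} for node in NODES}

    def write(self, key, value):
        self.storage[master_for(key)][key] = value

    def read(self, key):
        return self.storage[master_for(key)].get(key)
```

Because a key always hashes to the same partition, and a partition has one master, two successive writes to the same key land on the same node, and a later read sees the second value. Replication to other nodes is left out of the sketch.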
Both Couchbase and MongoDB also enable developers to retrieve collections of documents using fields other than the primary key, but the manner in which they do so differs significantly. Each has taken a different data retrieval feature from the relational database world as its starting point. MongoDB supports secondary indexes and ad-hoc queries, much like SQL but without joins. Couchbase supports materialized views. A view is very much like a pre-written query. A materialized view stores the sorted results of the query in memory or on disk, ready for fast retrieval. In Couchbase, a developer creates these materialized views using JavaScript map-reduce functions. Couchbase supports updating views either on writes or on reads; the latter makes sense following bulk updates. While not as easy to define as a MongoDB query, a Couchbase view, just like a view in a relational database, can look dramatically different from the underlying data and may include aggregated values.
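The idea of a materialized map-reduce view can be sketched as follows. Couchbase views are written in JavaScript; the Python below only illustrates the concept, and the sample documents, field names, and function names are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical JSON documents keyed by primary key.
docs = {
    "beer:1": {"type": "beer", "brewery": "Anchor", "abv": 4.9},
    "beer:2": {"type": "beer", "brewery": "Sierra", "abv": 5.6},
    "beer:3": {"type": "beer", "brewery": "Anchor", "abv": 5.5},
    "user:1": {"type": "user", "name": "pat"},
}


def map_fn(doc_id, doc):
    """Map step: emit (key, value) pairs for the documents the view cares about."""
    if doc.get("type") == "beer":
        yield doc["brewery"], doc["abv"]


def materialize(docs, map_fn):
    """Run the map step over every document and store the rows sorted by key."""
    rows = []
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc_id, doc):
            rows.append((key, doc_id, value))
    rows.sort()
    return rows


def query(rows, key):
    """Look up all rows for one key using binary search on the sorted rows."""
    keys = [row[0] for row in rows]  # a real index would keep this precomputed
    return rows[bisect_left(keys, key):bisect_right(keys, key)]


def reduce_count(rows):
    """Reduce step: aggregate the rows into a count per key."""
    counts = {}
    for key, _doc_id, _value in rows:
        counts[key] = counts.get(key, 0) + 1
    return counts


view = materialize(docs, map_fn)
```

The work of scanning documents happens once, in `materialize`; `query(view, "Anchor")` then reads from the stored, sorted rows rather than from the documents themselves, and `reduce_count(view)` shows how a view can carry aggregated values that do not appear in any single document.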
Because views on large data sets can take a long time to materialize, Couchbase provides for developer views, which run against a random subset of data. When ready, a developer can publish a view to production and have it materialized against an entire data set. Last week, I had the chance to watch Matt Ingenthron, Director of Developer Solutions at Couchbase, and Dustin Sallings, Couchbase’s Chief Architect, demo this feature at a Couchbase user group in San Francisco. It is clearly a critically important feature for large data sets.
Whether the Couchbase or the MongoDB model makes more sense for a given application depends on the use case. If an application requires a key-value store in some places and document-oriented features in others, Couchbase might make sense. And if an application built on Memcached now needs a more reliable cache, Couchbase may well be a good fit. But developers looking for ad-hoc queries and indexing reminiscent of relational databases will likely feel more comfortable with MongoDB. No matter your preference, the evolution of these two open-source document-oriented NoSQL databases demonstrates that choice is alive and well in the open-source community.