Whenever my wife returns excitedly from the mall having bought something new, I respond on reflex: Why do we need that? To which my wife retorts that if it were up to me, humans would still live in caves. Maybe not caves, but we’d still program in C and all applications would run on relational databases. Fortunately, there are geeks out there with greater imagination.
When I first began reading about NoSQL, I ran into the CAP Theorem, according to which a database system can provide only two of three key characteristics: consistency, availability, and partition tolerance. Relational databases offer consistency and availability, but not partition tolerance, namely, the capability of a database system to survive network partitions. This notion of partition tolerance ties into the ability of a system to scale horizontally across many servers, achieving on commodity hardware the massive scalability necessary for Internet giants. In certain scenarios, the gain in scalability makes it worthwhile to give up strict consistency. (For a simplified explanation, see this visual guide. For a heavy computer science treatment, see this proof.)
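To make the trade-off concrete, here is a minimal Python sketch. It is a toy model, not how any real database works. The names Replica, Cluster, and demo, and the mode labels "CP" and "AP", are made up for illustration. It assumes two replicas that normally copy every write to each other, and asks what happens to a write that arrives while the network between them is cut.

```python
class Replica:
    """One copy of the data, holding a single value (toy model)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two replicas. mode is "CP" or "AP" (hypothetical labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot reach each other

    def write(self, replica, value):
        """Return True if the write was accepted, False if it was refused."""
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent: refuse the write, giving up availability.
                return False
            # Stay available: accept locally and let the replicas diverge.
            replica.value = value
            return True
        # Healthy network: apply the write to both copies.
        replica.value = value
        other.value = value
        return True


def demo(mode):
    """Write 1, cut the network, try to write 2; report what happened."""
    cluster = Cluster(mode)
    cluster.write(cluster.a, 1)
    cluster.partitioned = True
    accepted = cluster.write(cluster.a, 2)
    return accepted, cluster.a.value, cluster.b.value


if __name__ == "__main__":
    print("CP:", demo("CP"))
    print("AP:", demo("AP"))
```

In "CP" mode the second write is refused and both replicas still agree; in "AP" mode the write succeeds but the two replicas now hold different values until something reconciles them. Real systems offer many points between these two extremes, which is part of why each one has to be examined on its own terms.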
This initially led me to assume that NoSQL makes sense only for the likes of Facebook and Twitter. The rest of us who seek something less than world domination and who associate consistency with job security may as well stay within the safe and comfortable realm of relational databases, which have certainly passed the test of time.
However, I’m starting to question that assumption. Clearly, relational databases still make sense for many applications, especially those requiring strict transactions and complex ad hoc queries. Relational databases will certainly remain the backbone of financial and ERP systems. But I’m now wondering whether NoSQL might fit quite well for many other applications.
When I say NoSQL, however, I’m not really saying anything. Once computer scientists freed themselves from the principles of relational databases, an astounding creativity burst forth. The only thing that NoSQL databases have in common is that they are not relational. So it is not a choice between SQL and NoSQL, but rather a choice between SQL and a wide diversity of other options.
Wikipedia does a good job categorizing the many NoSQL databases now available. But that should just be taken as a starting point. The only way to appreciate the range of choices is to explore each one, looking over its documentation, playing with code, and experimenting. The value of NoSQL is not in the theory, but in the specific character of each NoSQL database.
And so I plan to spend time this year exploring and posting about some of the many NoSQL options out there. I’ve already started a post on MongoDB. Stay tuned for more. And if you have any suggestions for which database I should look into next, please leave a comment.
In the last few years we have seen our customer base shift almost exclusively from RDBMSs (MySQL, to a lesser degree PostgreSQL, and a slow but steady increase in Oracle) to numerous NoSQL solutions. (Full disclosure – I am an Architect in the Professional Services group at RightScale.) The customer use cases for these NoSQL implementations have expanded rapidly as well. Initial deployments focused on social gaming environments, but NoSQL solutions have become more and more prevalent in classic web applications and even in some enterprise installations. You mention that you are going to investigate MongoDB first, and from the data I have seen this would appear to be a good place to start. We get more requests for MongoDB solutions than any other NoSQL option at this point. The other ones we see (in order of decreasing demand) are Membase, Redis, and Riak, with surprisingly few requests for Cassandra, which initially got quite a bit of attention.
Thanks for the input. I’ll follow your suggestions to explore Membase, Redis, and Riak. By the way, I just posted on MongoDB. I’d welcome your thoughts on that post as well.
[…] NoSQL: The Joy is in the Details by James Downey. […]