This Monday at the Silicon Valley NewSQL meetup in Mountain View, Michael Stonebraker criticized in turn both the established relational databases (Oracle, DB2, SQL Server, PostgreSQL) and the NoSQL newcomers (MongoDB, Cassandra, Riak), and proposed a third alternative: VoltDB, a NewSQL database.
Stonebraker (a leading researcher in the field of database design, former UC Berkeley professor and now MIT professor, winner of the IEEE John von Neumann Medal, and a designer of PostgreSQL) argued that the established databases have become legacy bloatware incapable of scaling to modern requirements without complete redesign. According to Stonebraker’s research, these systems, all of which follow a similar design, spend only about 4% of CPU cycles on useful work. The rest goes to overhead, divided fairly evenly into four categories of about 24% each:
- Managing the buffer pool of disk pages cached in memory
- Multi-row locking for transactions
- Latching memory objects such as B-trees to prevent corruption in a multi-threaded environment
- Write-ahead logging to disk
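To see why these figures call for a redesign rather than incremental tuning, here is a back-of-the-envelope calculation. It assumes the four categories are independent and simply additive, which is a simplification of my own and not something claimed in the talk.

```python
# Percentages as reported in the talk summary above.
USEFUL_WORK = 4
OVERHEAD = {
    "buffer pool": 24,
    "locking": 24,
    "latching": 24,
    "write-ahead logging": 24,
}

def speedup(removed):
    """Upper bound on speedup if the named overhead categories cost nothing."""
    remaining = USEFUL_WORK + sum(
        cost for name, cost in OVERHEAD.items() if name not in removed
    )
    return 100 / remaining

total = USEFUL_WORK + sum(OVERHEAD.values())   # 100
one_removed = speedup({"locking"})             # 100 / 76
all_removed = speedup(set(OVERHEAD))           # 100 / 4
print(total, round(one_removed, 2), all_removed)  # 100 1.32 25.0
```

Under these assumptions, eliminating any single category buys only about a 1.3x improvement, while eliminating all four allows up to 25x.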
The NoSQL databases, according to Stonebraker, solve these problems, but they do so by jettisoning SQL and ACID. Giving up SQL, Stonebraker argued, makes no sense. SQL has proven itself as a time-saving, high-level language that relies on compilation to generate low-level commands. Going backwards to row-level commands and a unique API for each database, Stonebraker claimed, is comparable to giving up C for assembly language.
Stonebraker also argued against giving up ACID, a requirement (or potential requirement) for almost all applications. If a database does not provide ACID, application developers will need to write this complex code themselves.
Stonebraker proposed instead his own product, VoltDB, a relational database that supports ACID and most of the SQL standard. VoltDB avoids the overhead of buffer management by keeping all data in memory. It avoids the overhead of row locking and memory object latching by using a single thread per partition: only one thread touches a partition's memory objects, and transactions run sequentially on that thread. And instead of write-ahead logging of data, VoltDB takes periodic snapshots of the database and logs only commands, which is faster but still allows the database to be rebuilt from disk after a failure. (See the VoltDB Technical Overview for more details.)
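The combination of serial execution, command logging, and snapshots can be illustrated with a toy model. This is a minimal sketch of the idea only, not VoltDB's implementation: the `Partition` class, its command names, and the in-memory lists standing in for the on-disk log and snapshot are all hypothetical.

```python
import copy

class Partition:
    """Toy single-threaded partition: commands run one at a time, in order."""

    def __init__(self):
        self.data = {}         # all rows live in memory
        self.command_log = []  # stands in for the on-disk command log
        self.snapshot = {}     # stands in for the last on-disk snapshot
        self.snapshot_pos = 0  # number of logged commands the snapshot covers

    def execute(self, command, *args):
        if command not in ("put", "transfer"):
            raise ValueError(f"unknown command: {command}")
        # Log the command itself, not the data it changes. No locks or
        # latches are taken: only this one thread ever touches self.data.
        self.command_log.append((command, args))
        return self._apply(command, args)

    def _apply(self, command, args):
        if command == "put":
            key, value = args
            self.data[key] = value
            return True
        src, dst, amount = args  # "transfer"
        if self.data.get(src, 0) < amount:
            return False         # rejected; state is left unchanged
        self.data[src] = self.data.get(src, 0) - amount
        self.data[dst] = self.data.get(dst, 0) + amount
        return True

    def take_snapshot(self):
        self.snapshot = copy.deepcopy(self.data)
        self.snapshot_pos = len(self.command_log)

    def recover(self):
        # Load the snapshot, then replay the commands logged after it.
        self.data = copy.deepcopy(self.snapshot)
        for command, args in self.command_log[self.snapshot_pos:]:
            self._apply(command, args)

p = Partition()
p.execute("put", "alice", 100)
p.execute("put", "bob", 0)
p.take_snapshot()
p.execute("transfer", "alice", "bob", 30)
before_crash = dict(p.data)
p.data = {}   # simulate losing the in-memory state
p.recover()
print(p.data == before_crash)
```

Replay works here because every command is deterministic: running the same commands in the same order against the same snapshot always yields the same state.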
Like most of the NoSQL databases, VoltDB supports scalability across commodity hardware by sharding data based on keys. According to Stonebraker, the choice of key is critical to performance, as joins and transactions that cross partitions degrade performance, a problem that cannot be solved even by eliminating the overhead of a traditional RDBMS. VoltDB makes scaling possible, but application developers must still give careful thought to how to partition data so that most operations touch only a single partition.
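A small sketch shows why the choice of key matters. It assumes simple hash partitioning over a fixed number of partitions; `NUM_PARTITIONS`, `partition_for`, and the key names are hypothetical, and VoltDB's actual routing scheme may differ.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical cluster size

def partition_for(key: str) -> int:
    # A stable hash, so the same key always routes to the same partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def partitions_touched(keys):
    return {partition_for(k) for k in keys}

def is_single_partition(keys):
    return len(partitions_touched(keys)) == 1

# Orders partitioned by customer ID: all of one customer's rows are
# co-located, so a transaction over them stays on one partition.
one_customer = ["customer-42", "customer-42", "customer-42"]
print(is_single_partition(one_customer))

# A transaction spanning many customers will generally cross partitions
# and need coordination between them.
many_customers = [f"customer-{i}" for i in range(100)]
print(len(partitions_touched(many_customers)))
```

If the same orders were instead partitioned by order ID, even a single customer's transaction could span several partitions, which is exactly the case the partitioning design should make rare.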
One could argue that this admission proves the NoSQL case against relational databases, namely that a database supporting ACID cannot scale. VoltDB scales only as long as transactions do not cross partitions. In a sense, VoltDB can be thought of either as many small, fast databases that each support ACID, or as one large database that supports ACID but does not scale. In other words, VoltDB does not solve the CAP dilemma.
Certainly, VoltDB will make sense for certain use cases: where there is a need for lightning speed and transactional integrity, where data can be sharded into largely autonomous partitions, and where VoltDB’s partial implementation of the SQL standard is sufficient. But VoltDB will not replace traditional RDBMSs anytime soon, as it lacks much of the functionality that enterprises expect, bloatware though that might be.
Nor will VoltDB eliminate the demand for NoSQL, because many organizations will find a NoSQL database that fits their specific requirements well. If all you need is a key-value store, why not choose a database that specializes in this function? If your data takes the shape of a graph, why not choose a database tailor-made for the purpose?
Moreover, Stonebraker overstates the case against NoSQL APIs. Yes, SQL is a proven high-level language for data access, but it does not always fit well with the object models used by application developers, which is why organizations have turned to object-relational mapping. In many ways, objects map more intuitively to document-oriented databases than to SQL. Besides, a central tenet of NoSQL is the embrace of variety, the rejection of the one-size-fits-all mentality. With that in mind, the diversity of NoSQL APIs is a benefit: it allows developers to choose the API, or APIs, that best fit a specific requirement.
Whether or not we accept all of Stonebraker’s claims, VoltDB's rethinking of the traditional RDBMS makes clear that these are exciting times in the world of databases.