This Monday at the Silicon Valley NewSQL meetup in Mountain View, Michael Stonebraker took turns bashing both the established relational databases (Oracle, DB2, SQL Server, PostgreSQL) and the NoSQL newcomers (MongoDB, Cassandra, Riak), proposing a third alternative: VoltDB, a NewSQL database.
Stonebraker (a leading researcher in the field of database design, former UC Berkeley professor and now an MIT professor, winner of the IEEE John von Neumann Medal, and a designer of PostgreSQL) argued that the established databases have become legacy bloatware that cannot scale to modern requirements without a complete redesign. According to Stonebraker’s research, these systems, all of which follow a similar design, spend only about 4% of their CPU cycles on useful work. The bulk of the cycles go to overhead, divided fairly evenly into four categories of about 24% each:
- Managing the buffer pool of disk pages cached in memory
- Multi-row locking for transactions
- Latching memory objects such as b-trees to prevent corruption in a multi-threaded environment
- Write-ahead logging to disk
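As a quick sanity check on those figures, useful work plus the four overhead categories add up to 100%. The short Python sketch below also derives a back-of-envelope ceiling that is an inference from the numbers, not a figure quoted in the talk: if all four kinds of overhead were eliminated and the useful work stayed the same, the same job would need only 4% of the cycles, roughly a 25x speedup at best.

```python
# CPU-cycle breakdown as reported in the talk (percent of total cycles).
breakdown = {
    "useful work": 4,
    "buffer pool management": 24,
    "multi-row locking": 24,
    "latching": 24,
    "write-ahead logging": 24,
}

total = sum(breakdown.values())
useful = breakdown["useful work"]
overhead = total - useful

# Back-of-envelope upper bound (derived, not quoted): if all overhead vanished
# and the useful work stayed the same, the job would need only 4% of the cycles.
ideal_speedup = total / useful

print(f"total: {total}%, overhead: {overhead}%, ideal speedup: {ideal_speedup:.0f}x")
# total: 100%, overhead: 96%, ideal speedup: 25x
```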
The NoSQL databases, according to Stonebraker, solve these problems, but they do so by jettisoning SQL and ACID. Giving up SQL, Stonebraker argued, makes no sense. SQL has proven itself as a time-saving high-level language that relies successfully on compilation to generate low-level commands. Going backwards to row-level commands and a unique API for each database, Stonebraker claimed, is comparable to giving up C for assembler.
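To make the C-versus-assembler analogy concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in SQL engine (it is not VoltDB, and the table and data are invented for illustration). The first query states what result is wanted and leaves the execution plan to the engine; the loop below it does the filtering and grouping by hand, roughly the work a row-at-a-time API pushes onto the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30), ("bob", 20), ("alice", 50), ("carol", 10)],
)

# Declarative: say what you want; the engine compiles it into an execution plan.
declarative = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "WHERE amount >= 20 GROUP BY customer ORDER BY customer"
    )
)

# Hand-coded: scan raw rows and do the filtering and grouping in the application.
manual = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    if amount >= 20:
        manual[customer] = manual.get(customer, 0) + amount

print(declarative)  # {'alice': 80, 'bob': 20}
print(declarative == manual)  # True
```

Both paths produce the same answer; the difference is who decides how to compute it, and who has to maintain that logic when the question changes.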
Stonebraker also argued against giving up ACID, which is a requirement (or potential requirement) for almost all applications. If a database does not provide ACID, application developers must write this complex code themselves.
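A minimal sketch of the code ACID saves developers from writing, again with `sqlite3` as a stand-in (the `accounts` table and `transfer` function are hypothetical). The two updates in a transfer either both take effect or neither does: when the debit violates the balance constraint, the whole transaction rolls back instead of leaving Bob credited without Alice being debited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    # As a context manager, the connection commits if the block succeeds
    # and rolls back if an exception escapes it.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


transfer(conn, "alice", "bob", 40)  # succeeds
try:
    transfer(conn, "alice", "bob", 500)  # debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass  # the credit to bob was rolled back too

print(balances(conn))  # {'alice': 60, 'bob': 40}
```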
Instead, Stonebraker proposed his own product, VoltDB, a relational database that supports ACID and most of the SQL standard. VoltDB avoids the overhead of buffer management by keeping all data in memory. It avoids the overhead of row locking and memory object latching by using a single thread per partition: only one thread touches memory objects, and transactions run sequentially on that thread. And instead of write-ahead logging of data, VoltDB takes periodic snapshots of the database and logs only commands, which is faster but still capable of rebuilding the database from disk in case of failure. (See the VoltDB Technical Overview for more details.)
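The single-threaded, command-logging design described above can be sketched as a toy model in Python. The assumptions are deliberate and loud: `Partition`, `submit`, `take_snapshot`, `recover`, and `deposit` are hypothetical names invented for illustration, the "disk" is just in-memory lists, and none of it reflects VoltDB's actual API or internals. It shows the three ideas: data kept in memory, one thread per partition running transactions serially, and recovery by replaying logged commands on top of the last snapshot.

```python
import copy
import queue
import threading


class Partition:
    """Toy model of one partition (hypothetical class, not VoltDB's API).

    Data is an in-memory dict, and a single worker thread executes every
    transaction in arrival order, so no row locks or latches are needed.
    Durability is modeled with a command log plus periodic snapshots.
    """

    def __init__(self):
        self.data = {}
        self.command_log = []    # stands in for an on-disk log of commands
        self.snapshot = ({}, 0)  # (copy of data, number of log entries it covers)
        self._inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The only thread that ever reads or writes self.data.
        while True:
            proc, args, done = self._inbox.get()
            if proc is None:  # snapshot request
                self.snapshot = (copy.deepcopy(self.data), len(self.command_log))
            else:
                self.command_log.append((proc, args))  # log the command, not the data
                proc(self.data, *args)
            done.set()

    def submit(self, proc, *args):
        """Queue a procedure for the partition thread and wait until it has run."""
        done = threading.Event()
        self._inbox.put((proc, args, done))
        done.wait()

    def take_snapshot(self):
        self.submit(None)

    def recover(self):
        """Rebuild state after a crash: load the snapshot, replay later commands."""
        data, covered = self.snapshot
        data = copy.deepcopy(data)
        for proc, args in self.command_log[covered:]:
            proc(data, *args)
        return data


def deposit(data, account, amount):
    """A deterministic, stored-procedure-like transaction (hypothetical)."""
    data[account] = data.get(account, 0) + amount


p = Partition()
p.submit(deposit, "alice", 100)
p.take_snapshot()
p.submit(deposit, "alice", 25)
p.submit(deposit, "bob", 10)
print(p.recover() == p.data)  # True
```

Replay reproduces the original state only if every logged procedure is deterministic (no clocks, random numbers, or external calls), which is the price of logging commands instead of data. Error handling is omitted: a procedure that raises would stop the worker thread.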
Like most of the NoSQL databases, VoltDB scales out across commodity hardware by sharding data based on keys. According to Stonebraker, the choice of partitioning key is critical, because joins and transactions that cross partitions degrade performance, a problem that even eliminating the overhead of a traditional RDBMS cannot solve. VoltDB makes scaling possible, but application developers must still think carefully about how to partition data so that most operations touch only a single partition.
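A minimal sketch of key-based partitioning and of why the key choice matters. The partition count, the `partition_for` and `classify` helpers, and the use of CRC-32 as the hash are all assumptions for illustration; VoltDB's own hashing scheme is not shown here. A transaction whose keys all map to one partition can run on that partition alone; one that spans partitions needs coordination.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(key: str) -> int:
    """Map a partitioning key to a partition using a stable hash (CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def classify(keys):
    """Return 'single-partition' if every key maps to the same partition."""
    partitions = {partition_for(k) for k in keys}
    return "single-partition" if len(partitions) == 1 else "multi-partition"


# Partitioning orders by customer id keeps one customer's orders together,
# so a transaction over a single customer's orders stays on one partition.
print(classify(["cust-17", "cust-17", "cust-17"]))  # single-partition

# A report over many customers will almost certainly touch several partitions.
print(classify([f"cust-{i}" for i in range(100)]))
```

CRC-32 is used here rather than Python's built-in `hash()` because string hashing in Python is randomized per process, and every node must agree on where a key lives.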
One could argue that this latter admission proves the NoSQL case against relational databases, namely that a database supporting ACID cannot scale. VoltDB scales only as long as transactions do not cross partitions. In a sense, VoltDB can be thought of either as many small, fast databases that each support ACID, or as one large database that supports ACID but does not scale. In other words, VoltDB does not solve the CAP dilemma.
VoltDB will certainly make sense for some use cases: where there is a need for lightning speed and transactional integrity, where data can be sharded into largely autonomous partitions, and where VoltDB’s partial implementation of the SQL standard meets requirements. But VoltDB will not replace the traditional RDBMS anytime soon, as it lacks much of the functionality that enterprises expect, bloatware though that might be.
Nor will VoltDB eliminate the demand for NoSQL, because many organizations will find a NoSQL database that fits their specific requirements well. If all you need is a key-value store, why not choose a database that specializes in this function? If your data takes the shape of a graph, why not choose a database tailor-made for the purpose?
Moreover, Stonebraker overstates the case against NoSQL APIs. Yes, SQL is a proven high-level language for data access, but it does not always fit well with the object models used by application developers, which is why organizations have turned to object-relational mapping solutions. In many ways, objects map more intuitively to document-oriented databases than to SQL tables. Besides, a central tenet of NoSQL is the embrace of variety, the rejection of the one-size-fits-all mentality. With that in mind, the diversity of NoSQL APIs is a benefit: it allows developers to choose the API or APIs that best fit a specific requirement.
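To illustrate the object-mapping point, here is a minimal sketch with a hypothetical `Order` object that owns a list of `LineItem`s. Serialized as a document (JSON here, standing in for a document store), it stays one nested unit; mapped to relational tables, it splits into an order row and item rows that have to be reassembled, which is the gap object-relational mapping layers fill.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LineItem:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)


order = Order(1, "alice", [LineItem("A-1", 2), LineItem("B-7", 1)])

# Document model: the nested object serializes to a single document as-is.
document = json.dumps(asdict(order))

# Relational model: the same object is split across two "tables" ...
order_row = (order.order_id, order.customer)
item_rows = [(order.order_id, i.sku, i.qty) for i in order.items]

# ... and must be reassembled (in SQL, with a join) to get the object back.
rebuilt = Order(order_row[0], order_row[1], [LineItem(sku, qty) for _, sku, qty in item_rows])

print(document)
print(rebuilt == order)  # True
```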
Whether or not we accept all of Stonebraker’s claims, VoltDB’s reworking of the traditional RDBMS makes clear that these are exciting times in the world of databases.
A pretty good summary of the meetup. You may want to add the percentage of each overhead Dr. Stonebraker mentioned (they are each 24%).
Thanks for writing it up.
Thanks for the comment. I’ll update the post.
You say: “One could argue that this latter admission proves the NoSQL case against relational databases…”
I cannot agree with your conclusion here – “against” suggests a winner. All it says is that no system, Relational or NoSQL, can do cross partition transactions at scale. And no one solves the CAP dilemma – if you mean consistency and availability in the face of a partition.
Also: “… it lacks much of the functionality that enterprises expect…” – what functionality are you referring to? Foreign Key enforcement? If so, most large scale Oracle databases (e.g. eBay) do not use database enforced FKs because of scalability issues.
On: “…If all you need is a key-value store, why not choose a database that specializes in this function…” remember that VoltDB is 5x faster than Cassandra so where are you going with this?
On: “…If your data takes the shape of a graph, why not choose a database tailor made to the purpose…” actually most graph usage is “next neighbor” where key-value is a good solution. Graph databases fit well when you have complex navigational queries – but this is not a common use case.
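The commenter's distinction can be sketched with a plain Python dict standing in for a key-value store (the node names and the `neighbors` and `within_hops` helpers are hypothetical). A next-neighbor query is a single lookup; a multi-hop navigational query needs another round of lookups for every hop, which is the case where a dedicated graph database is argued to pay off.

```python
# A plain dict standing in for a key-value store: key = node, value = neighbors.
kv = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}


def neighbors(node):
    """Next-neighbor query: a single key lookup."""
    return kv.get(node, [])


def within_hops(start, max_hops):
    """Multi-hop traversal (breadth-first): one lookup per frontier node per hop."""
    seen = {start}
    frontier = [start]
    lookups = 0
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            lookups += 1
            for nbr in neighbors(node):
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return seen - {start}, lookups


print(neighbors("alice"))  # ['bob', 'carol']
reached, lookups = within_hops("alice", 3)
print(sorted(reached), lookups)  # ['bob', 'carol', 'dave', 'erin'] 4
```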
“Exciting times in the world of databases”: I couldn’t agree more. It is particularly interesting that you point out that when you support ACID transactions, you cannot scale. Managing a situation where many different users simultaneously try to update the same data does not scale, irrespective of what database you use.
A distributed VoltDB solution supporting ACID transactions does not scale. In fact a solution built on top of a NoSQL database, where you deal with transactional conflicts in the application layer, does not scale either.
Although support for ACID transactions does not scale, for many applications you still want to execute as many ACID transactions per unit of time as possible. The further apart in space parallel ACID transactions are, the longer it takes to synchronize them.
Our conclusion from these facts is that you should run all parallel ACID transactions in RAM on the same computer to get the best overall ACID transactional throughput.
We call this “scaling in”, and it is implemented in the Starcounter DB.
Why Adopt Cloud Computing Now Rather Than Later?
As we are all well aware, technology expands and progresses at an alarming rate. For this reason, it is crucial that we adapt accordingly; otherwise we run the risk of being left behind, which will no doubt mark the end for many businesses. For example, Moore’s Law states that over the history of computer hardware, the number of transistors on integrated circuits doubles about every two years. The period is often quoted as “18 months”, a figure that relates to the doubling of chip performance. This rapid technological progression also applies to cloud computing, which is advancing at a fast pace, with more and more businesses opening their eyes to the revolutionary concept every day. To remain competitive in the future, organisations will have to migrate to the cloud.
So why should organisations move now rather than later? Firstly, early movers will always reap more benefits than those who move later. For example, suppose two engineering firms are competing directly with each other in the same town. Whichever firm adopts cloud technology first will see an improvement in operating efficiency and a drastic reduction in IT-related costs, and will have access to its own customizable virtual desktop, whereas the other firm will remain on traditional systems, thereby forfeiting some of its competitiveness. A firm that uses cloud technology will have an advantage over those that don’t in accessing files globally, keeping IT costs down, and keeping its systems maintained and up to date, creating more value for the business and its customers.
Another reason to switch to the cloud is that cloud computing will soon become a necessity for businesses. Currently, cloud computing is at a relatively early stage of adoption, so companies don’t really see it as essential. It’s just a matter of time before the concept properly takes off, and at that point organisations will begin to realise how important cloud computing actually is for maintaining competitiveness and efficiency. Cloud computing will become necessary for a number of reasons, an important one being that it is very economical. While traditional desktops have a limited lifespan, virtual desktops, for example, have a much longer lifespan, offering an increased return on investment for the business.
A very important feature of cloud computing in general is that it can be easily scaled and tailored to an organisation’s individual needs. For instance, consider the study conducted at the London School of Economics and Political Science, which consisted of survey data from 1,035 businesses and IT executives and in-depth interviews with more than 35 service providers. The researchers discovered that cloud services could be tailored so closely to a company’s needs that employees once bogged down with old processes were now free to think more creatively. According to the study, “With a cloud model, companies can think about a process at a level that is more detailed and personalized to their individual needs, but the solution will not need to be customized in older, prohibitively expensive ways.”
Why bother with all this stuff? USE an ASSOCIATIVE DATABASE model instead. 100x faster than SQL (scales to exabytes) and guaranteed to be 50x faster than NewSQL. Just a thought?