In my last post on Riak, I discussed how an application developer could use the values of N (number of replication nodes), R (read quorum), and W (write quorum) to fine tune the availability, latency, and consistency trade-offs of a distributed database.

The idea of fine tuning consistency with the values of N, W, and R was first made popular in 2007 when Amazon engineers published an article explaining the design principles of Dynamo, a highly available and scalable distributed database used internally at Amazon. The principles popularized by the article were incorporated into three popular open-source NoSQL databases: Riak, Cassandra, and Project Voldemort.

Recall that W+R>N assures consistency. Such a system of overlapping nodes is referred to as a strict quorum, in that every read is guaranteed to return the latest write. [See subsequent post that clarifies this point.] Many developers, however, choose to configure W+R<N for the sake of greater availability or lower latency. Such a system, known as a weak or partial quorum, does not guarantee consistency. But these systems do use various mechanisms to guarantee eventual consistency, meaning that in the absence of additional writes to a key, the key’s value will eventually become consistent across all N nodes. (For a good summary of eventual consistency, see the post on the topic by Werner Vogel, Amazon’s CTO.)

But eventual consistency is an odd guarantee. How long might eventual take? What is the probability of an inconsistent read? Perhaps because the answers to these questions depend on many factors, NoSQL providers do not provide specifics.

Now a group of computer science graduate students at Berkeley are in pursuit of answers. Using data samples from Internet-scale companies and statistical methods (most notably Monte Carlo analysis), the team has put together mathematical models and a simulation tool to determine both average and upper bound answers to these key questions regarding eventual consistency. They refer to the model as Probabilistically Bounded Staleness (PBS). The model factors in the values of N, W, and R as well as sample latencies for read write operations. As of yet, the model does not account for nodes entering or leaving the cluster.

PBS is a brilliant application of statistics to a computer science problem. I enjoyed hearing Peter Bailis, a member of the research team, describe the research at a Basho-sponsored meet-up this Tuesday. To learn the details of PBS, visit the team’s web site. If your business depends on the tradeoffs of eventual consistency, this research is of tremendous importance.

Even if R+W>N, sloppy quorum ala Dynamo means that consistency is not guaranteed.

Thanks for pointing this out. I’ve just written a follow up post explaining my mistake. See http://jimdowney.net/2012/03/05/be-careful-with-sloppy-quorums/.

Pingback: Windows Azure and Cloud Computing Posts for 2/24/2012+ - Windows Azure Blog

Regarding “How long might eventual take?” I can only recommend my 2011 paper on that topic: http://dl.acm.org/citation.cfm?id=2093186