Be Careful with Sloppy Quorums

In my last posts on eventual consistency, I mentioned that R+W>N guarantees consistency. Thanks to commentator senderista for pointing that this statement does not hold in the case of sloppy quorums.

The Amazon Dynamo article describes sloppy quorum as such:

If Dynamo used a traditional quorum approach it would be unavailable during server failures and network partitions, and would have reduced durability even under the simplest of failure conditions. To remedy this it does not enforce strict quorum membership and instead it uses a “sloppy quorum”; all read and write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while walking the consistent hashing ring.

In other words, the cluster has more than N nodes. And in the case of network partitions, writes could be made to nodes outside of the set of top N preferred nodes. In this case, there would be no guarantee that writes and reads over the N nodes would overlap since the nodes that constitute N are in flux. Therefore, the formula R+W>N has no meaning.

For example, if N equals three in a cluster of five nodes (A, B, C, D, and E) and nodes A, B, and C are the top three preferred nodes, then minus any error conditions writes will be made to nodes A, B, and C. But if B and C were not available for a write, then a system using a sloppy quorum would write to D and E instead. If this were to happen, then even if the write quorum (W) was 3 and the read quorum (R) was 2, making R+W>N, a read immediately following this write could return data from B and C, which would be inconsistent because only A, D, and E would have the latest value.

According to the Amazon Dynamo article, Dynamo mitigates this inconsistency through hinted handoffs. If the system needs to write to nodes D and E instead of B and C, it informs D that its write was meant for B and informs E that its write was meant for C. Nodes D and E keep this information in a temporary store and periodically poll B and C for availability. Once B and C become available, D and E send over the writes.

When balancing the tradeoffs between consistency and availability, it is vital to understand how any particular system handles quorums, whether strict or sloppy.

In some cases, I’ve found the literature unclear on this point. In a post on eventual consistency, Werner Vogels writes “If W+R > N, then the write set and the read set always overlap and one can guarantee strong consistency.” While true in a strict quorum system, this statement is not true in the case of Dynamo and not true of systems based on Dynamo that utilize sloppy quorums.

A page in the Riak wiki states that “R + W > N ensures strong consistency in a cluster” and includes a reference to the post by Vogels on eventual consistency. However, a recent Basho posting states that Riak uses sloppy quorums by default, though it uses strict quorums whenever the values of PR and PW are used rather than R and W. Overall, I didn’t find the Riak documentation clear on this important distinction.

Thanks again to the commentator who pointed out my mistake. As I continue my series exploring NoSQL databases, I’ll be more careful to point out where sloppy quorums could affect consistency.

I'm the Director of Threat Solutions at Shape Security, a top 50 startup defending the world's leading websites and mobile apps against malicious automation. Request our 2017 Credential Spill Report at ShapeSecurity.com to get the big picture of the threats we all face. See my LinkedIn profile at http://www.linkedin.com/in/jamesdowney and follow me on Twitter at http://twitter.com/james_downey.

Posted in NoSQL, Riak
One comment on “Be Careful with Sloppy Quorums
  1. Nawazish Mohammad Khan says:

    Short and crispy.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: