In my last posts on eventual consistency, I mentioned that R+W>N guarantees consistency. Thanks to commentator senderista for pointing that this statement does not hold in the case of sloppy quorums.
The Amazon Dynamo article describes sloppy quorum as such:
If Dynamo used a traditional quorum approach it would be unavailable during server failures and network partitions, and would have reduced durability even under the simplest of failure conditions. To remedy this it does not enforce strict quorum membership and instead it uses a “sloppy quorum”; all read and write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while walking the consistent hashing ring.
In other words, the cluster has more than N nodes. And in the case of network partitions, writes could be made to nodes outside of the set of top N preferred nodes. In this case, there would be no guarantee that writes and reads over the N nodes would overlap since the nodes that constitute N are in flux. Therefore, the formula R+W>N has no meaning.
For example, if N equals three in a cluster of five nodes (A, B, C, D, and E) and nodes A, B, and C are the top three preferred nodes, then minus any error conditions writes will be made to nodes A, B, and C. But if B and C were not available for a write, then a system using a sloppy quorum would write to D and E instead. If this were to happen, then even if the write quorum was 3 and the read quorum (R) was 2, making R+W>N, a read immediately following this write could return data from B and C, which would be inconsistent because only A, D, and E would have the latest value.
According to the Amazon Dynamo article, Dynamo mitigates this inconsistency through hinted handoffs. If the system needs to write to nodes D and E instead of B and C, it informs D that its write was meant for B and informs E that its write was meant for C. Nodes D and E keep this information in a temporary store and periodically poll B and C for availability. Once B and C become available, D and E send over the writes.
When balancing the tradeoffs between consistency and availability, it is vital to understand how any particular system handles quorums, whether strict or sloppy.
In some cases, I’ve found the literature unclear on this point. In a post on eventual consistency, Werner Vogels writes “If W+R > N, then the write set and the read set always overlap and one can guarantee strong consistency.” While true in a strict quorum system, this statement is not true in the case of Dynamo and not true of systems based on Dynamo that utilize sloppy quorums.
A page in the Riak wiki states that “R + W > N ensures strong consistency in a cluster” and includes a reference to the post by Vogels on eventual consistency. However, a recent Basho posting states that Riak uses sloppy quorums by default, though it uses strict quorums whenever the values of PR and PW are used rather than R and W. Overall, I didn’t find the Riak documentation clear on this important distinction.
Thanks again to the commentator who pointed out my mistake. As I continue my series exploring NoSQL databases, I’ll be more careful to point out where sloppy quorums could affect consistency.