In previous posts, I explored MongoDB and Redis. MongoDB, a document-oriented database, allows developers to work with JSON documents in interesting ways. Redis offers developers a set of fundamental data structures upon which to build applications. Riak, in contrast, offers only the simplest of interfaces: put a key-value pair and get a value by its key. Values are opaque binary strings, about which Riak understands nothing. It cannot operate on these values as JSON objects or sets or hashes. It cannot even increment a scalar numeric value. But Riak does give developers the power to fine-tune the trade-offs between consistency, availability, and partition tolerance.
To understand the why of Riak, let us review the CAP Theorem, which holds that a distributed database cannot guarantee all three of the following characteristics at once and must sacrifice at least one: consistency, availability, or partition tolerance. Partitioning, in this context, means a network failure that prevents some of the nodes in a database from communicating with others even though each node may remain accessible to some clients. At a large enough scale of distribution, with hundreds or thousands of nodes scattered around the globe, the possibility of partitioning becomes real enough that it must be anticipated. If partitioning is anticipated, it follows that it may not always be possible to distribute a write across all nodes such that the value remains consistent. And if a write cannot be distributed across all nodes, the database must either reject the write, making the system unavailable, or commit the write to only a subset of nodes, making the system inconsistent.
So scalability demands distribution, and distribution demands trade-offs that developers must understand, control, and accommodate. Riak enables developers to fine-tune these trade-offs through three numeric dials: N, W, and R. Each value in Riak is replicated to N nodes. Each write is considered a success once the data has been replicated to W nodes. And each read returns successfully once R nodes have responded. Adjusting these three values creates data stores of fundamentally different natures.
The difference by which N exceeds W defines how many nodes may become unavailable before a write would fail. The greater this difference, the greater the availability, but the greater the likelihood of inconsistency. Likewise, the difference by which N exceeds R defines how many nodes may become unavailable before a read would fail. The lower R, the greater the availability, the greater the load capacity, the lower the latency, but the greater the chance of inconsistency. If W plus R is greater than N, then the database achieves a form of consistency; because of the overlap, the read will include at least one node holding the latest value. [See subsequent post that clarifies this point.]
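To make the arithmetic concrete, here is a small sketch that computes these tolerances for a few settings. The `Quorum` class and its method names are my own illustration, not part of any Riak client library; it only restates the N, W, and R relationships described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quorum:
    """Hypothetical helper that models the three dials; not a Riak API."""

    n: int  # replicas kept for each value
    w: int  # replicas that must acknowledge a write
    r: int  # replicas that must answer a read

    def __post_init__(self):
        if not (1 <= self.w <= self.n and 1 <= self.r <= self.n):
            raise ValueError("W and R must each lie between 1 and N")

    def write_failures_tolerated(self) -> int:
        # Replicas that may be unavailable before a write would fail.
        return self.n - self.w

    def read_failures_tolerated(self) -> int:
        # Replicas that may be unavailable before a read would fail.
        return self.n - self.r

    def reads_overlap_writes(self) -> bool:
        # When R + W > N, any R replicas share at least one member
        # with any W replicas.
        return self.r + self.w > self.n


if __name__ == "__main__":
    for q in (Quorum(3, 2, 2), Quorum(3, 1, 1), Quorum(3, 3, 1)):
        print(
            q,
            "write failures tolerated:", q.write_failures_tolerated(),
            "read failures tolerated:", q.read_failures_tolerated(),
            "overlap:", q.reads_overlap_writes(),
        )
```

With N=3, W=2, R=2, one replica may be down for either operation and reads still overlap writes. With N=3, W=1, R=1, two replicas may be down, but a read may miss the latest write.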
If a network partitions such that more than one partition contains at least W servers, it is possible for more than one client to update the same value, creating separate but valid versions. Riak refers to these forked versions as siblings. Riak can resolve the conflict in favor of the latest write, or it can be configured to return all of the siblings and leave the work of resolving the conflict to the client application. The latter makes sense because the client understands the business logic.
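As a rough illustration of the two strategies, the sketch below contrasts a last-write-wins rule with a merge that the client performs using its own business logic. Everything here is hypothetical: the `Sibling` class, its `timestamp` and `items` fields, and the shopping-cart scenario are stand-ins I made up, not how a Riak client actually represents siblings.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Sibling:
    """Hypothetical stand-in for one conflicting version of a value."""

    timestamp: float       # when this version was written (assumed field)
    items: FrozenSet[str]  # e.g. the contents of a shopping cart


def resolve_last_write_wins(siblings: List[Sibling]) -> Sibling:
    # Keep only the most recently written version; the others are discarded.
    return max(siblings, key=lambda s: s.timestamp)


def resolve_by_union(siblings: List[Sibling]) -> Sibling:
    # Business-logic merge: keep every item that appears in any version.
    merged = frozenset().union(*(s.items for s in siblings))
    return Sibling(timestamp=max(s.timestamp for s in siblings), items=merged)


if __name__ == "__main__":
    a = Sibling(timestamp=100.0, items=frozenset({"book", "pen"}))
    b = Sibling(timestamp=105.0, items=frozenset({"book", "lamp"}))
    print(sorted(resolve_last_write_wins([a, b]).items))
    print(sorted(resolve_by_union([a, b]).items))
```

Last-write-wins silently drops the pen from the cart, while the union keeps all three items. Which outcome is right depends on the application, which is the argument for letting the client decide.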
And how does Riak distribute values to N nodes? Riak hashes each key using consistent hashing, producing an integer that falls within one of many equally sized partitions of a 160-bit integer space, an addressing system Riak refers to as a ring. Each Riak vnode (one physical node hosts multiple vnodes) claims one partition, making it responsible for storing values with key hashes falling within the address range of that partition. Replication follows from the ordering of the ring: a value is written to the vnode that owns the key's partition and to the vnodes that own the partitions following it around the ring, N in total. Each node tracks the vnodes available in the cluster and their respective partitions, making it possible for any node to direct a write across the necessary number of vnodes.
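The mapping from a key to a partition can be sketched in a few lines. This is a simplified illustration, not Riak's actual code: I use SHA-1 because it yields a 160-bit digest, I hash the raw key bytes only, and I assume a ring of 64 partitions. The function names are my own, and the results will not match the partitions a real cluster would choose.

```python
import hashlib

RING_BITS = 160  # size of the integer space described above
RING_SIZE = 64   # assumed number of partitions, for illustration only


def key_hash(key: bytes) -> int:
    # SHA-1 produces 20 bytes, i.e. an integer in [0, 2**160).
    return int.from_bytes(hashlib.sha1(key).digest(), "big")


def partition_index(hash_value: int, ring_size: int = RING_SIZE) -> int:
    # Each partition covers an equal slice of the 160-bit space.
    partition_width = 2 ** RING_BITS // ring_size
    return hash_value // partition_width


if __name__ == "__main__":
    for key in (b"user:1001", b"user:1002", b"cart:77"):
        print(key, "-> partition", partition_index(key_hash(key)))
```

Because the hash depends only on the key, every node that knows the ring layout computes the same partition for the same key, which is what lets any node route a request.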
To get started setting up a development server and exploring the API, I suggest following along with the Riak fast track tutorial.