more info ⬇


subscribe for more
stuff like this:

SW engineering, engineering management and the business of software

2012 09 07

Riak's Two Contentions and CRDTs

Every time I talk to someone about Riak, I mention about how difficult it was to get distributed counters working. Then I mention about how I ended up implementing a impoverished man’s version of CRDTs (Conflict-free Replicated Data Types).

The usual response is along the lines but doesn’t Riak solve that with their read/write quorum functionality?

The answer is no.

This is a surprising answer to a common misperception of Riak. So common that at my very first Riak meetup, I asked a variation of the same question.

In a distributed data store such as Riak, you have two basic kinds of inconsistencies that require resolution:

  1. Simultaneous Read and Write
  2. Simultaneous Write and Write

In case 1, If you are writing a value to a key, that value needs to be replicated to a few other nodes. This is where the read quorum (for fetches) and write quorum (for stores) come into play: While reading or writing, I must have X number of nodes agree on what the correct value is. You can set X to be all the nodes to get a strong certainty of getting the most recent value. If X is 1 then you might give up on getting the latest value in exchange for some improvement to latency. The typical, balanced and default solution is to set X to be a simple majority of nodes.

In case 2, the write quorum has no practical use. If you have a 10-node cluster, and some client writes “flub” to nodeA and some other client writes “biggle” to nodeF, then we have something called siblings. How do you decide who wins? This is where sibling resolution comes into play. There are many, many strategies for this. The simplest involve last-write-wins (which is a good way to lose data if you have a counter).

What happens in practice is that you cannot think about a distributed data store in terms of sets and gets. You need to approach it more like an operation log. A counter works well in a distributed system if you are only adding. Each client simply tags his increment with some arbitrary but unique client ID. The operation is not get x, then set x+1, but rather counterID:ABC:clientID:XYZ:count:+1. Since it’s add-only, if you have multiple counters incrementing at once, they will only modify their own clientID entry. If client PRQ adds 1 to nodeA while client XYZ adds 1, 1 and 1 to nodeF, you would see something like this:


or posisbly this:


To get the total count of the ABC counter, simply add up all the counts, taking the highest value per clientID. Both of the above examples resolve to 4.

At this point, we get to be clever. Add-only-counters are great but if you need counters that go up and down, just keep two counters, posABC and negABC and subtract.

At this point, we now have Conflict-free Replicated Data Types:

CRDTs can be thought of as primitive, resolvable operation logs that can be composed into useful data types.

CRDTS are exciting (in the way that only useful mathematical properties are). The add-only-counter and the add-only-set versions of these are relatively straight-forward. Some very smart people have created CRDTs for more complex data types like lists, maps and even directed graphs. Searching for CRDTs should bring you to the work of Marc Shapiro.

If you are patient, the very smart people at Basho are working to integrate CRDTs into Riak itself.

But if you really want to learn this stuff go implement a counter in a three node Riak cluster (make sure allow_mult=true).

you may also be interested in some of the greatest hits of

〜 You Should Foster a Culture of Readability

〜 The Customer's Semi-Lucid Trance State

〜 ARC Best Practices

〜 Weighted Credit Pools for API Rate Limiting

the fine print:
© matt nunogawa 2010 - 2017
back ⬆