I’ve been spending lots of time thinking about and discussing APIs lately.
Eventually, the topic of rate limiting comes up, because an API is an open invitation for people to cause work to be done on your servers. Most of the time, people are polite about it, but a few curious people (and very rarely outright malicious people) will stress the limits of any exposed API.
At a high level, developer API keys and user authentication helps, but ultimately some form of rate limiting becomes necessary.
One scheme I thought up was likely born out of my youth going to local arcades. Back in the stone age, if you wanted to play video games, you had to pester your mother until she drove you to a dedicated place of business. At that point you continue to pester until a conversion of money to quarters or tokens takes place. You would then insert one or more of the tokens into a game of your choice for a few chances at dopamine release.
The rate-limiting model works like the arcades of antiquity with a very generous and patient mother. Each user has a set number of credits in a credit pool. Credits are deducted from the pool each time you hit an API endpoint. These credits have a regeneration rate (X/minute) and also a cap (MAX_CRED). Each endpoint would consume one or more credits.
The trick is that endpoints have a credit cost relative to the resources required. For example, GET methods to items that are easily cached would only cost a few credits. Expensive endpoints, such as multi-server queries with JOINS or POST methods that upload/create resources that take permanent storage would cost an order of magnitude more.
This system allows “spiky” user behavior without arbitrary stalling. Other rate-limiting systems use a time window of 15 to 60 minutes. If an end user exceeds their quota in the first minute of a 15 minute window, they have to wait 14 minutes before doing anything. To differentiate between expensive and cheap actions, each endpoint (or group of endpoints) would have to track its own time window.
With a credit based system, the window can be arbitrarily small. A regeneration rate of 6/minute corresponds to an effective time window of 10 seconds. If an end user blows through CRED_MAX, they aren’t stalled for very long before they can resume inexpensive actions.
In the simplest case, you are only tracking one pool per user, rather than having a time window for every endpoint. An API could could certainly have multiple pools, but it is not required.
Furthermore, the system doesn’t need to actually keep track of every user’s credit balance every minute, but rather just the user’s last known balance at a point in time.
A hypothetical example where credits regenerate at one/minute would be:
Time: Event: Cost: Pool Balance:
00:00 User A has CRED_MAX -- 100
00:10 User A POSTs a new image 20 80
00:10 User A POSTs a new image 20 60
00:10 User A POSTs a new image 20 40
00:20 User A GETs list of images 02 48
When the app reads the credit balance at 00:20, we have a record that states the balance was 40 at time 00:10. The hypothetical getCreditBalanceForUser() function does some math knowing that 10 minutes have passed (during which 10 credits have been regenerated) and returns the current pool balance of 50 which is enough to cover the cost of the GET. There is no need to iterate across all users and increment the credit value every minute.
This system adheres to one of the principles of scalable architectures:
Don’t incur resource costs for actions that aren’t taken.
In this case, no work is being done by the system from time 00:11 ~ 00:19, even though conceptually, credits are regenerating during that time.
One of the advantages of this scheme is that the back-end storage of credits can live happily in Redis or some other memory based system without permanent storage. If the server is reset, then everyone just gets a free play; their credits can be temporarily reset to CRED_MAX. A memory based system is extremely unlikely to be a bottleneck for any given endpoint. Redis is doubly appropriate because of its nice increment and decrement operations and EXPIRE can be used to clear out users that reach the credit cap.
In practice, I’m looking at a very fine-grained production implementation, where CRED_MAX is in the thousands and regeneration rate is near one per second. Cheap, cacheable endpoints cost 5-10 credits and expensive ones are in the double or triple digits. Ultimately, you’ve succeeded if the vast majority of end-users never notice the rate-limiting system at all.
Now, if only I had a quarter for every time my articles caused a dopamine release.
Many thanks to @jedlau, @TimHaines, @jkubicek and @nolancaudill for reading drafts.
The best way to learn is to do.
The other best way is to teach.
A distant third is to hangout with people who do or teach.
Pretty far down in the rankings is to hang out on the internet and read blogs.
Every time I talk to someone about Riak, I mention about how difficult it was to get distributed counters working. Then I mention about how I ended up implementing a impoverished man’s version of CRDTs (Conflict-free Replicated Data Types).
The usual response is along the lines but doesn’t Riak solve that with their read/write quorum functionality?
The answer is no.
This is a surprising answer to a common misperception of Riak. So common that at my very first Riak meetup, I asked a variation of the same question.
In a distributed data store such as Riak, you have two basic kinds of inconsistencies that require resolution:
In case 1, If you are writing a value to a key, that value needs to be replicated to a few other nodes. This is where the read quorum (for fetches) and write quorum (for stores) come into play: While reading or writing, I must have X number of nodes agree on what the correct value is. You can set X to be all the nodes to get a strong certainty of getting the most recent value. If X is 1 then you might give up on getting the latest value in exchange for some improvement to latency. The typical, balanced and default solution is to set X to be a simple majority of nodes.
In case 2, the write quorum has no practical use. If you have a 10-node cluster, and some client writes “flub” to nodeA and some other client writes “biggle” to nodeF, then we have something called siblings. How do you decide who wins? This is where sibling resolution comes into play. There are many, many strategies for this. The simplest involve last-write-wins (which is a good way to lose data if you have a counter).
What happens in practice is that you cannot think about a distributed data store in terms of sets and gets. You need to approach it more like an operation log. A counter works well in a distributed system if you are only adding. Each client simply tags his increment with some arbitrary but unique client ID. The operation is not get x, then set x+1, but rather counterID:ABC:clientID:XYZ:count:+1. Since it’s add-only, if you have multiple counters incrementing at once, they will only modify their own clientID entry. If client PRQ adds 1 to nodeA while client XYZ adds 1, 1 and 1 to nodeF, you would see something like this:
[
counterID:ABC:clientID:PRQ:count:1
counterID:ABC:clientID:XYZ:count:3
]
or posisbly this:
[
counterID:ABC:clientID:PRQ:count:1
counterID:ABC:clientID:XYZ:count:1
counterID:ABC:clientID:XYZ:count:2
counterID:ABC:clientID:XYZ:count:3
]
To get the total count of the ABC counter, simply add up all the counts, taking the highest value per clientID. Both of the above examples resolve to 4.
At this point, we get to be clever. Add-only-counters are great but if you need counters that go up and down, just keep two counters, posABC and negABC and subtract.
At this point, we now have Conflict-free Replicated Data Types:
CRDTs can be thought of as primitive, resolvable operation logs that can be composed into useful data types.
CRDTS are exciting (in the way that only useful mathematical properties are). The add-only-counter and the add-only-set versions of these are relatively straight-forward. Some very smart people have created CRDTs for more complex data types like lists, maps and even directed graphs. Searching for CRDTs should bring you to the work of Marc Shapiro.
If you are patient, the very smart people at Basho are working to integrate CRDTs into Riak itself.
But if you really want to learn this stuff go implement a counter in a three node Riak cluster (make sure allow_mult=true).
I have produced code for over three decades, and it turns out that I am a slow learner. It’s taken me most all of those years to distill the indispensable principle of quality software: Readable code is often more important than correct code.
Code that is understood can be fixed, improved and extended by anyone. This is not always true of code that is merely correct.
It is far easier to make code understandable than to make sure it is perfectly correct. Making code readable for the unfortunate soul who needs to touch it is the very core of being maintainable. It’s the difference between being able to fix/modify/refactor vs deciding it needs to be rewritten (and the unfortunate expenditure of time, capital, opportunity cost and lost experience that rewriting code entails).
It’s important to remember that the ultimate goal is not code that contains function names filled with prepositional phrases, super descriptive variable names or long blocks of comments. When I propose a Culture of Readability, the idea is that any developer should be able to rapidly understand any length of code. Ideally, the overall point and structure should be obvious. Any subtleties or curious design decisions should be explained.
Time to Understanding Metric (TTU) is defined as “How quickly can you understand a segment of code?”
The tools at your disposal are many. Naming and comments are the primary vectors of conveyance. Overall software architecture can lead to better readability. The importance of documentations rises as the number of people who are looking at a given codebase increases. Humor can often be used as a neurological hack to make important points stick. As a last resort, create a Hat of Shame to bestow on those whose commits to master lower the average TTU.
Even though the discipline required to keep a high TTU has a cost, it ends up being a tremendous productivity multiplier over the long term. You will see the time it takes to debug, refactor and add new features shorten. Ultimately this means that you can spend more time improving your product and less time tracing existing code and swearing at those who have come before you.
One of my favorite anecdotes comes from an Apple engineer I met during WWDC 1999. (Ironically, this Apple engineer’s nickname was support droid… probably not as cool now as it was then.)
The story goes that during the rather oppressive reign of System 7.5.3 for the Mac, one of the most common user complaints was that boot time was too long. The very next version featured a fancy rename to Mac OS 7.6. It also saw the boot time complaint plummet out of the top five into the high teens or so.
When asked what they did at the time to improve boot time, the engineer said “Not a damn thing”. Apparently, the mere name change was enough to hypnotize users into semi-lucid trance state while staring at the extensions loading across the bottom of the splash screen.
In actuality, Mac OS 7.6 just crashed far less than it’s multi-pointed predecessor. Users just weren’t needing to reboot as often.
Many, many times during the product development process, you are going to get customer feedback. The worst possible course of action would be to take their requests, demands, and thinly-veiled attempts at blackmail and implement them. The endgoal is to fundamentally understand the customer’s needs and problems. The disheartening alternative is to find that you have wasted time and capital on the wrong problem.
Always, always listen to your customers, but don’t always do what they say.