How our rate limiter works
A token-bucket walkthrough — diagram, the four steps, the actual code, and the things that bite you.
~4 min · written for someone reading it once
01 · At a glance
Each user has a virtual bucket that holds up to N tokens. Every request takes one token. The bucket refills at a fixed rate. If the bucket is empty when a request arrives, the request is rejected with HTTP 429.
(Interactive demo: a bucket of 5 / 5 tokens refilling at 1/sec, with a Send request button that drains it.)
An HTML artifact can include a real demo. A markdown file can describe one.
02 · The algorithm in four steps
- A request arrives. We look up the user's bucket in Redis (or create one with N tokens if it's the first request).
- We refill the bucket: add `(now − last_refill) × rate` tokens, capped at N.
- If the bucket has at least 1 token, we decrement and let the request through.
- Otherwise we reject with 429 and a `Retry-After` header equal to the time until the next token.
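The four steps can be sketched in-memory like this (a sketch only — the real version keeps bucket state in Redis, and the names `takeToken` and `buckets` are illustrative):

```javascript
const N = 5;    // bucket capacity
const RATE = 1; // tokens refilled per second

const buckets = new Map(); // userId -> { tokens, lastRefill }

function takeToken(userId, now = Date.now()) {
  // Step 1: look up (or create) the bucket.
  let b = buckets.get(userId);
  if (!b) {
    b = { tokens: N, lastRefill: now };
    buckets.set(userId, b);
  }

  // Step 2: refill at the fixed rate, capped at N.
  const elapsedS = (now - b.lastRefill) / 1000;
  b.tokens = Math.min(N, b.tokens + elapsedS * RATE);
  b.lastRefill = now;

  // Step 3: spend a token if one is available.
  if (b.tokens >= 1) {
    b.tokens -= 1;
    return { allowed: true };
  }

  // Step 4: otherwise reject, reporting seconds until the next token.
  return { allowed: false, retryAfter: (1 - b.tokens) / RATE };
}
```

The in-memory version only works for a single process — the coordination gotcha below is exactly why the production version lives in Redis.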
03 · The implementation
Three increasingly correct versions. The first is what most people write; the third is what we actually run.
v1 — naive get/set
Broken
```js
const current = await redis.get(key);
const count = current ? parseInt(current, 10) : 0;
if (count >= MAX) return reject();
await redis.set(key, count + 1, 'PX', WINDOW);
```
Looks fine. Has a race condition: two requests both read `count = 99`, both pass the check, both write 100. Under load you let through 2× the limit at every boundary.
v2 — atomic incr
Better
```js
const count = await redis.incr(key);
if (count === 1) await redis.expire(key, WINDOW_S);
if (count > MAX) return reject();
```
`incr` is atomic, so the race is gone. But the expire is only set on the first hit — if the process fails between the `incr` and the `expire`, the key never expires and the user is locked out forever.
v3 — Lua script
Correct
```lua
-- INCR + EXPIRE in one atomic operation
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
```
The script runs as a single atomic Redis command, so the `INCR` and `EXPIRE` can't be separated. It returns the count to the application, which checks it against MAX. This is what's in scripts/rate-limit.lua.
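Wiring the script up from Node can look roughly like this (a sketch, assuming an ioredis-style `eval(script, numKeys, key, ...args)` signature; `allowRequest`, `fakeRedis`, and the `rl:` key prefix are illustrative names, not our actual API):

```javascript
// The same Lua script as above, shipped with the application.
const RATE_LIMIT_LUA = `
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
`;

const MAX = 100;
const WINDOW_S = 60;

// redisEval is the client's EVAL, e.g. ioredis: (script, numKeys, key, arg).
async function allowRequest(redisEval, userId) {
  const count = await redisEval(RATE_LIMIT_LUA, 1, `rl:${userId}`, WINDOW_S);
  return count <= MAX; // the script returns the post-increment count
}

// Minimal in-memory stand-in for EVAL, just to exercise the wrapper
// without a live Redis. It mimics only the INCR part of the script.
function fakeRedis() {
  const store = new Map();
  return async (_lua, _numKeys, key, _ttlS) => {
    const count = (store.get(key) ?? 0) + 1;
    store.set(key, count);
    return count;
  };
}
```

In production you'd register the script once (e.g. via `SCRIPT LOAD`/`EVALSHA` or ioredis's `defineCommand`) rather than sending the source on every call.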
04 · Gotchas
The things that bit us, in order of how much they hurt.
- Pre-auth IP keying punishes shared NATs. Office networks, mobile carriers, and VPN'd customer orgs all share an IP. If you limit by IP before authentication, one bad actor on the same NAT can starve everyone else. Limit by user ID after auth; keep a looser per-IP limit upstream for the unauthenticated case.
- Decide fail-open vs fail-closed up front. If Redis is down, do you allow requests through (fail-open) or reject them all (fail-closed)? Both are defensible. Pick one, document it, emit a metric. The wrong answer is "we didn't think about it" — which is a 500.
- Fixed bucket bursts at the boundary. A 100 req/min bucket lets through 200 requests in the second straddling the window boundary (100 at the end of one window, 100 at the start of the next). For most cases this is fine. If it isn't, switch to a sliding window.
- Coordinate across replicas. In-memory rate limiters in a 4-replica deploy don't limit at N — they limit at 4N. Always use Redis (or another shared store) unless you genuinely have one process.
- Always set `Retry-After`. RFC 6585, which defines 429, says the response may include one. Well-behaved clients (and SDKs) honour it and back off correctly. Without it, retries pile on.