How our rate limiter works
A token-bucket walkthrough — diagram, the four steps, the actual code, and the things that bite you.
~4 min · written for someone reading it once
01 · At a glance
Each user has a virtual bucket that holds up to N tokens. Every request takes one token. The bucket refills at a fixed rate. If the bucket is empty when a request arrives, the request is rejected with HTTP 429.
(Interactive demo: a bucket of 5 / 5 tokens refilling at 1/sec, with a Send request button that drains it.)
An HTML artifact can include a real demo. A markdown file can describe one.
02 · The algorithm in four steps
- A request arrives. We look up the user's bucket in Redis (or create one with N tokens if it's the first request).
- We refill the bucket: add `(now − last_refill) × rate` tokens, capped at N.
- If the bucket has at least 1 token, we decrement and let the request through.
- Otherwise we reject with 429 and a `Retry-After` header equal to the time until the next token.
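The four steps can be sketched in-memory like this (a sketch only — the real version keeps bucket state in Redis, and the names `takeToken` and `buckets` are illustrative):

```javascript
const N = 5;    // bucket capacity
const RATE = 1; // tokens refilled per second

const buckets = new Map(); // userId -> { tokens, lastRefill }

function takeToken(userId, now = Date.now()) {
  // Step 1: look up (or create) the bucket.
  let b = buckets.get(userId);
  if (!b) {
    b = { tokens: N, lastRefill: now };
    buckets.set(userId, b);
  }

  // Step 2: refill at the fixed rate, capped at N.
  const elapsedS = (now - b.lastRefill) / 1000;
  b.tokens = Math.min(N, b.tokens + elapsedS * RATE);
  b.lastRefill = now;

  // Step 3: spend a token if one is available.
  if (b.tokens >= 1) {
    b.tokens -= 1;
    return { allowed: true };
  }

  // Step 4: otherwise reject, reporting seconds until the next token.
  return { allowed: false, retryAfter: (1 - b.tokens) / RATE };
}
```

The in-memory version only works for a single process — the coordination gotcha below is exactly why the production version lives in Redis.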
03 · The implementation
Three increasingly correct versions. The first is what most people write; the third is what we actually run.
v1 — naive get/set
Broken
```js
const current = await redis.get(key);
const count = current ? parseInt(current, 10) : 0;
if (count >= MAX) return reject();
await redis.set(key, count + 1, 'PX', WINDOW);
```
Looks fine. Has a race condition: two requests both read `count = 99`, both pass the check, both write 100. Under load you let through 2× the limit at every boundary.
v2 — atomic incr
Better
```js
const count = await redis.incr(key);
if (count === 1) await redis.expire(key, WINDOW_S);
if (count > MAX) return reject();
```
`incr` is atomic, so the race is gone. But the expire is only set on the first hit — if the process fails between the `incr` and the `expire`, the key never expires and the user is locked out forever.
v3 — Lua script
Correct
```lua
-- INCR + EXPIRE in one atomic operation
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
```
The script runs as a single atomic Redis command, so the `INCR` and `EXPIRE` can't be separated. It returns the count to the application, which checks it against MAX. This is what's in scripts/rate-limit.lua.
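Wiring the script up from Node can look roughly like this (a sketch, assuming an ioredis-style `eval(script, numKeys, key, ...args)` signature; `allowRequest`, `fakeRedis`, and the `rl:` key prefix are illustrative names, not our actual API):

```javascript
// The same Lua script as above, shipped with the application.
const RATE_LIMIT_LUA = `
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
`;

const MAX = 100;
const WINDOW_S = 60;

// redisEval is the client's EVAL, e.g. ioredis: (script, numKeys, key, arg).
async function allowRequest(redisEval, userId) {
  const count = await redisEval(RATE_LIMIT_LUA, 1, `rl:${userId}`, WINDOW_S);
  return count <= MAX; // the script returns the post-increment count
}

// Minimal in-memory stand-in for EVAL, just to exercise the wrapper
// without a live Redis. It mimics only the INCR part of the script.
function fakeRedis() {
  const store = new Map();
  return async (_lua, _numKeys, key, _ttlS) => {
    const count = (store.get(key) ?? 0) + 1;
    store.set(key, count);
    return count;
  };
}
```

In production you'd register the script once (e.g. via `SCRIPT LOAD`/`EVALSHA` or ioredis's `defineCommand`) rather than sending the source on every call.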
04 · Gotchas
The things that bit us, in order of how much they hurt.
- Pre-auth IP keying punishes shared NATs. Office networks, mobile carriers, and VPN'd customer orgs all share an IP. If you limit by IP before authentication, one bad actor on the same NAT can starve everyone else. Limit by user ID after auth; keep a looser per-IP limit upstream for the unauthenticated case.
- Decide fail-open vs fail-closed up front. If Redis is down, do you allow requests through (fail-open) or reject them all (fail-closed)? Both are defensible. Pick one, document it, emit a metric. The wrong answer is "we didn't think about it" — which is a 500.
- Fixed bucket bursts at the boundary. A 100 req/min bucket lets through 200 requests in the second straddling the window boundary (100 at the end of one window, 100 at the start of the next). For most cases this is fine. If it isn't, switch to a sliding window.
- Coordinate across replicas. In-memory rate limiters in a 4-replica deploy don't limit at N — they limit at 4N. Always use Redis (or another shared store) unless you genuinely have one process.
- Always set `Retry-After`. RFC 6585, which defines 429, says the response may include one. Well-behaved clients (and SDKs) honour it and back off correctly. Without it, retries pile on.