
Token Bucket Rate Limiting

Rate limiting prevents resource exhaustion when clients flood servers with requests. Without it, a single malicious or buggy client can take down your API.

How it works

Each user gets a bucket with a fixed capacity. Tokens refill at a constant rate (e.g., 2 tokens/sec), and each incoming request consumes a token. Enough tokens available → the request passes. Not enough → denied.

Example

Capacity = 4, Rate = 2/sec, starting with 4 tokens

t=0s:   4 requests → all pass (4→0 tokens)
t=0.5s: 2 requests → 1 passes, 1 denied (only 1 token refilled; 1→0)
t=1s:   1 request → passes (1 token refilled; 1→0)
t=2s:   3 requests → 2 pass, 1 denied (2 tokens refilled; 2→1→0)

Local implementation

from datetime import datetime

def allow(bucket, capacity, rate, requested=1):
    now = datetime.now()
    time_elapsed = (now - bucket["last_refill"]).total_seconds()
    # Refill lazily on each request, never exceeding capacity
    bucket["tokens"] = min(capacity, bucket["tokens"] + time_elapsed * rate)
    bucket["last_refill"] = now  # reset the refill clock

    if bucket["tokens"] >= requested:
        bucket["tokens"] -= requested
        return "allowed"
    return "denied"

Per user, store just two values: the token count and the last_refill timestamp.
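
A minimal sketch of that bookkeeping, assuming the allow() function above and an in-memory dict keyed by user id (names here are illustrative):

from collections import defaultdict
from datetime import datetime

CAPACITY, RATE = 4, 2  # the example's parameters: 4-token bucket, 2 tokens/sec

# A user seen for the first time starts with a full bucket
# (the cold-start behavior noted under Trade-offs below)
buckets = defaultdict(lambda: {"tokens": CAPACITY, "last_refill": datetime.now()})

def handle_request(user_id):
    return allow(buckets[user_id], CAPACITY, RATE)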

Distributed systems

Multiple servers reading and writing shared state in Redis create a race condition:

Server A: reads count=99
Server B: reads count=99
Both write count=100 and allow their request → a limit of 100 lets 101 requests through
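
The same bug in code form: a non-atomic check-then-set, sketched here with the redis-py client against a simple counter (the key name and limit are illustrative):

import redis

r = redis.Redis()

def naive_allow(key, limit=100):
    count = int(r.get(key) or 0)   # two servers can both read 99 here...
    if count < limit:
        r.set(key, count + 1)      # ...and both write 100, so 101 requests pass
        return "allowed"
    return "denied"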

Solution: Redis Lua scripts, which execute atomically:

-- KEYS[1] = per-user bucket key; ARGV = capacity, rate, now (in seconds), requested
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

-- Missing hash fields mean a new user: start with a full bucket
local tokens = tonumber(redis.call('HGET', key, 'tokens')) or capacity
local last_refill = tonumber(redis.call('HGET', key, 'last_refill')) or now

local elapsed = now - last_refill
tokens = math.min(capacity, tokens + elapsed * rate)

if tokens >= requested then
    tokens = tokens - requested
    redis.call('HSET', key, 'tokens', tokens)
    redis.call('HSET', key, 'last_refill', now)
    return 1
else
    return 0
end

The entire script runs as a single atomic operation: Redis executes it without interleaving other commands, so the race disappears.
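
One way to invoke the script from Python, assuming the redis-py client and that the Lua source above lives in a string named TOKEN_BUCKET_LUA (that name, the key format, and the parameter values are illustrative):

import time
import redis

r = redis.Redis()
token_bucket = r.register_script(TOKEN_BUCKET_LUA)  # the Lua script above, as a string

def allow_distributed(user_id, capacity=4, rate=2, requested=1):
    # One round trip; the refill-check-deduct cycle happens atomically in Redis
    result = token_bucket(
        keys=[f"bucket:{user_id}"],
        args=[capacity, rate, time.time(), requested],
    )
    return "allowed" if result == 1 else "denied"

Passing now in as an argument keeps the script deterministic; the flip side is that clock skew between app servers can slightly over- or under-refill.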

Trade-offs

Pros: simple to implement, and it allows controlled bursts up to capacity

Cons: permits burst abuse (a user can drain the entire bucket instantly), per-user state adds memory overhead, and a cold start hands every new user a full bucket

Alternative: the leaky bucket enforces a constant output rate with no bursts, as sketched below.
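
For contrast, a minimal queue-less sketch of that idea (an illustration, not the linked implementation): admit at most one request per 1/rate seconds, so bursts cannot happen.

import time

class LeakyBucket:
    """Admits at most one request per 1/rate seconds; no bursts."""

    def __init__(self, rate):
        self.interval = 1.0 / rate           # minimum spacing between requests
        self.next_allowed = time.monotonic()

    def allow(self):
        now = time.monotonic()
        if now < self.next_allowed:
            return "denied"
        self.next_allowed = now + self.interval
        return "allowed"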

Code

Local: plain_token_bucket.py

Redis: redis_token_bucket.py

Key Takeaways

A token bucket is just two values per user (a token count and a last_refill timestamp) plus lazy refill arithmetic on each request.

It caps the sustained rate while still permitting bursts up to capacity.

Across multiple servers, the read-refill-deduct cycle must run atomically (e.g., as a Redis Lua script), or concurrent reads will break the limit.
