# Rate Limiter Low-Level Design

## Overview

Design a rate limiter that restricts the number of requests a client can make to an API within a specified time window. Rate limiting is crucial for preventing abuse, ensuring fair resource allocation, and protecting backend services from overload.

## Requirements

### Functional Requirements
- Limit requests per user/IP/API key
- Support multiple time windows (second, minute, hour, day)
- Different rate limits for different API endpoints
- Allow burst traffic within limits
- Provide clear error responses when limit exceeded
- Support distributed systems
### Non-Functional Requirements
- Low latency (<10ms overhead)
- High throughput (handle 100K+ requests/sec)
- Accurate counting
- Fault tolerant
- Scalable horizontally
### Use Cases
- API rate limiting (1000 requests/hour per user)
- Login attempt limiting (5 attempts per minute)
- Message sending limits (100 SMS per day)
- File upload limits (10 uploads per hour)
## Rate Limiting Algorithms

### 1. Token Bucket
Concept:
- Bucket holds tokens (capacity = max requests)
- Tokens added at fixed rate
- Each request consumes one token
- Request rejected if no tokens available
Implementation:

```java
public class TokenBucket {
    private final long capacity;
    private final long refillRate; // tokens per second
    private long availableTokens;
    private long lastRefillTime;

    public TokenBucket(long capacity, long refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.availableTokens = capacity;
        this.lastRefillTime = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        refill();

        if (availableTokens > 0) {
            availableTokens--;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        long elapsedTime = now - lastRefillTime;
        long tokensToAdd = (elapsedTime * refillRate) / 1000;

        if (tokensToAdd > 0) {
            availableTokens = Math.min(capacity, availableTokens + tokensToAdd);
            // Advance only by the time actually converted into tokens,
            // so fractional progress is not lost between calls
            lastRefillTime += (tokensToAdd * 1000) / refillRate;
        }
    }

    public synchronized long getWaitTime() {
        refill();
        if (availableTokens > 0) return 0;
        return 1000 / refillRate; // ms until the next token
    }
}
```

Pros:
- Allows burst traffic
- Smooth rate limiting
- Memory efficient
Cons:
- Complex to implement
- Requires timestamp tracking
### 2. Leaky Bucket
Concept:
- Requests enter bucket at any rate
- Processed at fixed rate (leak rate)
- Bucket has maximum capacity
- Overflow requests rejected
Implementation:

```java
import java.util.LinkedList;
import java.util.Queue;

public class LeakyBucket {
    private final long capacity;
    private final long leakRate; // requests per second
    private final Queue<Long> requestQueue = new LinkedList<>();
    private long lastLeakTime = System.currentTimeMillis();

    public LeakyBucket(long capacity, long leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
    }

    public synchronized boolean allowRequest() {
        leak();

        if (requestQueue.size() < capacity) {
            requestQueue.offer(System.currentTimeMillis());
            return true;
        }
        return false;
    }

    private void leak() {
        long now = System.currentTimeMillis();
        long leakInterval = 1000 / leakRate; // ms per leaked request

        // Drain at the fixed leak rate since the last leak, rather than
        // by each request's age; otherwise a whole burst would leak at
        // once instead of one request per interval
        while (!requestQueue.isEmpty() && now - lastLeakTime >= leakInterval) {
            requestQueue.poll();
            lastLeakTime += leakInterval;
        }
        if (requestQueue.isEmpty()) {
            lastLeakTime = now;
        }
    }
}
```

Pros:
- Smooth output rate
- Protects downstream services
Cons:
- Burst traffic rejected
- Higher memory usage
### 3. Fixed Window Counter
Concept:
- Divide time into fixed windows (e.g., 1-minute windows)
- Count requests in current window
- Reset counter at window boundary
- Reject if count exceeds limit
Implementation:

```java
public class FixedWindowCounter {
    private final long windowSize; // milliseconds
    private final long maxRequests;
    private long windowStart;
    private long requestCount;

    public FixedWindowCounter(long windowSize, long maxRequests) {
        this.windowSize = windowSize;
        this.maxRequests = maxRequests;
        this.windowStart = System.currentTimeMillis();
        this.requestCount = 0;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();

        // Start a new window if the current one has expired
        if (now - windowStart >= windowSize) {
            windowStart = now;
            requestCount = 0;
        }

        if (requestCount < maxRequests) {
            requestCount++;
            return true;
        }
        return false;
    }

    public synchronized long getRemainingRequests() {
        return Math.max(0, maxRequests - requestCount);
    }
}
```

Pros:
- Simple to implement
- Memory efficient
- Easy to understand
Cons:
- Boundary spike issue (2x traffic at window edge)
- Not accurate for short periods
### 4. Sliding Window Log
Concept:
- Store timestamp of each request
- Remove timestamps outside window
- Count remaining timestamps
- Reject if count exceeds limit
Implementation:

```java
import java.util.TreeMap;

public class SlidingWindowLog {
    private final long windowSize; // milliseconds
    private final long maxRequests;
    private final TreeMap<Long, Integer> requestLog = new TreeMap<>(); // timestamp -> count

    public SlidingWindowLog(long windowSize, long maxRequests) {
        this.windowSize = windowSize;
        this.maxRequests = maxRequests;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        long windowStart = now - windowSize;

        // Remove entries that have fallen out of the window
        requestLog.headMap(windowStart).clear();

        // Count requests still inside the window
        long totalRequests = requestLog.values().stream()
                .mapToLong(Integer::longValue)
                .sum();

        if (totalRequests < maxRequests) {
            requestLog.merge(now, 1, Integer::sum);
            return true;
        }
        return false;
    }
}
```

Pros:
- Very accurate
- No boundary issues
Cons:
- High memory usage
- Expensive cleanup operations
### 5. Sliding Window Counter
Concept:
- Hybrid of fixed window and sliding window
- Use previous and current window
- Weight based on overlap
Implementation:

```java
public class SlidingWindowCounter {
    private final long windowSize; // milliseconds
    private final long maxRequests;
    private long prevWindowCount;
    private long currWindowCount;
    private long currWindowStart;

    public SlidingWindowCounter(long windowSize, long maxRequests) {
        this.windowSize = windowSize;
        this.maxRequests = maxRequests;
        this.prevWindowCount = 0;
        this.currWindowCount = 0;
        this.currWindowStart = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();

        // Roll the window forward; if more than one full window has
        // elapsed, the previous window's count is stale and must be zeroed
        if (now - currWindowStart >= windowSize) {
            long windowsElapsed = (now - currWindowStart) / windowSize;
            prevWindowCount = (windowsElapsed == 1) ? currWindowCount : 0;
            currWindowCount = 0;
            currWindowStart += windowsElapsed * windowSize;
        }

        // Estimate the rolling count by weighting the previous window
        // by how much of it still overlaps the sliding window
        long timeIntoWindow = now - currWindowStart;
        double prevWeight = 1.0 - ((double) timeIntoWindow / windowSize);
        long estimatedCount = (long) (prevWindowCount * prevWeight) + currWindowCount;

        if (estimatedCount < maxRequests) {
            currWindowCount++;
            return true;
        }
        return false;
    }
}
```

Pros:
- Smooth rate limiting
- Memory efficient
- Good approximation
Cons:
- Not 100% accurate
- Slightly complex
## Distributed Rate Limiter

### Redis-based Implementation
```java
import java.util.List;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class RedisRateLimiter {
    private static final String KEY_PREFIX = "rate_limit:";
    private final Jedis redis;

    public RedisRateLimiter(Jedis redis) {
        this.redis = redis;
    }

    public boolean allowRequest(String userId, int maxRequests, int windowSeconds) {
        String key = KEY_PREFIX + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - (windowSeconds * 1000L);

        Transaction multi = redis.multi();
        multi.zremrangeByScore(key, 0, windowStart);  // drop entries outside the window
        multi.zcard(key);                             // count remaining requests
        multi.zadd(key, now, String.valueOf(now));    // record this request
        multi.expire(key, windowSeconds);             // garbage-collect idle keys
        List<Object> results = multi.exec();

        long count = (Long) results.get(1);
        // Caveats: two requests in the same millisecond collide on the member
        // name, and a rejected request's timestamp is still recorded; a Lua
        // script avoids both by deciding and writing atomically
        return count < maxRequests;
    }
}
```

### Token Bucket with Redis
```lua
-- Redis Lua script for token bucket
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill tokens
local elapsed = now - last_refill
local tokens_to_add = elapsed * refill_rate / 1000
tokens = math.min(capacity, tokens + tokens_to_add)

-- Try to consume tokens
if tokens >= requested then
  tokens = tokens - requested
  redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, 3600)
  return 1
else
  return 0
end
```

## Class Design
```java
// Configuration
public class RateLimitConfig {
    private final long maxRequests;
    private final long windowSize; // milliseconds
    private final RateLimitAlgorithm algorithm;

    public RateLimitConfig(long maxRequests, long windowSize,
                           RateLimitAlgorithm algorithm) {
        this.maxRequests = maxRequests;
        this.windowSize = windowSize;
        this.algorithm = algorithm;
    }

    public long getMaxRequests() { return maxRequests; }
    public long getWindowSize() { return windowSize; }
    public RateLimitAlgorithm getAlgorithm() { return algorithm; }
}

// Algorithm enum
public enum RateLimitAlgorithm {
    TOKEN_BUCKET, LEAKY_BUCKET, FIXED_WINDOW,
    SLIDING_WINDOW_LOG, SLIDING_WINDOW_COUNTER
}

// Factory
public class RateLimiterFactory {
    public static RateLimiter create(RateLimitConfig config) {
        switch (config.getAlgorithm()) {
            case TOKEN_BUCKET:           return new TokenBucketRateLimiter(config);
            case LEAKY_BUCKET:           return new LeakyBucketRateLimiter(config);
            case FIXED_WINDOW:           return new FixedWindowRateLimiter(config);
            case SLIDING_WINDOW_LOG:     return new SlidingWindowLogRateLimiter(config);
            case SLIDING_WINDOW_COUNTER: return new SlidingWindowCounterRateLimiter(config);
            default: throw new IllegalArgumentException("Unknown algorithm");
        }
    }
}
```
```java
// Distributed implementation
public class DistributedRateLimiter implements RateLimiter {
    private final RedisClient redis;
    private final RateLimitConfig config;

    public DistributedRateLimiter(RedisClient redis, RateLimitConfig config) {
        this.redis = redis;
        this.config = config;
    }

    @Override
    public boolean allowRequest(String identifier) {
        // Use Redis for distributed state
        return executeRedisScript(identifier);
    }

    private boolean executeRedisScript(String identifier) {
        // Lua script execution
        // ...
        return true;
    }
}
```
## Multi-tier Rate Limiting

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MultiTierRateLimiter {
    // One bucket per tier; a production version would keep a set of
    // buckets per client identifier
    private final Map<String, TokenBucket> tiers = new LinkedHashMap<>();

    public MultiTierRateLimiter() {
        // Per-second, per-minute, and per-hour tiers. The fractional
        // refill rates assume a TokenBucket variant whose refillRate
        // is a double; the long-based version above would truncate them.
        tiers.put("second", new TokenBucket(10, 10));
        tiers.put("minute", new TokenBucket(100, 100.0 / 60));
        tiers.put("hour", new TokenBucket(1000, 1000.0 / 3600));
    }

    public boolean allowRequest() {
        // All tiers must allow the request. Note that earlier tiers
        // still consume a token when a later tier rejects; avoiding
        // that requires a separate check-then-commit step.
        for (TokenBucket limiter : tiers.values()) {
            if (!limiter.allowRequest()) {
                return false;
            }
        }
        return true;
    }
}
```

## API Response
```java
import java.util.HashMap;
import java.util.Map;

public class RateLimitResponse {
    private boolean allowed;
    private long maxRequests;        // the configured limit
    private long remainingRequests;
    private long resetTime;          // epoch seconds when the window resets
    private long retryAfter;         // seconds

    public Map<String, String> toHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-RateLimit-Limit", String.valueOf(maxRequests));
        headers.put("X-RateLimit-Remaining", String.valueOf(remainingRequests));
        headers.put("X-RateLimit-Reset", String.valueOf(resetTime));
        if (!allowed) {
            headers.put("Retry-After", String.valueOf(retryAfter));
        }
        return headers;
    }
}
```

## Testing Strategy
```java
@Test
public void testTokenBucket_allowsBurstTraffic() throws InterruptedException {
    TokenBucket bucket = new TokenBucket(10, 1);

    // Burst of 10 requests should succeed
    for (int i = 0; i < 10; i++) {
        assertTrue(bucket.allowRequest());
    }

    // 11th request should fail
    assertFalse(bucket.allowRequest());

    // Wait for refill
    Thread.sleep(1000);

    // Should allow one more request
    assertTrue(bucket.allowRequest());
}

@Test
public void testFixedWindow_handlesWindowBoundary() throws InterruptedException {
    FixedWindowCounter counter = new FixedWindowCounter(1000, 5);

    // Fill up the window
    for (int i = 0; i < 5; i++) {
        assertTrue(counter.allowRequest());
    }

    assertFalse(counter.allowRequest());

    // Wait for a new window
    Thread.sleep(1000);

    // Should allow requests in the new window
    assertTrue(counter.allowRequest());
}
```

## Design Patterns Used
- Strategy Pattern: Different rate limiting algorithms
- Factory Pattern: Create rate limiter instances
- Decorator Pattern: Add logging, metrics
- Singleton Pattern: Shared rate limiter instances
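As a sketch of the decorator idea above: a wrapper that adds allow/reject counters around any limiter. The `RateLimiter` interface and `MetricsRateLimiter` name here are illustrative, not part of the earlier code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative interface in the allowRequest(identifier) style used above
interface RateLimiter {
    boolean allowRequest(String identifier);
}

// Decorator: adds metrics around any RateLimiter without changing it
class MetricsRateLimiter implements RateLimiter {
    private final RateLimiter delegate;
    private final AtomicLong allowed = new AtomicLong();
    private final AtomicLong rejected = new AtomicLong();

    MetricsRateLimiter(RateLimiter delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean allowRequest(String identifier) {
        boolean ok = delegate.allowRequest(identifier);
        (ok ? allowed : rejected).incrementAndGet();
        return ok;
    }

    long allowedCount()  { return allowed.get(); }
    long rejectedCount() { return rejected.get(); }
}
```

The same shape works for logging or tracing; decorators can be stacked because each one is itself a `RateLimiter`.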
## Trade-offs
| Algorithm | Accuracy | Memory | Burst Support | Complexity |
|---|---|---|---|---|
| Token Bucket | Good | Low | Yes | Medium |
| Leaky Bucket | Good | Medium | No | Medium |
| Fixed Window | Low | Very Low | No | Low |
| Sliding Window Log | Excellent | High | No | High |
| Sliding Window Counter | Good | Low | No | Medium |
## 💡 Interview Tips & Out-of-the-Box Thinking

### Common Pitfalls
- Fixed window boundary issue: Users can do 2x limit by splitting requests across window boundary
- Not handling distributed systems: Single-server doesn't work for load-balanced systems (use Redis)
- Memory leaks: Not cleaning up old entries in sliding window log
- Race conditions: Multiple threads checking/updating counter simultaneously
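One lock-free way to avoid the check-then-update race in the last pitfall is a compare-and-swap loop: the check and the increment become a single atomic step, so two threads cannot both claim the last remaining slot. A sketch (class name illustrative, window rollover omitted):

```java
import java.util.concurrent.atomic.AtomicLong;

// Fixed-window counter whose check-and-increment is one CAS operation
class AtomicFixedWindowCounter {
    private final long maxRequests;
    private final AtomicLong count = new AtomicLong();

    AtomicFixedWindowCounter(long maxRequests) {
        this.maxRequests = maxRequests;
    }

    boolean allowRequest() {
        while (true) {
            long current = count.get();
            if (current >= maxRequests) return false;          // over the limit
            if (count.compareAndSet(current, current + 1)) {
                return true;                                    // we won the slot
            }
            // CAS lost to another thread: re-read and retry
        }
    }

    void resetWindow() { count.set(0); } // invoked at each window boundary
}
```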
### Algorithm Selection
- Token Bucket: Burst traffic (API with occasional spikes) - Most popular
- Leaky Bucket: Steady output (video streaming, message queues)
- Fixed Window: Simplest but inaccurate (non-critical use cases)
- Sliding Window Counter: Good balance (accurate + low memory)
- Sliding Log: Most accurate but memory intensive (financial transactions)
### Advanced Considerations
- Distributed limiting: Redis with atomic INCR + EXPIRE
- Multi-level: Global + per-user + per-IP
- Adaptive: Adjust limits based on system load
- Quota vs Rate: Daily limit vs requests/second
- Hierarchical: Organization → Team → User cascading limits
### Creative Solutions
- Tiered: Free (100/hr) vs Paid (10K/hr)
- Cost-based: GET = 1 cost, complex POST = 10 cost
- Time-varying: Higher limits off-peak
- Token carry-forward: Unused tokens from last window carry over
- Priority queue: Premium requests bypass limit
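The cost-based idea above can reuse the token bucket by charging more than one token per request — a cheap GET costs 1, an expensive POST costs 10. A minimal sketch (class name and refill wiring are illustrative; timestamp-based refill as in the earlier `TokenBucket` is omitted):

```java
// Token bucket variant where each request declares a cost
class CostBasedLimiter {
    private final long capacity;
    private long availableTokens;

    CostBasedLimiter(long capacity) {
        this.capacity = capacity;
        this.availableTokens = capacity;
    }

    // Consume `cost` tokens, or reject without consuming any
    synchronized boolean allowRequest(long cost) {
        if (cost <= availableTokens) {
            availableTokens -= cost;
            return true;
        }
        return false;
    }

    // Caller refills periodically (e.g. from a scheduled task)
    synchronized void refill(long tokens) {
        availableTokens = Math.min(capacity, availableTokens + tokens);
    }
}
```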
### Edge Cases
- Clock skew: Use Redis TIME for consistent timestamp
- Thundering herd: Limit resets at midnight - jittered reset times
- Anonymous users: Rate limit by IP with higher threshold
- Shared IP/NAT: Office with 100 users - combine IP + User-Agent
- Request bursts: Token bucket with max burst size
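For the thundering-herd item above, one sketch of jittered resets: derive a stable per-user offset from the user id, so each user's daily window starts at a different time instead of everyone resetting at midnight (class and method names are illustrative):

```java
// Spread daily window resets across the day with a per-user offset
class JitteredReset {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // Deterministic offset in [0, DAY_MS) derived from the user id
    static long jitterFor(String userId) {
        return Math.floorMod((long) userId.hashCode() * 31, DAY_MS);
    }

    // Start of the user's current (jittered) daily window
    static long windowStart(String userId, long nowMs) {
        long offset = jitterFor(userId);
        return ((nowMs - offset) / DAY_MS) * DAY_MS + offset;
    }
}
```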
### Follow-up Questions
- How to handle clock skew in distributed systems?
- How to implement rate limiting for anonymous users?
- How to support dynamic rate limit updates?
- How to implement fair queuing for rate-limited requests?
- How to handle rate limiting for websocket connections?