Rate Limiter Low-Level Design

Overview

Design a rate limiter system that restricts the number of requests a client can make to an API within a specified time window. This is crucial for preventing abuse, ensuring fair resource allocation, and protecting backend services from overload.

Requirements

Functional Requirements

  • Limit requests per user/IP/API key
  • Support multiple time windows (second, minute, hour, day)
  • Different rate limits for different API endpoints
  • Allow burst traffic within limits
  • Provide clear error responses when limit exceeded
  • Support distributed systems

Non-Functional Requirements

  • Low latency (<10ms overhead)
  • High throughput (handle 100K+ requests/sec)
  • Accurate counting
  • Fault tolerant
  • Scalable horizontally

Use Cases

  • API rate limiting (1000 requests/hour per user)
  • Login attempt limiting (5 attempts per minute)
  • Message sending limits (100 SMS per day)
  • File upload limits (10 uploads per hour)

Rate Limiting Algorithms

1. Token Bucket

Concept:

  • Bucket holds tokens (capacity = max requests)
  • Tokens added at fixed rate
  • Each request consumes one token
  • Request rejected if no tokens available

Implementation:

```java
public class TokenBucket {
    private final long capacity;
    private final long refillRate; // tokens per second
    private long availableTokens;
    private long lastRefillTime;

    public TokenBucket(long capacity, long refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.availableTokens = capacity;
        this.lastRefillTime = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        refill();

        if (availableTokens > 0) {
            availableTokens--;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        long elapsedTime = now - lastRefillTime;
        long tokensToAdd = (elapsedTime * refillRate) / 1000;

        // Only advance the refill clock when whole tokens were added;
        // otherwise frequent calls keep resetting lastRefillTime and
        // the bucket never refills
        if (tokensToAdd > 0) {
            availableTokens = Math.min(capacity, availableTokens + tokensToAdd);
            lastRefillTime = now;
        }
    }

    public synchronized long getWaitTime() {
        refill();
        if (availableTokens > 0) return 0;
        return 1000 / refillRate; // ms until the next token arrives
    }
}
```

Pros:

  • Allows burst traffic
  • Smooth rate limiting
  • Memory efficient

Cons:

  • Complex to implement
  • Requires timestamp tracking

2. Leaky Bucket

Concept:

  • Requests enter bucket at any rate
  • Processed at fixed rate (leak rate)
  • Bucket has maximum capacity
  • Overflow requests rejected

Implementation:

```java
import java.util.LinkedList;
import java.util.Queue;

public class LeakyBucket {
    private final long capacity;
    private final long leakRate; // requests per second
    private final Queue<Long> requestQueue;
    private long lastLeakTime;

    public LeakyBucket(long capacity, long leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
        this.requestQueue = new LinkedList<>();
        this.lastLeakTime = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        leak();

        if (requestQueue.size() < capacity) {
            requestQueue.offer(System.currentTimeMillis());
            return true;
        }
        return false;
    }

    private void leak() {
        long now = System.currentTimeMillis();
        long elapsed = now - lastLeakTime;

        // Drain at most (elapsed * leakRate / 1000) requests so the
        // outflow matches the configured leak rate, even when many
        // requests were queued at the same instant
        long toLeak = (elapsed * leakRate) / 1000;
        if (toLeak > 0) {
            while (toLeak-- > 0 && !requestQueue.isEmpty()) {
                requestQueue.poll();
            }
            lastLeakTime = now;
        }
    }
}
```

Pros:

  • Smooth output rate
  • Protects downstream services

Cons:

  • Burst traffic rejected
  • Higher memory usage

3. Fixed Window Counter

Concept:

  • Divide time into fixed windows (e.g., 1-minute windows)
  • Count requests in current window
  • Reset counter at window boundary
  • Reject if count exceeds limit

Implementation:

```java
public class FixedWindowCounter {
    private final long windowSize; // milliseconds
    private final long maxRequests;
    private long windowStart;
    private long requestCount;

    public FixedWindowCounter(long windowSize, long maxRequests) {
        this.windowSize = windowSize;
        this.maxRequests = maxRequests;
        this.windowStart = System.currentTimeMillis();
        this.requestCount = 0;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();

        // Check if window expired
        if (now - windowStart >= windowSize) {
            windowStart = now;
            requestCount = 0;
        }

        if (requestCount < maxRequests) {
            requestCount++;
            return true;
        }
        return false;
    }

    public long getRemainingRequests() {
        return Math.max(0, maxRequests - requestCount);
    }
}
```

Pros:

  • Simple to implement
  • Memory efficient
  • Easy to understand

Cons:

  • Boundary spike issue (2x traffic at window edge)
  • Not accurate for short periods
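
The boundary spike is easy to demonstrate with a clock-injected variant of the counter above (the injectable `LongSupplier` clock is an addition for testability, not part of the original class). With a limit of 5 requests per 60-second window, 10 requests can pass within two seconds of wall time by straddling the boundary:

```java
import java.util.function.LongSupplier;

public class BoundarySpikeDemo {
    // Same logic as FixedWindowCounter, but with an injectable clock
    static class Counter {
        final long windowSize, maxRequests;
        final LongSupplier clock;
        long windowStart, count;

        Counter(long windowSize, long maxRequests, LongSupplier clock) {
            this.windowSize = windowSize;
            this.maxRequests = maxRequests;
            this.clock = clock;
            this.windowStart = clock.getAsLong();
        }

        boolean allowRequest() {
            long now = clock.getAsLong();
            if (now - windowStart >= windowSize) { windowStart = now; count = 0; }
            if (count < maxRequests) { count++; return true; }
            return false;
        }
    }

    public static int allowedAroundBoundary() {
        long[] now = {0};                               // fake clock, ms
        Counter c = new Counter(60_000, 5, () -> now[0]);
        int allowed = 0;

        now[0] = 59_000;                                // 1s before the boundary
        for (int i = 0; i < 5; i++) if (c.allowRequest()) allowed++;

        now[0] = 61_000;                                // 1s after -> counter resets
        for (int i = 0; i < 5; i++) if (c.allowRequest()) allowed++;

        return allowed;                                 // 2x the per-window limit
    }

    public static void main(String[] args) {
        System.out.println(allowedAroundBoundary());    // prints 10
    }
}
```

Sliding-window algorithms exist precisely to close this gap.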

4. Sliding Window Log

Concept:

  • Store timestamp of each request
  • Remove timestamps outside window
  • Count remaining timestamps
  • Reject if count exceeds limit

Implementation:

```java
import java.util.TreeMap;

public class SlidingWindowLog {
    private final long windowSize; // milliseconds
    private final long maxRequests;
    private final TreeMap<Long, Integer> requestLog; // timestamp -> count

    public SlidingWindowLog(long windowSize, long maxRequests) {
        this.windowSize = windowSize;
        this.maxRequests = maxRequests;
        this.requestLog = new TreeMap<>();
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        long windowStart = now - windowSize;

        // Remove entries older than the window (headMap is a live view,
        // so clearing it removes the entries from the map itself)
        requestLog.headMap(windowStart).clear();

        // Count requests in window
        long totalRequests = requestLog.values().stream()
                .mapToLong(Integer::longValue)
                .sum();

        if (totalRequests < maxRequests) {
            requestLog.merge(now, 1, Integer::sum);
            return true;
        }
        return false;
    }
}
```

Pros:

  • Very accurate
  • No boundary issues

Cons:

  • High memory usage
  • Expensive cleanup operations

5. Sliding Window Counter

Concept:

  • Hybrid of fixed window and sliding window
  • Use previous and current window
  • Weight based on overlap

Implementation:

```java
public class SlidingWindowCounter {
    private final long windowSize; // milliseconds
    private final long maxRequests;
    private long prevWindowCount;
    private long currWindowCount;
    private long currWindowStart;

    public SlidingWindowCounter(long windowSize, long maxRequests) {
        this.windowSize = windowSize;
        this.maxRequests = maxRequests;
        this.prevWindowCount = 0;
        this.currWindowCount = 0;
        this.currWindowStart = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();

        // Roll the windows forward; if more than one full window has
        // passed, the previous window contributes nothing
        if (now - currWindowStart >= windowSize) {
            prevWindowCount =
                (now - currWindowStart >= 2 * windowSize) ? 0 : currWindowCount;
            currWindowCount = 0;
            currWindowStart = now;
        }

        // Weight the previous window by how much of it still overlaps
        // the sliding window ending now
        long timeIntoWindow = now - currWindowStart;
        double prevWeight = 1.0 - ((double) timeIntoWindow / windowSize);
        long estimatedCount = (long) (prevWindowCount * prevWeight) + currWindowCount;

        if (estimatedCount < maxRequests) {
            currWindowCount++;
            return true;
        }
        return false;
    }
}
```

Pros:

  • Smooth rate limiting
  • Memory efficient
  • Good approximation

Cons:

  • Not 100% accurate
  • Slightly complex

Distributed Rate Limiter

Redis-based Implementation

```java
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class RedisRateLimiter {
    private final Jedis redis;
    private final String keyPrefix = "rate_limit:";

    public RedisRateLimiter(Jedis redis) {
        this.redis = redis;
    }

    public boolean allowRequest(String userId, int maxRequests, int windowSeconds) {
        String key = keyPrefix + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - (windowSeconds * 1000L);

        Transaction multi = redis.multi();

        // Remove entries that fell out of the sliding window
        multi.zremrangeByScore(key, 0, windowStart);

        // Count current requests
        multi.zcard(key);

        // Add current request (note: recorded even if the request ends
        // up rejected -- a simple, commonly used approximation)
        multi.zadd(key, now, String.valueOf(now));

        // Set expiry so idle keys are reclaimed
        multi.expire(key, windowSeconds);

        List<Object> results = multi.exec();

        long count = (Long) results.get(1);
        return count < maxRequests;
    }
}
```

Token Bucket with Redis

```lua
-- Redis Lua script for token bucket
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill tokens
local elapsed = now - last_refill
local tokens_to_add = elapsed * refill_rate / 1000
tokens = math.min(capacity, tokens + tokens_to_add)

-- Try to consume tokens
if tokens >= requested then
    tokens = tokens - requested
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)
    return 1
else
    return 0
end
```

Class Design

```java
// Main interface
public interface RateLimiter {
    boolean allowRequest(String identifier);
    long getWaitTimeMs(String identifier);
    void reset(String identifier);
}

// Configuration
public class RateLimitConfig {
    private long maxRequests;
    private long windowSize; // milliseconds
    private RateLimitAlgorithm algorithm;

    public RateLimitConfig(long maxRequests, long windowSize,
                           RateLimitAlgorithm algorithm) {
        this.maxRequests = maxRequests;
        this.windowSize = windowSize;
        this.algorithm = algorithm;
    }

    // Getters
}

// Algorithm enum
public enum RateLimitAlgorithm {
    TOKEN_BUCKET,
    LEAKY_BUCKET,
    FIXED_WINDOW,
    SLIDING_WINDOW_LOG,
    SLIDING_WINDOW_COUNTER
}

// Factory
public class RateLimiterFactory {
    public static RateLimiter create(RateLimitConfig config) {
        switch (config.getAlgorithm()) {
            case TOKEN_BUCKET:
                return new TokenBucketRateLimiter(config);
            case LEAKY_BUCKET:
                return new LeakyBucketRateLimiter(config);
            case FIXED_WINDOW:
                return new FixedWindowRateLimiter(config);
            case SLIDING_WINDOW_LOG:
                return new SlidingWindowLogRateLimiter(config);
            case SLIDING_WINDOW_COUNTER:
                return new SlidingWindowCounterRateLimiter(config);
            default:
                throw new IllegalArgumentException("Unknown algorithm");
        }
    }
}

// Distributed implementation
public class DistributedRateLimiter implements RateLimiter {
    private final RedisClient redis;
    private final RateLimitConfig config;

    public DistributedRateLimiter(RedisClient redis, RateLimitConfig config) {
        this.redis = redis;
        this.config = config;
    }

    @Override
    public boolean allowRequest(String identifier) {
        // Use Redis for distributed state
        return executeRedisScript(identifier);
    }

    private boolean executeRedisScript(String identifier) {
        // Lua script execution
        // ...
        return true;
    }
}
```

Multi-tier Rate Limiting

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MultiTierRateLimiter {
    // Assumes a TokenBucket variant that accepts a double refill rate,
    // so fractional per-second rates (100/min, 1000/hr) are representable
    private final Map<String, TokenBucket> tiers = new LinkedHashMap<>();

    public MultiTierRateLimiter() {
        tiers.put("second", new TokenBucket(10, 10));            // 10 req/sec
        tiers.put("minute", new TokenBucket(100, 100.0 / 60));   // 100 req/min
        tiers.put("hour", new TokenBucket(1000, 1000.0 / 3600)); // 1000 req/hr
    }

    public boolean allowRequest() {
        // All tiers must allow the request; note that tokens consumed in
        // earlier tiers are not returned if a later tier rejects
        for (TokenBucket limiter : tiers.values()) {
            if (!limiter.allowRequest()) {
                return false;
            }
        }
        return true;
    }
}
```

API Response

```java
import java.util.HashMap;
import java.util.Map;

public class RateLimitResponse {
    private boolean allowed;
    private long maxRequests;
    private long remainingRequests;
    private long resetTime;
    private long retryAfter; // seconds

    public Map<String, String> toHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-RateLimit-Limit", String.valueOf(maxRequests));
        headers.put("X-RateLimit-Remaining", String.valueOf(remainingRequests));
        headers.put("X-RateLimit-Reset", String.valueOf(resetTime));
        if (!allowed) {
            headers.put("Retry-After", String.valueOf(retryAfter));
        }
        return headers;
    }
}
```

Testing Strategy

```java
@Test
public void testTokenBucket_allowsBurstTraffic() throws InterruptedException {
    TokenBucket bucket = new TokenBucket(10, 1);

    // Burst of 10 requests should succeed
    for (int i = 0; i < 10; i++) {
        assertTrue(bucket.allowRequest());
    }

    // 11th request should fail
    assertFalse(bucket.allowRequest());

    // Wait for refill (1 token per second)
    Thread.sleep(1000);

    // Should allow one more request
    assertTrue(bucket.allowRequest());
}

@Test
public void testFixedWindow_handlesWindowBoundary() throws InterruptedException {
    FixedWindowCounter counter = new FixedWindowCounter(1000, 5);

    // Fill up window
    for (int i = 0; i < 5; i++) {
        assertTrue(counter.allowRequest());
    }

    assertFalse(counter.allowRequest());

    // Wait for new window
    Thread.sleep(1000);

    // Should allow requests in new window
    assertTrue(counter.allowRequest());
}
```

Design Patterns Used

  1. Strategy Pattern: Different rate limiting algorithms
  2. Factory Pattern: Create rate limiter instances
  3. Decorator Pattern: Add logging, metrics
  4. Singleton Pattern: Shared rate limiter instances
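
The decorator use is straightforward to sketch. Below, a metrics decorator wraps any limiter and counts decisions; for brevity it uses a one-method slice of the `RateLimiter` interface from the class design, and the `AtomicLong` counters are illustrative rather than a real metrics API:

```java
import java.util.concurrent.atomic.AtomicLong;

interface RateLimiter {
    boolean allowRequest(String identifier);
}

class MetricsRateLimiter implements RateLimiter {
    private final RateLimiter delegate;
    final AtomicLong allowed = new AtomicLong();
    final AtomicLong denied = new AtomicLong();

    MetricsRateLimiter(RateLimiter delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean allowRequest(String identifier) {
        // Delegate the decision, then record it
        boolean ok = delegate.allowRequest(identifier);
        (ok ? allowed : denied).incrementAndGet();
        return ok;
    }
}

public class DecoratorDemo {
    public static long[] run() {
        // Stub limiter: allow only the first 3 requests
        final int[] budget = {3};
        MetricsRateLimiter limiter =
            new MetricsRateLimiter(id -> budget[0]-- > 0);

        for (int i = 0; i < 5; i++) {
            limiter.allowRequest("user-1");
        }
        return new long[]{limiter.allowed.get(), limiter.denied.get()};
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println(r[0] + " allowed, " + r[1] + " denied");
    }
}
```

Because the decorator implements the same interface, logging or metrics can be layered onto any algorithm without touching its code.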

Trade-offs

| Algorithm | Accuracy | Memory | Burst Support | Complexity |
| --- | --- | --- | --- | --- |
| Token Bucket | Good | Low | Yes | Medium |
| Leaky Bucket | Good | Medium | No | Medium |
| Fixed Window | Low | Very Low | No | Low |
| Sliding Window Log | Excellent | High | No | High |
| Sliding Window Counter | Good | Low | No | Medium |

💡 Interview Tips & Out-of-the-Box Thinking

Common Pitfalls

  • Fixed window boundary issue: Users can do 2x limit by splitting requests across window boundary
  • Not handling distributed systems: A single-server counter breaks behind a load balancer (use a shared store such as Redis)
  • Memory leaks: Not cleaning up old entries in sliding window log
  • Race conditions: Multiple threads checking/updating counter simultaneously
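
The race-condition pitfall comes from a non-atomic check-then-increment: two threads can both read a count below the limit and both increment past it. A compare-and-set loop on an `AtomicLong` makes the check and increment one atomic step; the demo below hammers a limit of 100 from 8 threads and never over-admits (window reset is omitted to keep the sketch focused):

```java
import java.util.concurrent.atomic.AtomicLong;

public class AtomicWindowCounter {
    private final long maxRequests;
    private final AtomicLong count = new AtomicLong();

    public AtomicWindowCounter(long maxRequests) {
        this.maxRequests = maxRequests;
    }

    public boolean allowRequest() {
        while (true) {
            long current = count.get();
            if (current >= maxRequests) return false;
            // CAS: only succeeds if no other thread incremented in between
            if (count.compareAndSet(current, current + 1)) return true;
        }
    }

    public static long run() {
        AtomicWindowCounter counter = new AtomicWindowCounter(100);
        AtomicLong allowed = new AtomicLong();
        Thread[] threads = new Thread[8];
        for (int t = 0; t < 8; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 50; i++) {
                    if (counter.allowRequest()) allowed.incrementAndGet();
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return allowed.get(); // exactly 100 of the 400 attempts admitted
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 100
    }
}
```

`synchronized`, as used in the earlier examples, achieves the same safety; CAS avoids blocking under contention.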

Algorithm Selection

  • Token Bucket: Burst traffic (API with occasional spikes) - Most popular
  • Leaky Bucket: Steady output (video streaming, message queues)
  • Fixed Window: Simplest but inaccurate (non-critical use cases)
  • Sliding Window Counter: Good balance (accurate + low memory)
  • Sliding Log: Most accurate but memory intensive (financial transactions)

Advanced Considerations

  • Distributed limiting: Redis with atomic INCR + EXPIRE
  • Multi-level: Global + per-user + per-IP
  • Adaptive: Adjust limits based on system load
  • Quota vs Rate: Daily limit vs requests/second
  • Hierarchical: Organization → Team → User cascading limits
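
The INCR + EXPIRE pattern from the list above can be sketched in-process. Here a `ConcurrentHashMap` stands in for Redis, and keys carry a window index so stale windows are simply never touched again (Redis would instead EXPIRE them); in production the increment and expiry should be made atomic, e.g. via a Lua script:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class IncrExpireLimiter {
    private final long maxRequests;
    private final long windowMillis;
    // key is "identifier:windowIndex", value is the request count
    private final ConcurrentHashMap<String, AtomicLong> counters =
            new ConcurrentHashMap<>();

    public IncrExpireLimiter(long maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public boolean allowRequest(String identifier, long nowMillis) {
        long windowIndex = nowMillis / windowMillis;
        String key = identifier + ":" + windowIndex;
        // Redis equivalent: INCR key, plus EXPIRE on the first increment
        long count = counters
                .computeIfAbsent(key, k -> new AtomicLong())
                .incrementAndGet();
        return count <= maxRequests;
    }

    public static void main(String[] args) {
        IncrExpireLimiter limiter = new IncrExpireLimiter(3, 1000);
        for (int i = 0; i < 4; i++) {
            System.out.println(limiter.allowRequest("user-1", 0));
        }
        // Next window: the counter starts fresh
        System.out.println(limiter.allowRequest("user-1", 1500));
    }
}
```

This is the fixed-window algorithm in distributed form, so it inherits the same boundary-spike caveat discussed earlier.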

Creative Solutions

  • Tiered: Free (100/hr) vs Paid (10K/hr)
  • Cost-based: GET = 1 cost, complex POST = 10 cost
  • Time-varying: Higher limits off-peak
  • Token carry-forward: Unused tokens from last window carry over
  • Priority queue: Premium requests bypass limit
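
Cost-based limiting is a small twist on the token bucket: each request consumes a caller-supplied cost instead of a flat one token. A minimal sketch (capacity and costs are illustrative; refill is omitted and would work as in the TokenBucket section):

```java
public class CostBasedBucket {
    private final long capacity;
    private long availableTokens;

    public CostBasedBucket(long capacity) {
        this.capacity = capacity;
        this.availableTokens = capacity;
    }

    // Consume `cost` tokens at once, e.g. GET = 1, expensive POST = 10
    public synchronized boolean allowRequest(long cost) {
        if (availableTokens >= cost) {
            availableTokens -= cost;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CostBasedBucket bucket = new CostBasedBucket(20);
        System.out.println(bucket.allowRequest(1));   // cheap GET -> true
        System.out.println(bucket.allowRequest(10));  // heavy POST -> true
        System.out.println(bucket.allowRequest(10));  // only 9 left -> false
    }
}
```

The same idea extends to query complexity scores or payload size as the cost function.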

Edge Cases

  • Clock skew: Use Redis TIME for consistent timestamp
  • Thundering herd: Limit resets at midnight - jittered reset times
  • Anonymous users: Rate limit by IP with higher threshold
  • Shared IP/NAT: Office with 100 users - combine IP + User-Agent
  • Request bursts: Token bucket with max burst size

Follow-up Questions

  1. How to handle clock skew in distributed systems?
  2. How to implement rate limiting for anonymous users?
  3. How to support dynamic rate limit updates?
  4. How to implement fair queuing for rate-limited requests?
  5. How to handle rate limiting for websocket connections?