Rate Limiting Architecture
Rate limiting is a critical component of our infrastructure, and its accuracy and robustness matter directly to our customers. This document provides an in-depth look at our rate limiting architecture, explaining each component and concept in detail.
Overview
The Unkey API implements a distributed rate limiting system using a sliding window algorithm with Redis-backed coordination. This architecture provides high accuracy while maintaining low latency and horizontal scalability across multiple nodes and regions.
Architectural Components
Sliding Window Algorithm
The system uses a sliding window algorithm, which provides more accurate limiting than fixed window approaches (a short sketch follows the list below):
- Time Windows: Time is divided into fixed-duration windows (e.g., 1-minute intervals)
- Weighted Calculation: For each request, the system calculates an effective count by combining:
  - 100% of the current window's count
  - A decreasing portion of the previous window's count, based on how far we are into the current window
- Smooth Transitions: This approach prevents the "cliff effect" at window boundaries, where all counters reset at once
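To make the weighted calculation concrete, here is a minimal sketch in Go. The function name and parameters are illustrative, not taken from the actual implementation:

```go
package ratelimit

import "time"

// effectiveCount applies the sliding window weighting: the full count of the
// current window plus a linearly decaying fraction of the previous window.
func effectiveCount(currentCount, previousCount int64, windowStart time.Time, windowDuration time.Duration, now time.Time) int64 {
	// Fraction of the current window that has already elapsed, clamped to [0, 1].
	elapsed := float64(now.Sub(windowStart)) / float64(windowDuration)
	if elapsed < 0 {
		elapsed = 0
	}
	if elapsed > 1 {
		elapsed = 1
	}
	// The previous window contributes (1 - elapsed) of its count.
	return currentCount + int64(float64(previousCount)*(1-elapsed))
}
```

For example, 40% of the way into the current window, a previous window count of 100 contributes 60, so a current window count of 30 yields an effective count of 90.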
Redis as Centralized Counter Store
Redis serves as the authoritative source of truth for rate limit counters across all API nodes:
- Distributed Counters: Redis maintains atomic counters for each rate limit window
- TTL Management: Counters automatically expire after their relevant window passes
- Atomic Operations: Redis' atomic operations ensure consistent counting across nodes
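As an illustration, a counter update of this kind might look like the following. The document does not prescribe a client library; this sketch assumes go-redis and an illustrative key layout:

```go
package ratelimit

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// incrementWindow increments the counter for one window and refreshes its TTL
// so the key expires once it can no longer affect the sliding window calculation.
func incrementWindow(ctx context.Context, rdb *redis.Client, identifier string, sequence int64, duration time.Duration, cost int64) (int64, error) {
	key := fmt.Sprintf("ratelimit:%s:%d", identifier, sequence) // illustrative key layout

	pipe := rdb.Pipeline()
	incr := pipe.IncrBy(ctx, key, cost)
	// Keep the key for two window lengths so it can still serve as the
	// "previous window" after the current window rolls over.
	pipe.Expire(ctx, key, 2*duration)
	if _, err := pipe.Exec(ctx); err != nil {
		return 0, err
	}
	return incr.Val(), nil
}
```

INCRBY itself is atomic on the Redis side; the pipeline only batches the increment and the TTL refresh into a single round trip.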
Local In-Memory State
Each API node maintains local in-memory state for fast decision making:
- Buckets: Track rate limit state for each unique identifier+limit+duration combination
- Windows: Time-based counters that maintain accurate request counts
- Memory Management: Background processes clean up expired windows to prevent memory leaks
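A minimal sketch of this node-local state is shown below. The struct and field names are illustrative assumptions, not the actual types used in the codebase:

```go
package ratelimit

import (
	"fmt"
	"sync"
	"time"
)

// window holds the request count for one time window.
type window struct {
	sequence int64 // window index since the Unix epoch
	start    time.Time
	count    int64
}

// bucket tracks state for one identifier+limit+duration combination.
type bucket struct {
	mu       sync.Mutex
	limit    int64
	duration time.Duration
	windows  map[int64]*window // keyed by sequence number
}

// bucketStore is the node-local registry of buckets.
type bucketStore struct {
	mu      sync.RWMutex
	buckets map[string]*bucket
}

// bucketKey builds the lookup key for one identifier+limit+duration combination.
func bucketKey(identifier string, limit int64, duration time.Duration) string {
	return fmt.Sprintf("%s:%d:%d", identifier, limit, duration.Milliseconds())
}

// cleanup removes windows that can no longer influence a decision.
func (b *bucket) cleanup(now time.Time) {
	b.mu.Lock()
	defer b.mu.Unlock()
	currentSeq := now.UnixMilli() / b.duration.Milliseconds()
	for seq := range b.windows {
		if seq < currentSeq-1 { // keep only the current and previous windows
			delete(b.windows, seq)
		}
	}
}
```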
Request Flow
1. Initial Rate Limit Check
When a rate limit request arrives:
- The node validates the request parameters (identifier, limit, duration)
- It retrieves or creates local buckets and windows for the current and previous time periods
- For new windows, or after the limit has been exceeded, it synchronizes state with Redis
- It calculates the effective count using the sliding window algorithm
- It makes a local decision (allow/deny) based on this calculation
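The local decision step can be sketched as follows, assuming the window counts have already been loaded; the names and the cost parameter are illustrative:

```go
package ratelimit

import "time"

// decision is the result of a local rate limit check.
type decision struct {
	Allowed   bool
	Remaining int64
}

// checkLocally makes an allow/deny decision from node-local window counts using
// the sliding window weighting. Synchronization with Redis happens separately.
func checkLocally(limit, currentCount, previousCount, cost int64, windowStart time.Time, duration time.Duration, now time.Time) decision {
	elapsed := float64(now.Sub(windowStart)) / float64(duration)
	if elapsed < 0 {
		elapsed = 0
	} else if elapsed > 1 {
		elapsed = 1
	}
	effective := currentCount + int64(float64(previousCount)*(1-elapsed))

	if effective+cost > limit {
		return decision{Allowed: false, Remaining: max64(0, limit-effective)}
	}
	return decision{Allowed: true, Remaining: max64(0, limit-(effective+cost))}
}

func max64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}
```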
2. Asynchronous Synchronization
After making a local decision:
- The node buffers the request for asynchronous processing
- Background workers send counter updates to Redis
- This ensures eventual consistency without adding latency to API responses
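A simplified version of this buffering might look like the sketch below; the type names and the drop-on-full behavior are assumptions for illustration:

```go
package ratelimit

import (
	"context"
	"sync"
)

// syncEvent describes one counter update that must eventually reach Redis.
type syncEvent struct {
	Key  string
	Cost int64
}

// replayBuffer queues counter updates and flushes them from background workers,
// so the request path never waits on Redis.
type replayBuffer struct {
	events chan syncEvent
	wg     sync.WaitGroup
}

func newReplayBuffer(size int) *replayBuffer {
	return &replayBuffer{events: make(chan syncEvent, size)}
}

// enqueue is called on the request path; it never blocks. If the buffer is
// full, the event is dropped and reconciled later by the periodic sync.
func (r *replayBuffer) enqueue(e syncEvent) {
	select {
	case r.events <- e:
	default:
	}
}

// start launches worker goroutines that push buffered updates to Redis.
func (r *replayBuffer) start(ctx context.Context, workers int, push func(context.Context, syncEvent) error) {
	for i := 0; i < workers; i++ {
		r.wg.Add(1)
		go func() {
			defer r.wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case e := <-r.events:
					_ = push(ctx, e) // errors are tolerated; state is eventually consistent
				}
			}
		}()
	}
}
```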
3. Redis Synchronization
The system synchronizes with Redis in these scenarios:
- When a new time window is encountered
- After a rate limit has been exceeded (strict mode)
- Periodically through background processes
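These triggers can be summarized as a single predicate; the flag names below are illustrative:

```go
package ratelimit

// shouldSyncWithRedis mirrors the three triggers listed above: a new window,
// an exceeded limit while strict mode is enabled, or a periodic background tick.
func shouldSyncWithRedis(isNewWindow, limitExceeded, strictMode, periodicTick bool) bool {
	if isNewWindow {
		return true
	}
	if limitExceeded && strictMode {
		return true
	}
	return periodicTick
}
```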
Implementation Details
Bucket and Window Management
- Buckets: Maintain rate limit state for each identifier+limit+duration
- Windows: Aligned to time boundaries (e.g., minute start) for consistent behavior
- Sequence Numbers: Monotonically increasing numbers identify windows across nodes
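As a sketch, sequence numbers and window boundaries can be derived purely from the clock, so every node computes the same values independently; the millisecond granularity here is an assumption:

```go
package ratelimit

import "time"

// windowSequence returns a monotonically increasing window index that every
// node derives identically from the wall clock, so counters line up across nodes.
func windowSequence(now time.Time, duration time.Duration) int64 {
	return now.UnixMilli() / duration.Milliseconds()
}

// windowStartTime returns the aligned start time of the window with the given sequence.
func windowStartTime(sequence int64, duration time.Duration) time.Time {
	return time.UnixMilli(sequence * duration.Milliseconds())
}
```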
Resilience Features
- Circuit Breaker: Prevents cascading failures during Redis communication issues
- Eventual Consistency: System remains operational even during temporary Redis unavailability
- Memory Management: Automatic cleanup of expired windows prevents memory leaks
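A deliberately simplified circuit breaker sketch is shown below; the real implementation may differ in thresholds and recovery behavior:

```go
package ratelimit

import (
	"errors"
	"sync"
	"time"
)

var errCircuitOpen = errors.New("redis circuit open: falling back to local-only decisions")

// circuitBreaker stops calling Redis after repeated failures, for a cooldown
// period, during which the node keeps serving decisions from local state only.
type circuitBreaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openUntil   time.Time
}

// do runs fn unless the breaker is open; failures count toward opening it.
func (cb *circuitBreaker) do(fn func() error) error {
	cb.mu.Lock()
	if time.Now().Before(cb.openUntil) {
		cb.mu.Unlock()
		return errCircuitOpen
	}
	cb.mu.Unlock()

	err := fn()

	cb.mu.Lock()
	defer cb.mu.Unlock()
	if err != nil {
		cb.failures++
		if cb.failures >= cb.maxFailures {
			cb.openUntil = time.Now().Add(cb.cooldown)
			cb.failures = 0
		}
		return err
	}
	cb.failures = 0
	return nil
}
```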
Performance Optimizations
- Local Decision Making: Most decisions are made locally, avoiding Redis round trips
- Asynchronous Updates: Background synchronization minimizes latency
- Batched Operations: Multiple counters are retrieved in a single Redis operation
- Multiple Replay Workers: Parallel processing of synchronization tasks
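For example, the current and previous window counters can be fetched in one MGET round trip; the go-redis usage and key handling here are illustrative:

```go
package ratelimit

import (
	"context"
	"strconv"

	"github.com/redis/go-redis/v9"
)

// loadWindowCounts fetches the current and previous window counters in a single
// MGET call. Missing keys come back as nil and are treated as zero.
func loadWindowCounts(ctx context.Context, rdb *redis.Client, currentKey, previousKey string) (current, previous int64, err error) {
	vals, err := rdb.MGet(ctx, currentKey, previousKey).Result()
	if err != nil {
		return 0, 0, err
	}
	parse := func(v interface{}) int64 {
		s, ok := v.(string)
		if !ok {
			return 0
		}
		n, _ := strconv.ParseInt(s, 10, 64)
		return n
	}
	return parse(vals[0]), parse(vals[1]), nil
}
```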
Deployment Considerations
Redis Configuration
For production deployments:
- Use Redis with appropriate persistence settings
- Consider Redis Cluster for high availability
- Position Redis instances close to API nodes to minimize latency
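A sketch of such a client configuration, assuming go-redis with illustrative addresses and timeouts chosen so that a slow Redis degrades gracefully rather than stalling requests:

```go
package ratelimit

import (
	"time"

	"github.com/redis/go-redis/v9"
)

// newClusterClient connects to a Redis Cluster with tight timeouts.
// Addresses and timeout values are illustrative, not recommendations.
func newClusterClient() *redis.ClusterClient {
	return redis.NewClusterClient(&redis.ClusterOptions{
		Addrs:        []string{"redis-1:6379", "redis-2:6379", "redis-3:6379"},
		DialTimeout:  200 * time.Millisecond,
		ReadTimeout:  200 * time.Millisecond,
		WriteTimeout: 200 * time.Millisecond,
	})
}
```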
Scaling
The system scales horizontally with minimal coordination overhead:
- Add more API nodes to handle increased request volume
- Scale Redis independently based on counter volume
- Each API node operates independently with eventual consistency
This architecture strikes a balance between accuracy, performance, and operational simplicity, providing robust rate limiting across a distributed system without complex clustering mechanisms.