Rate limiting protects a service or system from excessive load caused by malicious or misconfigured actors, while allowing well-behaved clients to share the system's resources fairly.


A publicly available system can be overwhelmed by client requests, causing degraded performance or outages. Misconfigured or malicious clients may issue excessive requests, using up the available resources and impacting availability for other clients.

Clients should be able to expect that a service either meets a stated service level agreement or, when performance degrades, grants each client an equal share of resources.



Use rate limiting to promise and enforce a request limit over time intervals for each client. Communicate the client’s limit, remaining allocation, and time before reset via standard means, such as HTTP headers.
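A response can carry this information in the widely used de facto `X-RateLimit-*` headers. A minimal sketch of building them (the function name is illustrative):

```python
def rate_limit_headers(limit: int, remaining: int, reset_epoch: int) -> dict:
    """Build the de facto standard rate-limit response headers."""
    return {
        "X-RateLimit-Limit": str(limit),          # requests allowed per window
        "X-RateLimit-Remaining": str(remaining),  # requests left in this window
        "X-RateLimit-Reset": str(reset_epoch),    # Unix time the window resets
    }
```

A client can read these headers from any response and back off before exhausting its allocation.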

Service Classes

Define different classes of rate-limiting, depending on the service, to encourage clients to authenticate, allowing for better auditing and accounting in the system. As trust in a client increases, increase their request limit. Anonymous clients should receive the lowest limits. Authenticated clients should receive a higher limit. Administrative clients should receive the highest limits.

Variable Costs

If some resources are disproportionately expensive, use cost-based rate limiting. Requests that cost the system more in compute or other resources use more of a client's allocated limit. This increases fairness amongst clients and reduces the target surface for a denial of service.
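One way to sketch cost-based limiting is to charge each endpoint a different number of units against the client's budget. The endpoints and costs here are hypothetical:

```python
# Illustrative per-endpoint costs; expensive operations consume more budget.
COSTS = {"GET /items": 1, "POST /search": 5, "POST /reports": 25}

def charge(remaining: int, endpoint: str) -> tuple:
    """Deduct an endpoint's cost from a client's remaining budget.

    Returns (allowed, new_remaining); a rejected request leaves the
    budget unchanged. Unknown endpoints cost one unit.
    """
    cost = COSTS.get(endpoint, 1)
    if cost > remaining:
        return False, remaining
    return True, remaining - cost
```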


Two common implementations of rate limiting are the fixed window and the token bucket.

In the fixed window algorithm, a client is allocated a maximum of n requests for every fixed size time window, with a known boundary. Once the client has performed n requests, they must wait until the end of the window before their request allotment is refilled.
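The fixed window can be implemented with a single counter per client per window. A minimal in-memory sketch (in production, old window counters would need eviction and the state would typically live in shared storage):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed window.

    Window boundaries are fixed multiples of `window_seconds`, so every
    client's allotment refills at the same known moments.
    """

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (client, window_id) -> request count

    def allow(self, client: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        window_id = int(now // self.window)  # identifies the current window
        key = (client, window_id)
        if self.counts[key] >= self.limit:
            return False  # client must wait for the next window
        self.counts[key] += 1
        return True
```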

The token bucket algorithm (similar to the leaky bucket algorithm) grants each client a bucket of at most n tokens. Each request uses 1 token. The bucket’s tokens are refilled at a rate of m per interval t. Once the client has emptied their bucket, they must only wait for the interval t to pass before they can perform another m requests.
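A sketch of a per-client token bucket, refilling continuously rather than in discrete intervals (a common simplification; timestamps can be injected for testing):

```python
import time

class TokenBucket:
    """A bucket of at most `capacity` tokens, refilled at `rate` tokens
    per second. Each request spends one token, so bursts of up to
    `capacity` requests are allowed."""

    def __init__(self, capacity: float, rate: float, now: float = None):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full
        self.last = time.monotonic() if now is None else now

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = now - self.last
        self.last = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```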

The fixed window is conceptually simpler to respond to in a client. The token bucket accommodates clients that may burst requests.



Pros and Cons


  • Rate limiting is an effective first defense against denial of service attacks.
  • As a common pattern, rate limiting is well understood and simple for clients to work with.
  • By granting increased limits to authenticated users, clients are encouraged to authenticate, making it easier to identify and resolve misconfigurations in clients.


  • Rate limiting does not protect against numerous malicious clients (for example, a distributed denial of service).
  • Rate limiting is not ideal for internal use, where the system has full control over the server and client. Use backpressure to allow for dynamic limits and optimal resource utilization.

Relevant Tools and Services


With its low latency and data structures that support expiry and sorting, Redis is a popular choice for implementing rate limiting shared amongst application instances. Prefab supports multiple rate-limiting algorithms as a service.



In addition to its other content acceleration and security features, Cloudflare offers a rate-limiting product for an additional fee.


Load balancers like HAProxy can provide some rate limiting. As they operate below the application layer, a load balancer may lack the context needed to perform fine-grained limiting based on resources or users.
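As an illustration, HAProxy can track per-source request rates with a stick table and deny clients that exceed a threshold. The numbers below are placeholders; consult the HAProxy documentation before relying on this sketch:

```
frontend fe_main
    bind :80
    # Track request rate per source IP over a 10s sliding window.
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    # Reject sources exceeding 100 requests per 10s with HTTP 429.
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
    default_backend be_app
```

Note that this limits by source IP, not by authenticated user, which illustrates the coarseness mentioned above.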
