How do most APIs handle rate limiting?

I got into a discussion this past week with one of my colleagues about rate limiting, or throttling, for APIs. In particular, we debated how to handle a user who goes beyond their limit and how to tell them what the threshold values are so they know when they can resume calling. We didn’t come to an agreement: he took the 503 route and I took the 429 route.

As a side effect though, we took a look at how various companies handle this and found only a couple of HTTP response codes in use. The headers all follow the same model, with only moderately different names. For the most part, they all used these exact headers or close variations of them:

X-RateLimit-Limit – The limit that you cannot surpass in a given amount of time

X-RateLimit-Remaining – The number of calls you have available until a given reset time stamp, or calculated given some sort of sliding time window.

X-RateLimit-Reset – The timestamp in UTC formatted to HTTP spec per RFC 1123 for when the limits will be reset.
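To make the three headers concrete, here is a minimal sketch of how a client might read them. The `RateLimitInfo` and `parse_rate_limit` names are hypothetical, and the fallback that accepts a plain integer (epoch seconds) for the reset value is my own assumption, added because not every API formats the timestamp as an HTTP date.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    """Hypothetical container for the three rate limit headers."""
    limit: int
    remaining: int
    reset: datetime


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Return the parsed headers, or None if any are missing or malformed."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        limit = int(lowered["x-ratelimit-limit"])
        remaining = int(lowered["x-ratelimit-remaining"])
        raw_reset = lowered["x-ratelimit-reset"].strip()
    except (KeyError, ValueError):
        return None

    if raw_reset.isdigit():
        # Assumption: a bare integer means seconds since the Unix epoch.
        reset = datetime.fromtimestamp(int(raw_reset), tz=timezone.utc)
    else:
        try:
            # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
            reset = parsedate_to_datetime(raw_reset)
        except (TypeError, ValueError):
            return None
        if reset.tzinfo is None:
            # Treat a date without zone information as UTC.
            reset = reset.replace(tzinfo=timezone.utc)

    return RateLimitInfo(limit=limit, remaining=remaining, reset=reset)
```

A caller could compare `info.reset` against the current UTC time to decide how long to pause once `info.remaining` reaches zero.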

In my colleague’s defense, he wasn’t the only one to go the 503 route for rate limiting, as this StackExchange post covered (along with the ‘Retry-After’ header), but we couldn’t find a company that practiced it on an API. It is, however, a convention used by web browsers for websites, according to a StackOverflow post. We were really hoping for a uniform standard, but at least it’s not all over the place.

GitHub

They use the three response headers:

X-RateLimit-Limit
X-RateLimit-Remaining
X-RateLimit-Reset

In the event that you hit the limit, they return an HTTP 403 (Forbidden) with a JSON body containing a message about hitting the limit.
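Because a 403 can also mean a plain permissions failure, a client has to look at more than the status code. The following is a rough sketch of one way to guess whether a response is a rate limit rejection. The `looks_rate_limited` name and the heuristic itself (a remaining count of zero, or the presence of ‘Retry-After’) are my assumptions, not something any of these APIs document.

```python
from typing import Mapping


def looks_rate_limited(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristically decide whether a response signals a rate limit."""
    # 429 was created specifically for this purpose, so it is unambiguous.
    if status == 429:
        return True
    # 403 and 503 are overloaded, so look for supporting evidence in headers.
    if status in (403, 503):
        lowered = {name.lower(): value.strip() for name, value in headers.items()}
        if lowered.get("x-ratelimit-remaining") == "0":
            return True
        return "retry-after" in lowered
    return False
```

For APIs that send no headers at all, the only option left is to inspect the response body, which this sketch does not attempt.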

Interestingly enough, Google opted not to use any headers on its responses. At least, if they do use them, no documentation says so.

They do, however, use a 403 (Forbidden) with a JSON body that has an OVER_QUERY_LIMIT element, but it doesn’t appear to include any details about when you should come back or how many calls you have available for a given timeframe. The only thing I could find was that they pass back a cache header like this on all of their calls, which they expect you to honor by actually caching the response:

Cache-Control: public, max-age=86400

Recap

Now that we know that companies are mostly split between 429 and 403, and that they use almost exactly the same headers, what exactly do the standards (if any) say about this?

It appears that RFC 6585, Section 4 extended the original HTTP specification to add a few status codes, including the 429 response code, which was created specifically for limiting requests. With regard to the 429, it says this:

The response representations SHOULD include details explaining the condition, and MAY include a Retry-After header indicating how long to wait before making a new request.
…
Note that this specification does not define how the origin server identifies the user, nor how it counts requests. For example, an origin server that is limiting request rates can do so based upon counts of requests on a per-resource basis, across the entire server, or even among a set of servers.
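Since ‘Retry-After’ is the one hint the RFC does mention, here is a small sketch of how a client might turn it into a wait time. The header can carry either a number of seconds or an HTTP date, so both forms are handled. The `retry_after_seconds` name and the optional `now` parameter (there to make the function testable) are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(
    headers: Mapping[str, str],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return how many seconds to wait, or None if no usable header exists."""
    # Header names are case-insensitive, so match without regard to case.
    value = next(
        (v for k, v in headers.items() if k.lower() == "retry-after"),
        None,
    )
    if value is None:
        return None
    value = value.strip()

    # Form 1: a non-negative integer number of seconds.
    if value.isdigit():
        return float(value)

    # Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

A caller would typically sleep for the returned number of seconds before retrying, and fall back to its own backoff policy when the result is `None`.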

The downside is that this RFC doesn’t define a way to convey to the client what the limits are, and I could not find a documented standard for the ‘X-RateLimit-*’ headers. In fact, I couldn’t even find who implemented them in the first place. The upside is that, even without a documented standard, almost everyone has landed on a rough consensus by using this set of headers as is or with minimal variation.