4 Answers

The documentation here has an explanation that sounds like what you want to know:

The directive specifies the zone (zone) and the maximum possible bursts of requests (burst). If the rate exceeds the demands outlined in the zone, the request is delayed, so that queries are processed at a given speed.

From what I understand, requests over the burst will be delayed (they take more time, waiting until they can be served); with the nodelay option the delay is not used and excess requests are rejected with a 503 error.

If you’re like me, you’re probably wondering what the heck burst
really means. Here is the trick: replace the word ‘burst’ with
‘bucket’, and assume that every user is given a bucket with 5 tokens.
Every time that they exceed the rate of 1 request per second, they
have to pay a token. Once they’ve spent all of their tokens, they are
given an HTTP 503 error message, which has essentially become the
standard for ‘back off, man!’.
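If it helps to see the analogy in code, here is a minimal token-bucket sketch in Python (illustrative only; the class name and structure are my own, not nginx's implementation):

```python
import time

class TokenBucket:
    """Minimal token-bucket sketch of the 'burst as a bucket of tokens'
    analogy above: rate = 1 request/second, bucket size (burst) = 5.
    Illustrative only, not nginx's actual implementation."""

    def __init__(self, rate=1.0, burst=5):
        self.rate = rate              # tokens refilled per second
        self.capacity = burst         # maximum tokens a client can bank
        self.tokens = float(burst)    # start with a full bucket
        self.last = None              # timestamp of the previous request

    def allow(self, now=None):
        if now is None:
            now = time.monotonic()
        if self.last is not None:
            # refill proportionally to elapsed time, capped at capacity
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # request served
        return False                  # bucket empty -> HTTP 503
```

With the defaults, six instantaneous requests spend the five tokens and the sixth is refused; wait a few seconds and the bucket refills.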

I think you're incorrect; the nginx manual states: "Excessive requests are delayed until their number exceeds the maximum burst size". Note that "until their number exceeds the maximum burst" has an entirely different meaning from the "over the burst" that you said. You also conflated burst with excess requests: I believe excess requests are those above the zone's rate, while they may still be below the maximum burst.
– Hendy Irawan Dec 19 '14 at 7:09

The burst parameter defines how many requests a client can make in
excess of the rate specified by the zone (with our sample mylimit
zone, the rate limit is 10 requests per second, or 1 every 100
milliseconds). A request that arrives sooner than 100 milliseconds
after the previous one is put in a queue, and here we are setting the
queue size to 20.

That means if 21 requests arrive from a given IP address
simultaneously, NGINX forwards the first one to the upstream server
group immediately and puts the remaining 20 in the queue. It then
forwards a queued request every 100 milliseconds, and returns 503 to
the client only if an incoming request makes the number of queued
requests go over 20.

If you add nodelay:

location /login/ {
    limit_req zone=mylimit burst=20 nodelay;
    ...
}

With the nodelay parameter, NGINX still allocates slots in the queue
according to the burst parameter and imposes the configured rate
limit, but not by spacing out the forwarding of queued requests.
Instead, when a request arrives “too soon”, NGINX forwards it
immediately as long as there is a slot available for it in the queue.
It marks that slot as “taken” and does not free it for use by another
request until the appropriate time has passed (in our example, after
100 milliseconds).
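The nodelay accounting can be sketched the same way (an illustrative model of the slot/"excess" counter, not nginx's source): every request that finds a free slot is forwarded immediately, and one slot is released per 1/rate interval:

```python
def schedule_nodelay(arrivals, rate=10.0, burst=20):
    """Sketch of limit_req with burst + nodelay, illustrative only.
    `excess` counts occupied burst slots; it drains at `rate` per second,
    i.e. one slot frees up every 1/rate seconds (100 ms in the example)."""
    excess = 0.0
    last = None
    out = []
    for t in arrivals:                   # arrival times, non-decreasing
        if last is not None:
            excess = max(0.0, excess - (t - last) * rate)
        last = t
        if excess + 1 > burst + 1:       # 1 conforming + up to `burst` extra
            out.append('reject')         # no slot free -> 503
        else:
            excess += 1
            out.append('forward')        # forwarded upstream immediately
    return out
```

With 22 simultaneous arrivals, 21 are forwarded at once and the 22nd is rejected; 100 milliseconds later one slot has freed, so a new request gets through.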

The setting defines whether requests will be delayed so that they conform to the desired rate, or whether they will simply be rejected; in other words, whether the rate limiting is enforced by the server pacing requests, or the responsibility for backing off is passed to the client.

nodelay present

Requests will be handled as quickly as possible; any requests sent over the specified limit will be rejected with the code set as limit_req_status.

nodelay absent (aka delayed)

Requests will be handled at a rate that conforms with the specified limit. So, for example, if a rate of 10 req/s is set, each request will be handled in >= 0.1 (1/rate) seconds, thereby not allowing the rate to be exceeded but allowing the requests to back up. If enough requests back up to overflow the bucket (an overflow that would also be prevented by a concurrent connection limit), then they are rejected with the code set as limit_req_status.
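As a quick check on the arithmetic, a one-line sketch (hypothetical helper) of when each backed-up request gets handled in delayed mode at 10 req/s:

```python
def handle_times(n, rate=10.0):
    """Delayed mode: n simultaneous requests are served one per 1/rate s."""
    return [i / rate for i in range(n)]
```

Three simultaneous requests are handled at t = 0.0, 0.1, and 0.2 seconds, so the 10 req/s rate is never exceeded.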