See, 31 requests in order, then one request "backwards", then 31 in order, etc.

I found out what causes this. It's get_request_wait().

When the request queue is full, and a new request needs to be created,
__make_request() blocks in get_request_wait().

Another process wakes up first (pdflush / the process submitting the I/O itself /
xfsdatad / etc.) and sends the next bios to __make_request().
In the meantime some free requests have become available, and those bios
are merged into a new request. Those requests are submitted to the device.

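Then get_request_wait() returns, but the bio it was holding can no longer be
merged: it lands in a request of its own, queued behind the requests that were
just submitted for the sectors after it. That results in a backwards seek,
severely limiting the I/O rate.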
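A toy model of the interleave described above may make the pattern easier to see. It is a sketch, not kernel code: each integer stands for one request's position on disk, merging and the real wakeup mechanics are not modelled, and the figure of 32 requests freed per wakeup (freed_per_round, a hypothetical parameter) is an assumption chosen to reproduce the "31 in order, one backwards" pattern.

```python
# Toy model only: freed_per_round and rounds are hypothetical parameters,
# not kernel tunables. Each integer is one request's position on disk.

def simulate_wait_then_queue(rounds=3, freed_per_round=32):
    """Dispatch order when a woken sleeper queues its held bio last."""
    out = []
    next_bio = 0
    held = next_bio          # queue full: a submitter sleeps holding bio 0
    next_bio += 1
    for _ in range(rounds):
        # Another submitter runs first and fills all but one of the freed
        # requests with the following bios, in sector order.
        for _ in range(freed_per_round - 1):
            out.append(next_bio)
            next_bio += 1
        # get_request_wait() returns: the held bio is queued behind them.
        out.append(held)
        # The queue is full again; the next submitter blocks holding a bio.
        held = next_bio
        next_bio += 1
    return out


def backward_seeks(order):
    """Count dispatches whose position is lower than the previous one."""
    return sum(1 for a, b in zip(order, order[1:]) if b < a)


order = simulate_wait_then_queue()
print(order[:33])             # 1..31, then 0, then 33
print(backward_seeks(order))  # one backwards seek per round
```

Every round costs one backwards seek, which matches the trace at the top of this mail.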

Wouldn't it be better to allow the request allocation and queue the
request, and /then/ put the process to sleep? The queue will grow larger
than nr_requests, but it does that anyway.

The "batching" logic there should allow a process to submit
a number of requests even above the nr_requests limit to
prevent this interleave and context switching.
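The effect of such batching can be sketched with the same kind of toy model. Again this is not the kernel's implementation: it assumes the woken process queues its held bio first and then uses a batch allowance (the hypothetical freed_per_round parameter) to keep submitting the following bios before any other submitter gets in, and it does not track how far the queue exceeds nr_requests.

```python
# Toy model only: freed_per_round and rounds are hypothetical parameters.
# Each integer is one request's position on disk.

def simulate_batching(rounds=3, freed_per_round=32):
    """Dispatch order when the woken sleeper queues its held bio first and
    keeps submitting the following bios as one batch."""
    out = []
    next_bio = 0
    held = next_bio          # queue full: a submitter sleeps holding bio 0
    next_bio += 1
    for _ in range(rounds):
        # The woken process queues its held bio immediately...
        out.append(held)
        # ...and its batch allowance covers the following bios, so no
        # other submitter interleaves with it.
        for _ in range(freed_per_round - 1):
            out.append(next_bio)
            next_bio += 1
        # Batch used up: it blocks again holding the next bio.
        held = next_bio
        next_bio += 1
    return out


def backward_seeks(order):
    """Count dispatches whose position is lower than the previous one."""
    return sum(1 for a, b in zip(order, order[1:]) if b < a)


order = simulate_batching()
print(backward_seeks(order))  # no backwards seeks: dispatch stays sequential
```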