On Wed, 2003-07-02 at 18:28, Marcelo Tosatti wrote:
> On Wed, 2 Jul 2003, Marcelo Tosatti wrote:
> 
> > Hello people,
> >
> > What is the status of the IO scheduler fixes for increased fairness for
> > 2.4 ?
> >
> > I haven't had time to read and think about everything you guys discussed,
> > so a brief summary would be very helpful for me.

Ok, based on mail from Andrea, it seems like reusing the existing elvtune ioctl is the way to go for now. The patch below uses max_bomb_segments for two things.

1) the number of sectors allowed on a given request queue before new io triggers an unplug (q->max_queue_sectors)

2) to enable or disable q->full checks. If max_bomb_segments is odd, the q->full checks are on; if it is even, they are off.

The ioctl code on the kernel side is set up to allow this:

elvtune -b 1 /dev/xxx
elvtune -b 0 /dev/xxx

for just switching q->full on and off without changing q->max_queue_sectors.

elvtune -b 8192 /dev/xxx will get you the current default behavior (4MB in flight, q->full checks off).
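To make the encoding concrete, here is a small, purely illustrative sketch of how the kernel side could decode that value. The struct and function names are made up for this example and are not taken from the patch; the only facts used are the odd/even flag and the sector count described above (8192 sectors of 512 bytes is the 4MB figure).

/*
 * Illustrative only: decoding the max_bomb_segments value as described
 * above.  These names are hypothetical and do not appear in the patch.
 */
struct queue_tunables {
	unsigned long max_queue_sectors;	/* sectors in flight before an unplug */
	int full_checks;			/* 1 = q->full checks enabled */
};

static void decode_max_bomb_segments(unsigned long val, struct queue_tunables *t)
{
	/* the low bit toggles the q->full checks: odd = on, even = off */
	t->full_checks = val & 1;

	/* 0 and 1 only flip the checks; larger values also set the limit,
	 * e.g. 8192 sectors * 512 bytes = 4MB of io in flight */
	if (val > 1)
		t->max_queue_sectors = val & ~1UL;
}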

 /*
- * Here's the request allocation design:
+ * get a free request, honoring the queue_full condition
+ */
+static inline struct request *get_request(request_queue_t *q, int rw)
+{
+	if (q->full)
+		return NULL;
+	return __get_request(q, rw);
+}
+
+/*
+ * helper func to do memory barriers and wakeups when we finally decide
+ * to clear the queue full condition
+ */
+static inline void clear_full_and_wake(request_queue_t *q)
+{
+	q->full = 0;
+	mb();
+	if (waitqueue_active(&q->wait_for_requests))
+		wake_up(&q->wait_for_requests);
+}
+
+/*
+ * Here's the request allocation design, low latency version:
  *
  * 1: Blocking on request exhaustion is a key part of I/O throttling.
  *
  * 2: We want to be `fair' to all requesters. We must avoid starvation, and
  *    attempt to ensure that all requesters sleep for a similar duration. Hence
  *    no stealing requests when there are other processes waiting.
- *
- * 3: We also wish to support `batching' of requests. So when a process is
- *    woken, we want to allow it to allocate a decent number of requests
- *    before it blocks again, so they can be nicely merged (this only really
- *    matters if the process happens to be adding requests near the head of
- *    the queue).
- *
- * 4: We want to avoid scheduling storms. This isn't really important, because
- *    the system will be I/O bound anyway. But it's easy.
- *
- * There is tension between requirements 2 and 3. Once a task has woken,
- * we don't want to allow it to sleep as soon as it takes its second request.
- * But we don't want currently-running tasks to steal all the requests
- * from the sleepers. We handle this with wakeup hysteresis around
- * 0 .. batch_requests and with the assumption that request taking is much,
- * much faster than request freeing.
+ *
+ * There used to be more here, attempting to allow a process to send in a
+ * number of requests once it has woken up. But, there's no way to
+ * tell if a process has just been woken up, or if it is a new process
+ * coming in to steal requests from the waiters. So, we give up and force
+ * everyone to wait fairly.
  *
  * So here's what we do:
  *
@@ -561,50 +628,67 @@
  *
  * When a process wants a new request:
  *
- *  b) If free_requests == 0, the requester sleeps in FIFO manner.
- *
- *  b) If 0 < free_requests < batch_requests and there are waiters,
- *     we still take a request non-blockingly. This provides batching.
- *
- *  c) If free_requests >= batch_requests, the caller is immediately
- *     granted a new request.
+ *  b) If free_requests == 0, the requester sleeps in FIFO manner, and
+ *     the queue full condition is set. The full condition is not
+ *     cleared until there are no longer any waiters. Once the full
+ *     condition is set, all new io must wait, hopefully for a very
+ *     short period of time.
  *
  * When a request is released:
  *
- *  d) If free_requests < batch_requests, do nothing.
- *
- *  f) If free_requests >= batch_requests, wake up a single waiter.
+ *  c) If free_requests < batch_requests, do nothing.
  *
- * The net effect is that when a process is woken at the batch_requests level,
- * it will be able to take approximately (batch_requests) requests before
- * blocking again (at the tail of the queue).
- *
- * This all assumes that the rate of taking requests is much, much higher
- * than the rate of releasing them. Which is very true.
+ *  d) If free_requests >= batch_requests, wake up a single waiter.
  *
- * -akpm, Feb 2002.
+ * As each waiter gets a request, he wakes another waiter. We do this
+ * to prevent a race where an unplug might get run before a request makes
+ * it's way onto the queue. The result is a cascade of wakeups, so delaying
+ * the initial wakeup until we've got batch_requests available helps avoid
+ * wakeups where there aren't any requests available yet.
  */
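For anyone who doesn't want to dig through the whole patch, here is a rough sketch of the FIFO wait plus cascading wakeup that the comment describes. It is a simplified illustration using 2.4-style wait queue calls, not the actual function bodies from the patch; only q->full, __get_request() and clear_full_and_wake() come from the hunk above.

/*
 * Simplified illustration of the wait/cascade behaviour described in the
 * design comment.  Not the patch's actual code.
 */
static struct request *get_request_wait_sketch(request_queue_t *q, int rw)
{
	DECLARE_WAITQUEUE(wait, current);
	struct request *rq = NULL;

	add_wait_queue_exclusive(&q->wait_for_requests, &wait);
	do {
		set_current_state(TASK_UNINTERRUPTIBLE);
		rq = __get_request(q, rw);
		if (rq == NULL) {
			/* out of requests: mark the queue full so new io
			 * queues up behind the existing waiters */
			q->full = 1;
			schedule();
		}
	} while (rq == NULL);
	remove_wait_queue(&q->wait_for_requests, &wait);
	set_current_state(TASK_RUNNING);

	/*
	 * Cascade: the waiter that just got a request wakes the next one,
	 * so a freed request isn't missed if the unplug ran first.  Once
	 * nobody is left waiting, clear_full_and_wake() drops q->full and
	 * lets new io allocate requests directly again.
	 */
	if (waitqueue_active(&q->wait_for_requests))
		wake_up(&q->wait_for_requests);
	else
		clear_full_and_wake(q);

	return rq;
}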

+/*
+ * This must be called after every submit_bh with end_io
+ * callbacks that would result into the blkdev layer waking
+ * up the page after a queue unplug.
+ */
+void wakeup_page_waiters(struct page * page)
+{
+	wait_queue_head_t * head;
+
+	head = page_waitqueue(page);
+	if (waitqueue_active(head))
+		wake_up(head);
+}
+
 /*
  * Wait for a page to get unlocked.
  *
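As a usage note, a hypothetical caller following that rule might look like the sketch below. Everything here except wakeup_page_waiters() itself is made up for illustration and is not part of the posted patch.

/*
 * Hypothetical caller, illustration only: submit a buffer with a custom
 * end_io and then kick any page waiters, as the comment above requires.
 */
static void example_end_io(struct buffer_head *bh, int uptodate)
{
	mark_buffer_uptodate(bh, uptodate);
	unlock_buffer(bh);
}

static void example_write_page(struct page *page)
{
	struct buffer_head *bh = page->buffers;

	lock_buffer(bh);
	bh->b_end_io = example_end_io;
	submit_bh(WRITE, bh);

	/*
	 * The end_io above may only run after a queue unplug, so wake
	 * anyone already sleeping in wait_on_page() and give them another
	 * chance to unplug the queue and recheck the page state.
	 */
	wakeup_page_waiters(page);
}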