On 2011-03-07 21:41, Peter Zijlstra wrote:> On Mon, 2011-03-07 at 20:43 +0100, Jens Axboe wrote:>> On 2011-03-07 11:23, Peter Zijlstra wrote:>>> On Sat, 2011-03-05 at 21:54 +0100, Jens Axboe wrote:>>>>>>>> Apparently so. Peter/Ingo, please shoot this one down in flames.>>>> Summary:>>>>>>>> - Need a way to trigger this flushing when a task is going to sleep>>>> - It's currently done right before calling deactivate_task(). We know>>>> the task is going to sleep here, but it's also under the runqueue>>>> lock. Not good.>>>> - In the new location, it's not completely clear to me whether we can>>>> safely deref 'prev' or not. The usage of prev_state would seem to>>>> indicate that we cannot, and as far as I can tell, prev could at this>>>> point already potentially be running on another CPU.>>>>>>>> Help? Peter, we talked about this in Tokyo in September. Initial>>>> suggestion was to use preempt notifiers, which we can't because:>>>>>>>> - runqueue lock is also held>>>> - It's not unconditionally available, depends on config.>>>>>>>> diff --git a/kernel/sched.c b/kernel/sched.c>>>> index e806446..8581ad3 100644>>>> --- a/kernel/sched.c>>>> +++ b/kernel/sched.c>>>> @@ -2826,6 +2826,14 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)>>>> #endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */>>>> finish_lock_switch(rq, prev);>>>> >>>> + /*>>>> + * If this task has IO plugged, make sure it>>>> + * gets flushed out to the devices before we go>>>> + * to sleep>>>> + */>>>> + if (prev_state != TASK_RUNNING)>>>> + blk_flush_plug(prev);>>>> +>>>> fire_sched_in_preempt_notifiers(current);>>>> if (mm)>>>> mmdrop(mm);>>>> @@ -3973,14 +3981,6 @@ need_resched_nonpreemptible:>>>> if (to_wakeup)>>>> try_to_wake_up_local(to_wakeup);>>>> }>>>> - /*>>>> - * If this task has IO plugged, make sure it>>>> - * gets flushed out to the devices before we go>>>> - * to sleep>>>> - */>>>> - blk_flush_plug(prev);>>>> - BUG_ON(prev->plug && !list_empty(&prev->plug->list));>>>> ->>>> deactivate_task(rq, prev, DEQUEUE_SLEEP);>>>> }>>>> switch_count = &prev->nvcsw;>>>>>>>>>> Right, so your new location is still under rq->lock for a number of>>> architectures (including x86). finish_lock_switch() doesn't actually>>> release the lock unless __ARCH_WANT_INTERRUPTS_ON_CTXSW ||>>> __ARCH_WANT_UNLOCKED_CTXSW (the former implies the latter since rq->lock>>> is IRQ-safe).>>>> Ah, thanks for that.>>>>> If you want a safe place to drop rq->lock (but keep in mind to keep IRQs>>> disabled there) and use prev, do something like the below. Both>>> pre_schedule() and idle_balance() can already drop the rq->lock do doing>>> it once more is quite all-right ;-)>>>>>> Note that once you drop rq->lock prev->state can change to TASK_RUNNING>>> again so don't re-check that.>>>> So that's a problem. If I end up flushing this structure that sits on>> the stack of the process, I cannot have it running on another CPU at>> that time.>>>> I need the process to be in such a state that it will not get scheduled>> on another CPU before this has completed.>>>> Is that even possible? > > Yes, if prev will be flipped back to TASK_RUNNING it will still stay on> that cpu, it will not migrate until the cpu that schedules it away (the> cpu you're on) will have flipped rq->curr, and that happens way after> this point. So you're good to go, just don't rely on ->state once you> release rq->lock.

Great, that'll work for me! Your patch should work as-is, then. ThanksPeter.