On Tue, 8 Dec 2009, Al Viro wrote:> > Why do we want user_get_pages(), anyway? It's not that we lacked an> easy way to do large arrays, especially since the use is purely sequential.> Even a linked list of vmalloc'ed pages would do just fine (i.e. start with> static array in bprm, keep the pointer to last filled entry + number of> entries left before the next allocation; use the last pointer in array> for finding the next page-sized chunk).> > What do we lose if we go that way? Inserting all these pages into mm> at once shouldn't be slower. Memory overhead is not really an issue> (one page per 511 or 1023 pages of argv). Am I missing something?

I think what you lose that way is swappability.

Since we're supporting unlimited args and env here, it is importantthat those pages can belong to an mm, be discoverable by rmap, andbe swapped out if really necessary. Whereas I think you're proposingan internal list of those pages, unknown to rmap, unswappable.

Of course, a page is pinned in core between get_user_pages() andput_page(), but unless I've got it wrong, get_user_pages() is beingapplied one by one to these pages, each unpinned as the next is pinned.

It is conceivable that it actually doesn't work in the way I'mexpecting: that although it's all designed to leave those pagesswappable, some mod here or there has interfered with that. But ifso, that would be a bug: the intention, and I believe it's important,is that those pages are swappable.

I have a different reason for wanting to change how it's done:it's the major user of non-atomic kmap() and its global kmap_lock,and rather swamps other uses of kmap() (which have better use forthe cached virtual address). So I'd be happy with a better wayof doing it, but not at the cost of losing swappability.

Hugh

> > The benefit, AFAICS, is that we get rid of the mess with forced high> address use, get *sane* get_user_pages() (we always have matching> task_struct with the right personality, so we can avoid massive PITA> for doing checks right) and we get unified mmu/nommu code in fs/exec.c> out of that.> > If you see serious problems I've missed, please tell. Otherwise I'm> going to hack up a prototype and post it here...