On Monday 23 January 2006 20:59, Jacob Bachmeyer wrote:
> Blaisorblade wrote:
> >On Thursday 19 January 2006 23:23, Jacob Bachmeyer wrote:
> >>Blaisorblade wrote:
> >>>On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
> >>>I.e. extend ptrace to trap lcall gates, right? That's another thing,
> >>> could be done, but it relates more to the Linux-ABI project... at least
> >>> this can't be merged in mainline since we don't support lcall gates.
> >>
> >>Why not? And for that matter, why does ptrace not currently catch
> >> lcalls?
> >
> >The lcall stub was removed from arch/i386/kernel/entry.S a little time ago
> >(about 2.6.12 IIRC). So vanilla Linux can't handle lcalls. Clear now?
> Yes, the last time I looked into that part of the kernel was back in
> 2.4. So, does this mean that lcalls can no longer be potentially used
> to escape from UML?
Yes, and IIRC that was also fixed directly some time ago via LDT clearing.
> >Yes, it is meant to be only an error path, but UML abuses it for
> > normal control, and I said that the kernel supports "fasttrap", but only
> > via SIGSEGV, i.e. in a slow way.
> That is the exact problem. It shouldn't be abused--a proper interface
> that has acceptable performance should be devised. (You mention
> netlink--was it looked into?
No, and while I mentioned netlink, it's not an interface I know deeply.
However, it's being used for various things, including a proposed
rewrite of the wireless API and the existing implementation of
userspace packet filtering, so we can assume it has reasonable performance,
momentum, a user base and thus maintenance.
> This might help with some UML performance
> issues.)
Possibly yes, but Ingo Molnar already designed a custom API for this purpose -
it was designed with UML usage in mind.
> Basically what is needed is a means to set a page to no access
> but cause some other action to occur rather than generate SIGSEGV.
> >>>We do that: make them unmapped and trap SIGSEGV through ptrace.
> >>
> >>The overhead is not all that large, as most Win32 API calls ultimately
> >>go into the kernel anyway.
> >
> >A kernel switch only costs a few thousand TSC units (see the rdtsc
> >assembly instruction), while a signal delivery to a foreign process can
> > cost a lot more (I measured it on the order of 4*10^5 TSC units, even
> > without a memory switch).
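For readers who want to reproduce the kind of figure quoted above, here is a minimal sketch of timing a trivial kernel entry in TSC units. The function names are made up for illustration, getpid is just an assumed stand-in for "a kernel switch", and on non-x86 hosts the sketch falls back to nanoseconds rather than TSC units. It does not measure signal delivery to a traced process.

```c
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time-stamp counter via the rdtsc instruction. */
static uint64_t read_ticks(void)
{
    return __rdtsc();
}
#else
/* Fallback for non-x86 hosts: nanoseconds instead of TSC units. */
static uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Average cost, in ticks, of one trivial kernel entry (getpid). */
uint64_t average_syscall_ticks(unsigned iterations)
{
    if (iterations == 0)
        return 0;

    uint64_t start = read_ticks();
    for (unsigned i = 0; i < iterations; i++)
        syscall(SYS_getpid);    /* raw syscall, so libc cannot cache it */
    uint64_t end = read_ticks();

    return (end - start) / iterations;
}
```

Calling average_syscall_ticks(100000) a few times and taking the smallest value gives a rough per-entry cost; the result depends heavily on the CPU and the host kernel.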
>
> Then a more efficient interface is needed. Besides, this would need to
> be synchronous.
>
> >>This also should allow WINE to work well on
> >>platforms such as x86-64, without needing multiple WINE binaries.
> >>(64-bit control process managing mix of 32 and 64 bit address spaces)
> >
> >Writing 64-bit code handling cleanly 32-bit syscalls is hard. Compiling
> > 32-bit code in 32-bit mode to do the same is simpler.
>
> The problem is that they need to communicate, especially once Win64
> actually hits. WINE currently has a (confusing) "relay" layer that
> already does similar tasks for 16/32 bit. Furthermore, the Win32 API
> calling convention is fairly well defined, (parameters on stack; return
> in EAX) so this shouldn't be more of a problem than has been solved in
> the past. (That doesn't mean it won't be a real PITA.)
>
> >>The reason to trap is to allow WINE to intercept the call while
> >>sitting in another address space. (Each Win32 process would have its
> >>own guest address space.) The idea is to have the interfaces UML uses
> >>be generic enough for WINE to also use.
> >>
> >>The reason is simple--improved security by enforcing a sandbox around
> >>WINE.
> Seccomp (see below--thanks for bringing it up) could more easily be used
> to solve this. (Why bother with trapping all the time when only a few
> pages really need protection? Furthermore, the external control thread
> would thus have veto power over all syscalls made, so the sandbox can be
> easily enforced.)
> >Andrea Arcangeli merged such a "padded cell" functionality, but the
> > allowed interface is read, not a page fault. The former is faster and
> > easier to use, and also allows writing arbitrary amounts of data.
> >
> >It's called secure computing (see kernel/seccomp.c for details, and/or
> > look on LWN.net for an article about it).
>
> I had looked at this earlier, but hadn't realized that it could be used
> to implement this--provided that mm_indirect can make syscalls in a
> seccomp address space (bypassing the restriction),
Wait a moment - you're talking about the runtime thread calling
mm_indirect(), or did I misunderstand something?
In this case there's no problem - seccomp jails the process only. If we tried
to inject code into the process to perform syscalls (like UML does in SKAS0
mode, which needs no host patch) it wouldn't work, but mm_indirect is a
normal syscall borrowing the foreign address space.
> this can do
> everything that "fasttrap" could (using some help from appropriate code
> in userspace).
> Maybe SKAS4 should add a new seccomp level?
I don't remember any "levels" in seccomp... and it was intended to be
simple. Besides, they shouldn't be needed (see above).
> >Wait a moment - Windows 3.1 uses 286 paging, and Win16 userspace progs use
> > up to 16M of Ram. You don't have this on vm86(), right?
> No, but as I said vm86 is gone on x86-64, which means that DOS soft ints
> are somehow caught--inside the address space in question. (WINE
> currently runs in-process, I am trying to lay the groundwork to change
> that--thus all the crazy stuff previously about "fasttrap" to another
> userspace.) Current WINE can use vm86 on i386 platform, however.
> This (Win16 programs with 16MiB of RAM) also means that WINE could
> always intercept soft interrupts--even without use of vm86.
Good.
> The other catch is that 64 and 32 bit code doesn't mix very well, and
> they must be kept in separate processes normally--thus the reason for a
> 64-bit control process to be able to handle both 32 and 64 bit address
> spaces. The entire kernel is 64-bit anyway, so leaving the option open
> can't be too insanely hard.
> The other problem is that a more specific interface could be much
> faster. OTOH, perhaps a better strategy would be to improve the
> signals--thus also lessening the other problem (slowness of SIGSEGV) as
> well as improving performance generally.
Signals are very slow, and in many ways they can't be optimized. The only big
optimization that can be made is for _tracing_ a process which gets a
signal. Currently the signal is first delivered to the target process and a
context switch is made to it; only afterwards, before it returns to user mode,
is the signal notification delivered to the tracing process. A context switch
is then performed to the tracer, and later the traced process is put back in
the ready state and scheduled again. I.e. the first switch to the target
process is totally useless.
> >>>However, currently the idea is sys_mm_indirect , taking an fd
> >>> representing an mm context, a syscall number and its parameters, plus a
> >>> syscall to get a fd representing a mm context.
> >>How are address spaces manipulated? Could ioctls on the mm context's fd
> >>be useful?
> >We don't use ioctls, they are inelegant; SKAS3 uses write which is just
> > as bad.
> What is inelegant about an ioctl on a special fd? I say that ioctls are
> far preferable to more fds (on other files), or the extra complexity of
> implementing some other interface (maybe using netlink?).
ioctl is totally unstructured and thus inelegant, and 32/64-bit compatibility
is a PITA.
Using ioctls for devices is tolerable; for general APIs it isn't. Many recently
merged APIs started out as sets of ioctl()s and were rewritten as either sets
of syscalls or special filesystems (inotify, for instance).
Device mapper uses ioctls only because it was merged in the dark age of 2.5
and it was really needed.
> Besides, if
> you implement your own struct file_operations, you get ioctl support by
> writing the handler function for it.
> (If I understand the Linux 2.6.14
> VFS correctly).
You do, that's not the problem... and the inelegance is not totally in the
implementation, but in the API.
> OTOH, if no operations that fall into ioctl's area are
> needed, then implementing ioctl for its own sake is silly.
> >For SKAS4, instead, you'd use sys_mm_indirect(); you say:
> >
> >mm_indirect(addr_space_fd, __NR_MMAP, <mmap_args>)
> >mm_indirect(addr_space_fd, __NR_MUNMAP, <munmap_args>)
> >
> >and so on, for each syscall (excluding fork and exit, for now). To destroy
> > an address space you simply call close on its fd.
> How do you map region X of the guest address space to region Y (or
> somewhere) in your own? mmap/munmap on the address space's fd would
> make sense here.
That's not possible, to my knowledge, unless you use shared backing storage,
e.g. a tmpfs file.
I.e. the memory must be set up as shareable from the very beginning.
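As a rough illustration of the "shared backing storage" point, the sketch below maps one file twice with MAP_SHARED and checks that a write through one mapping shows up in the other. Both mappings live in the same process here, whereas the discussion is about two different address spaces; the function name and the /tmp path template are assumptions for the example, and on a real setup the file would sit on a tmpfs mount.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * Map the same file twice with MAP_SHARED and check that a write through
 * one mapping is visible through the other.
 * Returns 1 if visible, 0 if not, -1 on setup error.
 */
int shared_backing_demo(void)
{
    char path[] = "/tmp/shared-backing-XXXXXX";  /* assumed location */
    const size_t len = 4096;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                    /* the open fd keeps the file alive */

    if (ftruncate(fd, (off_t)len) == 0) {
        char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        if (a != MAP_FAILED && b != MAP_FAILED) {
            strcpy(a, "guest page");                 /* write via mapping a */
            result = (strcmp(b, "guest page") == 0); /* read via mapping b */
        }
        if (a != MAP_FAILED)
            munmap(a, len);
        if (b != MAP_FAILED)
            munmap(b, len);
    }
    close(fd);
    return result;
}
```

With MAP_PRIVATE instead of MAP_SHARED the second mapping would not be guaranteed to see the write, which is why the memory has to be set up as shareable from the start.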
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

Ok, I can't seem to make minor changes to ubd_kern to handle paged bitmaps,
so I guess I will fall back to my old cow_user.c with its library of
cow_open/close/read/write and working paged bitmaps.
Here is the current version of cow-many against uml-2.6.15-bs1. It should be
relatively unlimited as to the number of ubd devices: I was able to open my
hard disk (read only, since cow breaks) 986 times, each with 3 partitions,
before running out of file descriptors, using about 100 separate major
numbers, all in the default mem=32M (29M were in use, though). I found an
oddness: formerly I called each major number's container ubd_controller#, but
that is too long for proc, so now it is just ubd_bus#, like the ide# for IDE
disk controllers.
If you use ubda in mconsole, devices will be created with ubda as the disk name.
If you use ubd0 in mconsole, devices will be created with ubd0 as the disk name.
I plan on using it to overlay my RAID array hda-hds and have it use the real
device numbers, since the RAID code seems to have them coded in as a check.
For my use I will be setting MAX_MINOR 128 and UBD_SHIFT 6, giving 2
devices/major like the real IDE drivers, and MAX_LETTER 19 for drive ubds.
On the command line:
linux ubd=3 ubdaC27095H255S63=COWa,/dev/hda ubdbC20034H255S63=COWb,/dev/hdb
...
and via uml_mconsole
config ubd=22 /* see the uml_mconsole patch to allow this to work */
config ubdcC20034H255S63=COWc,/dev/hdc
config ubddC20034H255S63=COWd,/dev/hdd
config ubd=33
config ubdeC20034H255S63=COWe,/dev/hde
...
config ubdsC20034H255S63=COWs,/dev/hds
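The MAX_MINOR 128 / UBD_SHIFT 6 setting mentioned above can be checked with a little arithmetic. The sketch assumes the usual block-driver convention that the low UBD_SHIFT bits of the minor number select the partition and the remaining bits select the disk; the helper names are invented for illustration and are not taken from ubd_kern.c.

```c
/* Number of whole-disk units that fit under one major number. */
unsigned units_per_major(unsigned max_minor, unsigned shift)
{
    return max_minor >> shift;
}

/* Unit (disk) index within its major for a given minor number. */
unsigned minor_to_unit(unsigned minor, unsigned shift)
{
    return minor >> shift;
}

/* Partition index within the unit; 0 denotes the whole disk. */
unsigned minor_to_partition(unsigned minor, unsigned shift)
{
    return minor & ((1u << shift) - 1u);
}
```

With max_minor = 128 and shift = 6 this gives 2 units per major, and minor 65 is partition 1 of the second unit, which is the same layout the IDE driver uses for its two drives per major.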
I did not bother this time with keeping fake_ide/fake_major since this maps
exactly to my IDE drives anyway. I also did not put in the config
ubd=3,22,33,34,56,57,88,89,90,91
which would be the way to go for heavy use.
I also noted that your V3 cow format does not comply with the 64-bit alignment
rules, so the compiler should insert a 32-bit pad. You could try __packed, but
that should segfault on a real 64-bit machine. I think AMD64/EM64T may allow
unaligned access, but someday PPC64/SPARC64/IA64 might get user mode linux, and
64-bit machines used to be picky about things like that.
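The padding concern above can be seen with a small, hypothetical header. These structs are not the real V3 cow header layout; they only show a 32-bit field followed by a 64-bit one, once with natural alignment and once with the GCC/Clang packed attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: a 32-bit field followed by a 64-bit field. */
struct natural_header {
    uint32_t version;
    uint64_t size;      /* the compiler may insert padding before this */
};

/* Same fields, but the compiler is told not to pad. */
struct packed_header {
    uint32_t version;
    uint64_t size;      /* always at offset 4, possibly misaligned */
} __attribute__((packed));

/* Offset of the 64-bit field under natural alignment (ABI dependent). */
size_t natural_size_offset(void)
{
    return offsetof(struct natural_header, size);
}

/* Offset of the 64-bit field in the packed variant. */
size_t packed_size_offset(void)
{
    return offsetof(struct packed_header, size);
}

/* Bytes of padding the compiler inserted after the 32-bit field. */
size_t natural_padding(void)
{
    return natural_size_offset() - sizeof(uint32_t);
}
```

On a typical x86-64 ABI natural_padding() is 4, while on 32-bit x86 it is usually 0, so a file header written from the natural struct on one host can disagree with another host. An explicit pad field or fixed-offset serialization avoids depending on either behaviour.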
The current ubd_remove takes an integer rather than the device name. This
makes it harder to adjust for MAX_LETTER and to know where the last ubd# is.
It used to take the string from uml_mconsole, which I just gave to parse_unit
and then used the *dev to remove it; now I have to go search for the number...
On a related note, what is ubd_id() really supposed to do? I am guessing it is
the reporting function used to limit remove to the correct devices. I just
found it in arch/um/drivers/mconsole_kern.c. So I end up calling parse_unit
in ubd_id, then converting the device I found to an integer and sending it up
for a range check (which I just did), and then back down to remove, where I
have to walk the list again to find that number to get the device I just had
in ubd_id... That seems a bit awkward; I liked the string better.
Here are my thoughts about sequential formats for non-sparse filesystems
(e.g. FAT32).
Indexed SAM: fast but not very space efficient, 1/64th of the disk size for
the index alone, though we could make it 1/128 by limiting the sectors to
32 bits, but still ick.
u64 offset[blocks]
data...
so we get data[offset[sector]]
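The 1/64 and 1/128 figures follow from one index entry per sector: 8 bytes per 512-byte sector is 1/64, and 4 bytes is 1/128. A small sketch of that arithmetic, assuming 512-byte sectors (the function name is illustrative only):

```c
#include <stdint.h>

/* Bytes needed for an index holding one entry per sector of the disk. */
uint64_t index_bytes(uint64_t disk_bytes, uint32_t sector_size,
                     uint32_t entry_size)
{
    uint64_t sectors = disk_bytes / sector_size;
    return sectors * entry_size;
}
```

For a disk of 250 * 10^9 bytes, index_bytes(250000000000ULL, 512, 8) comes to roughly 3.9 GB of index, which is the "ick" referred to above.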
List SAM: slower, smaller? Have to walk array 0..list_index to find if sector
is present.
list_index /* where we are in sector_number array */
list_limit /* how many sectors we are willing to remap, can be less than all
of the disk */
u64 sector_number[list_limit]
data...
for (i = 0; i < list_index; i++)
    if (sector_number[i] == sector) then read data[i*sectorsize]
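The List SAM lookup above can be written as a small runnable function. This is only a sketch of the linear search over an in-memory copy of the sector_number array; the function name and the -1 "not remapped" convention are assumptions, and reading the data area itself is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Linear search of the remap list. Returns the slot index i such that
 * sector_number[i] == sector (its data then lives at byte offset
 * i * sectorsize in the data area), or -1 if the sector was never remapped.
 */
long list_sam_find(const uint64_t *sector_number, size_t list_index,
                   uint64_t sector)
{
    for (size_t i = 0; i < list_index; i++)
        if (sector_number[i] == sector)
            return (long)i;
    return -1;
}
```

A miss means the read should go to the backing device. The cost grows linearly with the number of remapped sectors, which is the "slower" trade-off noted above.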
Direct SAM: very slow but small, just record after record containing sector #
and data.
struct {
    u64 sector_number;
    char data[sectorsize];
} record;
i = 0;
seek(fd, 0);
do {
    read(fd, &tmp, sizeof(tmp));
    if (tmp.sector_number == sector) break; /* we have the data in tmp.data */
    i++;
} while (!EOF);
Paged SAM: since it only allocates 1 block of offsets at a time, a much
smaller file.
int mapsize /* 4k would seem a good starting point, 512 sector #s per page allocated */
struct {
    u64 sector_number[MAPSIZE/sizeof(u64)]
    char data[MAPSIZE/sizeof(u64)][sectorsize]
}
but you would have to walk the blocks

Hmm, I will look to see if that happens on my machines... nope.
I just tested on a machine with more real memory but no skas3 and no tmpfs. I
appear to have ~64M in use with one COW bitmap and 140M in use with 2 (from
top in the guest). The other machine was shrinking the guest virtual address
space to ~128M even when told to use 384M, and the tmp (on tmpfs) was
remounted with -oremount,size=10G.
Though come to think of it, the guest does not really need to have the
vmalloced space actually allocated with the backing file on tmp... but I
can't think of a way to have it just mapped into high memory or something.
That would really be the place to put the 600M of bitmaps I expect to have if
I can get things working.
>From: Jeff Dike <jdike@...>
>To: James McMechan <james_mcmechan@...>
>CC: user-mode-linux-devel@...
>Subject: Re: [uml-devel] many cow files with ubd_kern.c
>Date: Sat, 28 Jan 2006 21:46:20 -0500
>
>On Sat, Jan 28, 2006 at 11:11:31PM +0000, James McMechan wrote:
> > allocation failed: out of vmalloc space - use "vmalloc=<size>" to increse size
> > Failed to vmalloc COW bitmap
> > ubd0: Can't open "COW0": errno -12
>
>I have a report (which I haven't looked into yet) of the ubd driver
>double-vmallocing the bitmap. This is the call trace of the vmallocs
>as reported on IRC:
>
>ubd_add enter
>Attempting to vmalloc COW bitmap "cow0" at 25600
>cow.bitmap = 0x00000000 (before vmalloc)
>cow.bitmap = 0xa2800000 (after vmalloc)
>ubd_open enter
>Attempting to vmalloc COW bitmap "cow0" at 25600
>cow.bitmap = 0xa2800000 (before vmalloc)
>cow.bitmap = 0xa280a000 (after vmalloc)
>ubd_open exit
> /dev/ubd/disc0: p1
>ubd_close() enter
>ubd_close() exit
>ubd_close() enter
>ubd_close() exit
>ubd_add exit
>
> Jeff

Once again I have been tweaking ubd_kern to support lots of disks, since I
need it for recovery, but I hit (even on a plain 2.6.15-bs1 kernel):
allocation failed: out of vmalloc space - use "vmalloc=<size>" to increse size
Failed to vmalloc COW bitmap
ubd0: Can't open "COW0": errno -12
But I already had vmalloc=128M (which should be the default), and
vmalloc=256M and mem=384M, on the command line.
I can't seem to locate the arch/um equivalent of the code in
arch/i386/kernel/setup.c that handles the vmalloc= option, though.
Has anyone hit something like this before? Each of the bitmaps should only
take about 60M of memory, and at this point I have only tried to allocate 1,
for a single 250G ubd device.
This will present a larger problem when I get everything working and try to
allocate all 10 drives' worth of bitmaps, ~600M.
I suppose I may have to go back to the paged bitmap logic, but it is still a
mess...
The multiple major numbers and C:H:S from mconsole do seem to work correctly,
though.
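The "about 60M" per bitmap figure can be reproduced with simple arithmetic, assuming the COW bitmap holds one bit per 512-byte sector and that "250G" means 250 * 10^9 bytes. Both are assumptions for this sketch, as is the function name.

```c
#include <stdint.h>

/* Bytes needed for a bitmap with one bit per sector, rounded up. */
uint64_t cow_bitmap_bytes(uint64_t disk_bytes, uint32_t sector_size)
{
    uint64_t sectors = (disk_bytes + sector_size - 1) / sector_size;
    return (sectors + 7) / 8;
}
```

For one such disk this gives 61035157 bytes, i.e. about 61 MB, and ten of them come to roughly 610 MB, in line with the ~600M estimate above.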

On Fri, Jan 27, 2006 at 04:37:17PM -0600, Rob Landley wrote:
> Understood. Is there any way to autodetect this at runtime?
Not sure, that would be nice.
It would have to look at its own maps and figure out that two pages mapped
at the top of memory are the stub pages of the outer UML. That seems a bit
ad-hoc to me.
Although maybe we could just say that any two pages there (as long as they
are not stack) cause the stub pages to be relocated simply to avoid a
collision. Here, we're not explicitly looking for an outer UML, just pages
that we are about to stomp on.
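A rough sketch of the "look at its own maps" idea: scan /proc/self/maps and report whether anything is already mapped in a given address range. This is not UML code; the function name is made up, the check is Linux-specific, and deciding whether a hit is stack or an outer UML's stub pages is left out.

```c
#include <stdio.h>

/*
 * Returns 1 if any mapping listed in /proc/self/maps overlaps
 * [start, end), 0 if none does, -1 if the file cannot be read.
 */
int range_is_mapped(unsigned long start, unsigned long end)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (!maps)
        return -1;

    while (fgets(line, sizeof line, maps)) {
        unsigned long lo, hi;

        /* Each entry starts with "<start>-<end>" in hexadecimal. */
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 &&
            lo < end && start < hi) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

A startup check along these lines could test the two pages where the stubs would normally go and pick another location when the range is already occupied.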
Jeff

On Friday 27 January 2006 12:08, Jeff Dike wrote:
> On Sat, Oct 08, 2005 at 06:37:25PM -0500, Rob Landley wrote:
> > Start with "nesting level".
>
> Actually, NEST_LEVEL is needed for skas0 because of the stub pages. They
> need to be relocated in a nested UML. So, I'm dropping this patch.
Understood. Is there any way to autodetect this at runtime?
Rob
--
Steve Ballmer: Innovation! Inigo Montoya: You keep using that word.
I do not think it means what you think it means.

On Sat, Oct 08, 2005 at 06:37:25PM -0500, Rob Landley wrote:
> Start with "nesting level".
Actually, NEST_LEVEL is needed for skas0 because of the stub pages. They
need to be relocated in a nested UML. So, I'm dropping this patch.
BTW, the other config options which needed to be made dependent on MODE_TT
now are.
Jeff

On Wednesday 25 January 2006 16:09, Pekka J Enberg wrote:
> On Wed, 25 Jan 2006, Blaisorblade wrote:
> > So, first thing, you're still running in TT mode.
> > FYI, you should probably enable CONFIG_MODE_SKAS - it now (since 2.6.13)
> > works even on unpatched hosts, in SKAS0 mode (even if it's a bit slower
> > than SKAS3).
> > The only reason TT mode is still in is that its code is useful for
> > debugging and future extension (i.e. there's no SMP support in SKAS0/3
> > modes).
> Well, yeah, I know. I haven't been able to get the uml debug mode to work
> with SKAS which is why I use TT.
With SKAS you simply do the normal "gdb ./vmlinux" rather than using the debug
parameter - that was needed for TT only because of its weirdness (and
recently the ptrace proxy started being buggy too).
Make sure CONFIG_CMDLINE_ON_HOST is still disabled, because gdb seems unable
to debug a program which ptraces itself.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

On Wed, 25 Jan 2006, Blaisorblade wrote:
> So, first thing, you're still running in TT mode.
>
> FYI, you should probably enable CONFIG_MODE_SKAS - it now (since 2.6.13) works
> even on unpatched hosts, in SKAS0 mode (even if it's a bit slower than
> SKAS3).
>
> The only reason TT mode is still in is that its code is useful for debugging
> and future extension (i.e. there's no SMP support in SKAS0/3 modes).
Well, yeah, I know. I haven't been able to get the uml debug mode to work
with SKAS which is why I use TT.
On Wed, 25 Jan 2006, Blaisorblade wrote:
> Please report whether problems arise even in SKAS0 mode.
Nope, SKAS boots up ok but ./linux debug doesn't bring up the gdb
terminal. I have included the config I am using for 2.6.16-rc1-git4 with
SKAS enabled.
Pekka
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.16-rc1-git4
# Wed Jan 25 17:01:47 2006
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_UML=y
CONFIG_MMU=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_IRQ_RELEASE_METHOD=y
#
# UML-specific options
#
CONFIG_MODE_TT=y
# CONFIG_HOST_2G_2G is not set
CONFIG_KERNEL_HALF_GIGS=1
CONFIG_MODE_SKAS=y
#
# Host processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
CONFIG_M686=y
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_PPRO_FENCE=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_UML_X86=y
# CONFIG_64BIT is not set
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_TOP_ADDR=0xc0000000
# CONFIG_3_LEVEL_PGTABLES is not set
CONFIG_STUB_CODE=0xbfffe000
CONFIG_STUB_DATA=0xbffff000
CONFIG_STUB_START=0xbfffe000
CONFIG_ARCH_HAS_SC_SIGNALS=y
CONFIG_ARCH_REUSE_HOST_VSYSCALL_AREA=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_LD_SCRIPT_STATIC=y
CONFIG_NET=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
# CONFIG_HOSTFS is not set
# CONFIG_HPPFS is not set
CONFIG_MCONSOLE=y
# CONFIG_MAGIC_SYSRQ is not set
# CONFIG_SMP is not set
CONFIG_NEST_LEVEL=0
# CONFIG_HIGHMEM is not set
CONFIG_KERNEL_STACK_ORDER=2
CONFIG_UML_REAL_TIME_CLOCK=y
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_UID16=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
CONFIG_SLAB=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set
#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
#
# Block layer
#
# CONFIG_LBD is not set
#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"
#
# Block devices
#
CONFIG_BLK_DEV_UBD=y
CONFIG_BLK_DEV_UBD_SYNC=y
CONFIG_BLK_DEV_COW_COMMON=y
# CONFIG_MMAPPER is not set
CONFIG_BLK_DEV_LOOP=m
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_RAM is not set
CONFIG_BLK_DEV_RAM_COUNT=16
# CONFIG_ATA_OVER_ETH is not set
#
# Character Devices
#
CONFIG_STDERR_CONSOLE=y
CONFIG_STDIO_CONSOLE=y
CONFIG_SSL=y
CONFIG_NULL_CHAN=y
CONFIG_PORT_CHAN=y
CONFIG_PTY_CHAN=y
CONFIG_TTY_CHAN=y
CONFIG_XTERM_CHAN=y
# CONFIG_NOCONFIG_CHAN is not set
CONFIG_CON_ZERO_CHAN="fd:0,fd:1"
CONFIG_CON_CHAN="xterm"
CONFIG_SSL_CHAN="pty"
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_WATCHDOG is not set
CONFIG_UML_SOUND=m
CONFIG_SOUND=m
CONFIG_HOSTAUDIO=m
CONFIG_UML_RANDOM=y
#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
# CONFIG_FW_LOADER is not set
# CONFIG_DEBUG_DRIVER is not set
#
# Networking
#
#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_TUNNEL is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_BIC=y
# CONFIG_IPV6 is not set
# CONFIG_NETFILTER is not set
#
# DCCP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_DCCP is not set
#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set
#
# TIPC Configuration (EXPERIMENTAL)
#
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set
#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_IEEE80211 is not set
#
# UML Network Devices
#
CONFIG_UML_NET=y
CONFIG_UML_NET_ETHERTAP=y
CONFIG_UML_NET_TUNTAP=y
CONFIG_UML_NET_SLIP=y
CONFIG_UML_NET_DAEMON=y
CONFIG_UML_NET_MCAST=y
# CONFIG_UML_NET_PCAP is not set
CONFIG_UML_NET_SLIRP=y
#
# Network device support
#
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=m
#
# PHY device support
#
#
# Wan interfaces
#
# CONFIG_WAN is not set
CONFIG_PPP=m
# CONFIG_PPP_MULTILINK is not set
# CONFIG_PPP_FILTER is not set
# CONFIG_PPP_ASYNC is not set
# CONFIG_PPP_SYNC_TTY is not set
# CONFIG_PPP_DEFLATE is not set
# CONFIG_PPP_BSDCOMP is not set
# CONFIG_PPP_MPPE is not set
# CONFIG_PPPOE is not set
CONFIG_SLIP=m
# CONFIG_SLIP_COMPRESSED is not set
# CONFIG_SLIP_SMART is not set
# CONFIG_SLIP_MODE_SLIP6 is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
#
# Connector - unified userspace <-> kernelspace linker
#
# CONFIG_CONNECTOR is not set
#
# File systems
#
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
# CONFIG_XFS_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_INOTIFY=y
CONFIG_QUOTA=y
# CONFIG_QFMT_V1 is not set
# CONFIG_QFMT_V2 is not set
CONFIG_QUOTACTL=y
CONFIG_DNOTIFY=y
CONFIG_AUTOFS_FS=m
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set
#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set
#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set
#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y
# CONFIG_RELAYFS_FS is not set
# CONFIG_CONFIGFS_FS is not set
#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
#
# Network File Systems
#
# CONFIG_NFS_FS is not set
# CONFIG_NFSD is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set
#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
# CONFIG_NLS_ISO8859_1 is not set
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set
#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
#
# Cryptographic options
#
# CONFIG_CRYPTO is not set
#
# Hardware crypto devices
#
#
# Library routines
#
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC32=m
# CONFIG_LIBCRC32C is not set
#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set
# CONFIG_INPUT is not set
#
# Kernel hacking
#
# CONFIG_PRINTK_TIME is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_SCHEDSTATS is not set
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_MUTEXES=y
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_FS is not set
# CONFIG_DEBUG_VM is not set
CONFIG_FRAME_POINTER=y
CONFIG_FORCED_INLINING=y
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_CMDLINE_ON_HOST is not set
CONFIG_PT_PROXY=y
# CONFIG_GCOV is not set
# CONFIG_SYSCALL_DEBUG is not set

Hi,
UML in 2.6.16-rc1-git4 fails to boot a Debian 3.0r2 image. The same
image boots properly with 2.6.15. Here's the error:
penberg@... ~/src/linux/2.6 $ ./linux ubd0=/home/penberg/virtualized/Debian-3.0r2.ext2
Checking that ptrace can change system call numbers...OK
Checking syscall emulation patch for ptrace...OK
Checking advanced syscall emulation patch for ptrace...OK
Checking PROT_EXEC mmap in /tmp...OK
UML running in TT mode
tracing thread pid = 32600
Linux version 2.6.16-rc1-git4 (penberg@...) (gcc version 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)) #1 Wed Jan 25 11:07:25 EET 2006
Built 1 zonelists
Kernel command line: ubd0=/home/penberg/virtualized/Debian-3.0r2.ext2 root=98:0
PID hash table entries: 256 (order: 8, 4096 bytes)
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory: 29388k available
Mount-cache hash table entries: 512
Checking for host processor cmov support...Yes
Checking for host processor xmm support...No
Checking that host ptys support output SIGIO...Yes
Checking that host ptys support SIGIO on close...No, enabling workaround
Checking for /dev/anon on the host...Not available (open failed with errno 2)
Disabling 2.6 AIO in tt mode
2.6 host AIO support not used - falling back to I/O thread
NET: Registered protocol family 16
mconsole (version 2) initialized on /home/penberg/.uml/Z81xOu/mconsole
ubd: Synchronous mode
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
NET: Registered protocol family 2
IP route cache hash table entries: 512 (order: -1, 2048 bytes)
TCP established hash table entries: 2048 (order: 1, 8192 bytes)
TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
TCP reno registered
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Initialized stdio console driver
Console initialized on /dev/tty0
Initializing software serial port version 1
ubda: unknown partition table
VFS: Mounted root (ext2 filesystem) readonly.
line_ioctl: tty0: ioctl KDSIGACCEPT called
os_set_fd_async : failed to set O_ASYNC and O_NONBLOCK on fd # 6, errno = 9
register_winch_irq - failed to register IRQ
os_set_fd_async : failed to set O_ASYNC and O_NONBLOCK on fd # 6, errno = 9
register_winch_irq - failed to register IRQ
INIT: os_set_fd_async : failed to set O_ASYNC and O_NONBLOCK on fd # 6, errno = 9
register_winch_irq - failed to register IRQ
version 2.84 bootingos_set_fd_async : failed to set O_ASYNC and O_NONBLOCK on fd # 6, errno = 9
register_winch_irq - failed to register IRQ
Kernel panic - not syncing: read of switch_pipe failed, errno = 11
EIP: 0073:[<400b6e17>] CPU: 0 Not tainted ESP: 007b:9fe4a090 EFLAGS: 00200282
Not tainted
EAX: 00000271 EBX: 9fe4a25c ECX: 0000540e EDX: 9fe4a25c
ESI: 00000000 EDI: 00000000 EBP: 9fe4a35c DS: 007b ES: 007b
a0bc3b74: [<a0021de7>] show_regs+0xdf/0xe1
a0bc3ba0: [<a0010521>] panic_exit+0x25/0x3f
a0bc3bb0: [<a0033393>] notifier_call_chain+0x1c/0x3c
a0bc3bd0: [<a0025bd2>] panic+0x4b/0xd3
a0bc3be8: [<a0010ca2>] switch_to_tt+0xc2/0x129
a0bc3c1c: [<a000d0c2>] _switch_to+0x39/0xbe
a0bc3c50: [<a015d319>] schedule+0x41f/0x4a0
a0bc3cb8: [<a000d163>] interrupt_end+0x1c/0x3a
a0bc3cc8: [<a0012167>] sig_handler_common_tt+0xd3/0x100
a0bc3d04: [<a001de03>] sig_handler+0x2f/0x3c
a0bc3d1c: [<ffffe420>] _etext+0x5fe9ea61/0x0
sleeping process 1392 got unexpected signal : 10
remove_umid_dir - actually_do_remove failed with err = -2
remove_umid_dir - actually_do_remove failed with err = -2
Here's host information using Gentoo's emerge info:
Portage 2.0.53 (default-linux/x86/2005.0, gcc-3.3.6, glibc-2.3.5-r2,
2.6.15 i686)
=================================================================
System uname: 2.6.15 i686 Intel(R) Pentium(R) M processor 2.00GHz
Gentoo Base System version 1.6.13
dev-lang/python: 2.3.5-r2, 2.4.2
sys-apps/sandbox: 1.2.12
sys-devel/autoconf: 2.13, 2.59-r6
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils: 2.16.1
sys-devel/libtool: 1.5.20
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium3 -pipe -g"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.3/env
/usr/kde/3.3/share/config /usr/kde/3.3/shutdown
/usr/kde/3/share/config /usr/lib/X11/xkb
/usr/lib/mozilla/defaults/pref /usr/share/config
/usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/
/usr/share/texmf/tex/generic/config/
/usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/
/var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=pentium3 -pipe -g"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks nostrip sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LANG="en_US.UTF-8"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 X acl alsa apm audiofile avi berkdb bitmap-fonts bzip2
cdparanoia cdr crypt cups curl dvd dvdr eds emboss encode esd exif
expat fam fontconfig foomaticdb fortran gd gdbm gif gitsendemail glut
gmp gnome gpm gstreamer gtk gtk2 guile idn imagemagick imap imlib
innodb ipv6 ithread ithreads java jpeg junit lcms ldap libg++ libwww
mbox mhash mikmod mng motif mozilla mp3 mpeg mysql ncurses nls nptl
ogg oggvorbis opengl pam pcre pdflib perl plotutils png postgres
python quicktime readline recode ruby samba sdl slang spell sse ssl
svg tcltk tcpd tetex tiff truetype truetype-fonts type1-fonts udev
unicode usb utf8 vorbis win32codecs xml xml2 xmms xv zlib userland_GNU
kernel_linux elibc_glibc"
Unset: ASFLAGS, CTARGET, LC_ALL, LDFLAGS, LINGUAS, MAKEOPTS
Here is my config:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.16-rc1-git4
# Wed Jan 25 11:04:03 2006
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_UML=y
CONFIG_MMU=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_IRQ_RELEASE_METHOD=y
#
# UML-specific options
#
CONFIG_MODE_TT=y
# CONFIG_HOST_2G_2G is not set
CONFIG_KERNEL_HALF_GIGS=1
# CONFIG_MODE_SKAS is not set
#
# Host processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
CONFIG_M686=y
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_PPRO_FENCE=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_UML_X86=y
# CONFIG_64BIT is not set
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_TOP_ADDR=0xc0000000
# CONFIG_3_LEVEL_PGTABLES is not set
CONFIG_STUB_CODE=0xbfffe000
CONFIG_STUB_DATA=0xbffff000
CONFIG_STUB_START=0xbfffe000
CONFIG_ARCH_HAS_SC_SIGNALS=y
CONFIG_ARCH_REUSE_HOST_VSYSCALL_AREA=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_LD_SCRIPT_STATIC=y
CONFIG_NET=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
# CONFIG_HOSTFS is not set
# CONFIG_HPPFS is not set
CONFIG_MCONSOLE=y
# CONFIG_MAGIC_SYSRQ is not set
# CONFIG_SMP is not set
CONFIG_NEST_LEVEL=0
# CONFIG_HIGHMEM is not set
CONFIG_KERNEL_STACK_ORDER=2
CONFIG_UML_REAL_TIME_CLOCK=y
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_UID16=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
CONFIG_SLAB=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set
#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
#
# Block layer
#
# CONFIG_LBD is not set
#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"
#
# Block devices
#
CONFIG_BLK_DEV_UBD=y
CONFIG_BLK_DEV_UBD_SYNC=y
CONFIG_BLK_DEV_COW_COMMON=y
# CONFIG_MMAPPER is not set
CONFIG_BLK_DEV_LOOP=m
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_RAM is not set
CONFIG_BLK_DEV_RAM_COUNT=16
# CONFIG_ATA_OVER_ETH is not set
#
# Character Devices
#
CONFIG_STDERR_CONSOLE=y
CONFIG_STDIO_CONSOLE=y
CONFIG_SSL=y
CONFIG_NULL_CHAN=y
CONFIG_PORT_CHAN=y
CONFIG_PTY_CHAN=y
CONFIG_TTY_CHAN=y
CONFIG_XTERM_CHAN=y
# CONFIG_NOCONFIG_CHAN is not set
CONFIG_CON_ZERO_CHAN="fd:0,fd:1"
CONFIG_CON_CHAN="xterm"
CONFIG_SSL_CHAN="pty"
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_WATCHDOG is not set
CONFIG_UML_SOUND=m
CONFIG_SOUND=m
CONFIG_HOSTAUDIO=m
CONFIG_UML_RANDOM=y
#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
# CONFIG_FW_LOADER is not set
# CONFIG_DEBUG_DRIVER is not set
#
# Networking
#
#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_TUNNEL is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_BIC=y
# CONFIG_IPV6 is not set
# CONFIG_NETFILTER is not set
#
# DCCP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_DCCP is not set
#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set
#
# TIPC Configuration (EXPERIMENTAL)
#
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set
#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_IEEE80211 is not set
#
# UML Network Devices
#
CONFIG_UML_NET=y
CONFIG_UML_NET_ETHERTAP=y
CONFIG_UML_NET_TUNTAP=y
CONFIG_UML_NET_SLIP=y
CONFIG_UML_NET_DAEMON=y
CONFIG_UML_NET_MCAST=y
# CONFIG_UML_NET_PCAP is not set
CONFIG_UML_NET_SLIRP=y
#
# Network device support
#
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=m
#
# PHY device support
#
#
# Wan interfaces
#
# CONFIG_WAN is not set
CONFIG_PPP=m
# CONFIG_PPP_MULTILINK is not set
# CONFIG_PPP_FILTER is not set
# CONFIG_PPP_ASYNC is not set
# CONFIG_PPP_SYNC_TTY is not set
# CONFIG_PPP_DEFLATE is not set
# CONFIG_PPP_BSDCOMP is not set
# CONFIG_PPP_MPPE is not set
# CONFIG_PPPOE is not set
CONFIG_SLIP=m
# CONFIG_SLIP_COMPRESSED is not set
# CONFIG_SLIP_SMART is not set
# CONFIG_SLIP_MODE_SLIP6 is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
#
# Connector - unified userspace <-> kernelspace linker
#
# CONFIG_CONNECTOR is not set
#
# File systems
#
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
# CONFIG_XFS_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_INOTIFY=y
CONFIG_QUOTA=y
# CONFIG_QFMT_V1 is not set
# CONFIG_QFMT_V2 is not set
CONFIG_QUOTACTL=y
CONFIG_DNOTIFY=y
CONFIG_AUTOFS_FS=m
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set
#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set
#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set
#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y
# CONFIG_RELAYFS_FS is not set
# CONFIG_CONFIGFS_FS is not set
#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
#
# Network File Systems
#
# CONFIG_NFS_FS is not set
# CONFIG_NFSD is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set
#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
# CONFIG_NLS_ISO8859_1 is not set
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set
#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
#
# Cryptographic options
#
# CONFIG_CRYPTO is not set
#
# Hardware crypto devices
#
#
# Library routines
#
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC32=m
# CONFIG_LIBCRC32C is not set
#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set
# CONFIG_INPUT is not set
#
# Kernel hacking
#
# CONFIG_PRINTK_TIME is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_SCHEDSTATS is not set
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_MUTEXES=y
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_FS is not set
# CONFIG_DEBUG_VM is not set
CONFIG_FRAME_POINTER=y
CONFIG_FORCED_INLINING=y
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_CMDLINE_ON_HOST is not set
CONFIG_PT_PROXY=y
# CONFIG_SYSCALL_DEBUG is not set

Blaisorblade wrote:
>On Thursday 19 January 2006 23:23, Jacob Bachmeyer wrote:
>
>>Blaisorblade wrote:
>>
>>>On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
>>>
>>>>Blaisorblade wrote:
>>>>
>>>>>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
>>>>>
>>>>>>Has any thought been given to making SKAS4 suitably generic that it
>>>>>>could be used for more than just UML?
>>>>>>
>>>>>Not yet, thoughts welcome.
>>>>>
>>>>Let's see:
>>>>
>>>>to support HURD (which uses the Mach ABI):
>>>>
>>>> -- existing facilities plus trap lcall gates
>>>
>>>I.e. extend ptrace to trap lcall gates, right? That's another thing, could
>>>be done, but it relates more to the Linux-ABI project... at least this
>>>can't be merged in mainline since we don't support lcall gates.
>>
>>Why not? And for that matter, why does ptrace not currently catch lcalls?
>
>The lcall stub was removed from arch/i386/kernel/entry.S a little time ago
>(about 2.6.12 IIRC). So vanilla Linux can't handle lcalls. Clear now?
Yes, the last time I looked into that part of the kernel was back in
2.4. So, does this mean that lcalls can no longer be potentially used
to escape from UML?
>>>>to support WINE (which follows Win32 conventions (ick!)): (x86 only)
>>>>
>>>> --existing facilities plus
>>>> -- trap on access to specified pages
>>>>
>>>We do that: make them unmapped and trap SIGSEGV through ptrace. Doesn't
>>>work for accesses from kernel-space (you don't get SIGSEGV, just, likely,
>>>-EFAULT). And it's horribly slow. And trapping for kernelspace accesses
>>>is bad.
>>>
>>You don't have to trap kernelspace accesses; (-EFAULT there would be a
>>good thing--the host kernel shouldn't be looking in these pages anyway)
>>this is only to apply to userspace code, but SIGSEGV is slow--why should
>>it be fast? It's an error path.
>
>Yes, it is thought to be only an error path, but UML abuses it for normal
>control flow, and as I said, the kernel supports "fasttrap", but only via
>SIGSEGV, i.e. in a slow way.
That is the exact problem. It shouldn't be abused--a proper interface
that has acceptable performance should be devised. (You mention
netlink--was it looked into? This might help with some UML performance
issues.) Basically what is needed is a means to set a page to no access
but cause some other action to occur rather than generate SIGSEGV.
>>>We do that: make them unmapped and trap SIGSEGV through ptrace.
>>>
>>The overhead is not all that large, as most Win32 API calls ultimately
>>go into the kernel anyway.
>
>A kernel switch costs only a few thousand TSC units (see the rdtsc
>assembly instruction), while a signal delivery to a foreign process can cost
>a lot more (I measured it at around 4 * 10^5 TSC units, even without a
>memory switch).
Then a more efficient interface is needed. Besides, this would need to
be synchronous.
>>This also should allow WINE to work well on
>>platforms such as x86-64, without needing multiple WINE binaries.
>>(64-bit control process managing mix of 32 and 64 bit address spaces)
>
>Writing 64-bit code that cleanly handles 32-bit syscalls is hard. Compiling
>32-bit code in 32-bit mode to do the same is simpler.
The problem is that they need to communicate, especially once Win64
actually hits. WINE currently has a (confusing) "relay" layer that
already does similar tasks for 16/32 bit. Furthermore, the Win32 API
calling convention is fairly well defined, (parameters on stack; return
in EAX) so this shouldn't be more of a problem than has been solved in
the past. (That doesn't mean it won't be a real PITA.)
>>The reason to trap is to allow WINE to intercept the call while
>>sitting in another address space. (Each Win32 process would have its
>>own guest address space.) The idea is to have the interfaces UML uses
>>be generic enough for WINE to also use.
>>
>>The reason is simple--improved security by enforcing a sandbox around
>>WINE.
>>
Seccomp (see below--thanks for bringing it up) could more easily be used
to solve this. (Why bother with trapping all the time when only a few
pages really need protection? Furthermore, the external control thread
would thus have veto power over all syscalls made, so the sandbox can be
easily enforced.)
>>>>Then, when the program
>>>>attempts to access a DLL's memory image, the kernel would intercept the
>>>>request and quickly pass it to a userspace thread,
>>>
>>>Easy to say, "quickly pass it"... signals are slow. There are faster but
>>>more complicated primitives (netlink comes to mind, for instance).
>>
>>User DLLs (those from the program itself) would actually be mapped. The
>>system DLLs (kernel32, user32, etc.) that WINE itself implements on
>>Linux and that must trap to kernelspace on Windows would be loaded this
>>way.
>>
>>One benefit is to reduce the chance of conflict, as various
>>internal modules in WINE that don't exist in Windows could thus be
>>removed from the visible (to the Win32 app) address space. This could
>>have uses other than WINE, too. One possibility is as a "padded cell"
>>of sorts--a process is started in a guest address space under a control
>>program that intercepts and discards all syscalls. However, certain
>>pages in that address space are used as a restricted system
>>interface--accessing them blocks the accessing thread and causes a
>>(host) syscall to return in the control process. This syscall would
>>block until a guest thread trips a "fasttrap" page and then returns
>>information such as exact address accessed, read or write, and if write,
>>value written. This syscall need not be new--read or ioctl on an
>>appropriate fd (netlink socket perhaps?) would be enough. The control
>>thread then carries out the requested action (whatever that may be) and
>>permits the jailed thread to again run.
>
>Andrea Arcangeli merged such a "padded cell" functionality, but the allowed
>interface is read, not a page fault. The former is faster and easier to use,
>and also allows writing arbitrary amounts of data.
>
>It's called secure computing (see kernel/seccomp.c for details, and/or look on
>LWN.net for an article about it).
I had looked at this earlier, but hadn't realized that it could be used
to implement this--provided that mm_indirect can make syscalls in a
seccomp address space (bypassing the restriction), this can do
everything that "fasttrap" could (using some help from appropriate code
in userspace). Maybe SKAS4 should add a new seccomp level?
>>>> -- read/write in guest address space
>>>> Explanation: mmap is fine for big changes to an address space
>>>>(such as loading modules), but one capability WINE would need for this
>>>>to be truly useful is 1/2/4/8/16-byte PEEK and POKE. (Some Win32
>>>>programs like to do weird things with Windows' system code--in
>>>>conjunction with "fasttrap", this would allow WINE to keep such programs
>>>>happy.) As I understand, ptrace already provides this, hopefully
>>>>adequately.
>>>
>>>It provides this, it could be made a bit faster (I've reviewed a patch
>>>from another project which uses heavily ptrace, which makes that faster).
One down, more to go.
>>>> -- intercept arbitrary interrupts in guest address space
>>>> Explanation: Many older Windows programs (Win16 era)
>>>>occasionally directly invoke various soft interrupts (these are
>>>>basically DOS syscalls). The ability to intercept these is necessary,
>>>>but need not be particularly efficient or fast.
>>>
>>>I recall that hardware IRQ n. x is mapped to k+x, where k is fixed and
>>>low; we now have with ACPI 32 IRQs I guess (on my machine the kernel uses
>>>up to 22 IRQs), so I guess int 0x21 is going to conflict somewhere.
>>>
>>>That said, this could be added too for interrupts not reserved by the
>>>kernel (that is CPU exceptions). But DOSEMU already runs x86 programs, so
>>>WINE should be able to do it too... ah, yep, it uses vm86, while you need
>>>to do that on a paged system.
>>
>>The only requirement here is to call vm86 in another address space,
>>which is already doable--except on 64-bit hardware, where vm86 doesn't
>>exist anyway.
>
>Wait a moment - Windows 3.1 uses 286 paging, and Win16 userspace progs use up
>to 16M of Ram. You don't have this on vm86(), right?
No, but as I said vm86 is gone on x86-64, which means that DOS soft ints
are somehow caught--inside the address space in question. (WINE
currently runs in-process, I am trying to lay the groundwork to change
that--thus all the crazy stuff previously about "fasttrap" to another
userspace.) Current WINE can use vm86 on the i386 platform, however.
This (Win16 programs with 16MiB of RAM) also means that WINE could
always intercept soft interrupts--even without use of vm86.
The other catch is that 64-bit and 32-bit code don't mix very well, and
they must normally be kept in separate processes--thus the reason for a
64-bit control process to be able to handle both 32 and 64 bit address
spaces. The entire kernel is 64-bit anyway, so leaving the option open
can't be too insanely hard.
>>How about a PTRACE_SET_THREAD_RUNNABLE that takes a 1 (RUN) or 0 (STOP)
>>as its argument and has immediate effects? The problem (IIRC) with
>>SIGSTOP is that signals are delivered to all threads in a process,
>
>Isn't there tkill() for this purpose (signals to a specific thread)? And if it
>doesn't work, it should be fixed. Having tons of incoherent APIs is bad, as
>long as things can be done with current ones.
The other problem is that a more specific interface could be much
faster. OTOH, perhaps a better strategy would be to improve the
signals--thus also lessening the other problem (slowness of SIGSEGV) as
well as improving performance generally.
>>>However, currently the idea is sys_mm_indirect , taking an fd representing
>>>an mm context, a syscall number and its parameters, plus a syscall to get
>>>a fd representing a mm context.
>>
>>How are address spaces manipulated? Could ioctls on the mm context's fd
>>be useful?
>
>We don't use ioctls; they are inelegant. SKAS3 uses write, which is just as
>bad.
What is inelegant about an ioctl on a special fd? I say that ioctls are
far preferable to more fds (on other files), or the extra complexity of
implementing some other interface (maybe using netlink?). Besides, if
you implement your own struct file_operations, you get ioctl support by
writing the handler function for it (if I understand the Linux 2.6.14
VFS correctly). OTOH, if no operations that fall into ioctl's area are
needed, then implementing ioctl for its own sake is silly.
>For SKAS4, instead, you'd use sys_mm_indirect(); you say:
>
>mm_indirect(addr_space_fd, __NR_MMAP, <mmap_args>)
>mm_indirect(addr_space_fd, __NR_MUNMAP, <munmap_args>)
>
>and so on, for each syscall (excluding fork and exit, for now). To destroy an
>address space you simply call close on its fd.
How do you map region X of the guest address space to region Y (or
somewhere) in your own? mmap/munmap on the address space's fd would
make sense here.
PS: Sorry about the long delay. Mozilla crashed while I had the
compose window for this message buried under several browsers (and
totally forgotten, too--oops).

On Thursday 19 January 2006 23:23, Jacob Bachmeyer wrote:
> Blaisorblade wrote:
> >On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
> >>Blaisorblade wrote:
> >>>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
> >>>>Has any thought been given to making SKAS4 suitably generic that it
> >>>>could be used for more than just UML?
> >>>
> >>>Not yet, thoughts welcome.
> >>
> >>Let's see:
> >>
> >>to support HURD (which uses the Mach ABI):
> >>
> >> -- existing facilities plus trap lcall gates
> >I.e. extend ptrace to trap lcall gates, right? That's another thing, could
> > be done, but it relates more to the Linux-ABI project... at least this
> > can't be merged in mainline since we don't support lcall gates.
> Why not? And for that matter, why does ptrace not currently catch lcalls?
The lcall stub was removed from arch/i386/kernel/entry.S a little time ago
(about 2.6.12 IIRC). So vanilla Linux can't handle lcalls. Clear now?
> >>to support WINE (which follows Win32 conventions (ick!)): (x86 only)
> >>
> >> --existing facilities plus
> >> -- trap on access to specified pages
> >
> >We do that: make them unmapped and trap SIGSEGV through ptrace. Doesn't
> > work for accesses from kernel-space (you don't get SIGSEGV, just, likely,
> > -EFAULT). And it's horribly slow. And trapping for kernelspace accesses
> > is bad.
> You don't have to trap kernelspace accesses; (-EFAULT there would be a
> good thing--the host kernel shouldn't be looking in these pages anyway)
> this is only to apply to userspace code, but SIGSEGV is slow--why should
> it be fast? It's an error path.
Yes, it is thought to be only an error path, but UML abuses it for normal
control flow, and as I said, the kernel supports "fasttrap", but only via
SIGSEGV, i.e. in a slow way.
> >We do that: make them unmapped and trap SIGSEGV through ptrace.
> >>These DLLs
> >>are mapped into the process' address space on Windows and under current
> >>WINE, much like shared objects in normal Linux. This idea would enable
> >>WINE to not actually map these DLLs, but rather simply set the pages
> >>where the DLLs would be mapped as "fasttrap".
> >What is the reason to trap to the kernel? It's going to be slow. A page
> >fault, like a syscall, is costly (and probably more so, since it's an
> > interrupt).
> >If there is a good reason not to map the DLLs, it may at least make sense,
> > but WINE users aren't going to use special patches, and getting such a
> > hackish thing into mainline may be a hard sell (unless the reason is
> > _really_ good).
> The overhead is not all that large, as most Win32 API calls ultimately
> go into the kernel anyway.
A kernel switch costs only a few thousand TSC units (see the rdtsc
assembly instruction), while delivering a signal to a foreign process can cost
a lot more (I measured it on the order of 4 * 10^5 TSC units, even without a
memory switch).
> This also should allow WINE to work well on
> platforms such as x86-64, without needing multiple WINE binaries.
> (64-bit control process managing mix of 32 and 64 bit address spaces)
Writing 64-bit code that cleanly handles 32-bit syscalls is hard. Compiling
32-bit code in 32-bit mode to do the same is simpler.
> Also, what exactly are vsyscalls?
> Executables are already demand-paged--so page faults routinely happen
> anyway.
Not the same thing. Assuming the working set fits in memory, you get page
faults only on the first access to a given page, and they just jump to the
kernel.
What you're proposing is that each call to a GDI function, for instance,
triggers a signal delivery (or, in the best case, just a context switch).
That's a different thing.
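
To make the "only on the first access" point concrete, here is a minimal sketch
(my own illustration, not something from this thread) that counts minor page
faults while touching a fresh anonymous mapping twice. The region size and the
helper names are arbitrary choices, and the exact counts depend on the host
(transparent huge pages, for instance, can make the first pass take far fewer
faults than one per page).

```python
import mmap
import resource

PAGE = mmap.PAGESIZE
NPAGES = 2048  # arbitrary size for the demo (8 MiB with 4 KiB pages)


def minor_faults():
    """Minor (no-I/O) page faults taken by this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_all(region):
    """Write one byte into every page of the mapping."""
    for offset in range(0, len(region), PAGE):
        region[offset] = 1


# Fresh anonymous mapping: no page is backed by memory until first touched.
region = mmap.mmap(-1, NPAGES * PAGE,
                   flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

before = minor_faults()
touch_all(region)            # first access: the kernel faults the pages in
first_pass = minor_faults() - before

before = minor_faults()
touch_all(region)            # pages already resident: few or no new faults
second_pass = minor_faults() - before

region.close()
print("faults on first pass: ", first_pass)
print("faults on second pass:", second_pass)
```

A trap on every call into a system DLL would behave like the first pass on every
call, plus whatever it costs to notify the control process.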
> The reason to trap is to allow WINE to intercept the call while
> sitting in another address space. (Each Win32 process would have its
> own guest address space.) The idea is to have the interfaces UML uses
> be generic enough for WINE to also use.
> The reason is simple--improved security by enforcing a sandbox around
> WINE.
> >>Then, when the program
> >>attempts to access a DLL's memory image, the kernel would intercept the
> >>request and quickly pass it to a userspace thread,
> >Easily said, "quickly pass it"... signals are slow. There are faster but more
> >complicated primitives (netlink comes to mind, for instance).
> User DLLs (those from the program itself) would actually be mapped. The
> system DLLs (kernel32, user32, etc.) that WINE itself implements on
> Linux and that must trap to kernelspace on Windows would be loaded this
> way.
> One benefit is to reduce the chance of conflict, as various
> internal modules in WINE that don't exist in Windows could thus be
> removed from the visible (to the Win32 app) address space. This could
> have uses other than WINE, too. One possibility is as a "padded cell"
> of sorts--a process is started in a guest address space under a control
> program that intercepts and discards all syscalls. However, certain
> pages in that address space are used as a restricted system
> interface--accessing them blocks the accessing thread and causes a
> (host) syscall to return in the control process. This syscall would
> block until a guest thread trips a "fasttrap" page and then returns
> information such as exact address accessed, read or write, and if write,
> value written. This syscall need not be new--read or ioctl on an
> appropriate fd (netlink socket perhaps?) would be enough. The control
> thread then carries out the requested action (whatever that may be) and
> permits the jailed thread to again run.
Andrea Arcangeli merged such "padded cell" functionality, but the interface it
allows is read(), not a page fault. The former is faster and easier to use,
and also allows writing arbitrary amounts of data.
It's called secure computing (see kernel/seccomp.c for details, and/or look on
LWN.net for an article about it).
> "fasttrap" may have been a poor choice of terms. The idea is to have
> more or less generic kernel-in-userspace functionality with one process
> as a "usermode supervisor" watching a set of other processes.
> >Also, for security reasons it's not possible to let userspace trap OS
> > accesses (as the OS is more privileged - search TENEX at
> >http://www.isi.edu/~faber/cs402/notes/lecture19.html to see how bad that
> > is).
> Perform the API call. It would alter the CPU context, possibly, (if the
> call requires it) also changing the guest address space. There should
> be no OS accesses to these pages--those would not trap, but would return
> -EFAULT because the pages would not actually be allocated. (Win32
> programs should not be making Linux syscalls--a version of WINE that
> uses this would need to catch and ignore any Linux syscalls made.)
> >> -- read/write in guest address space
> >> Explanation: mmap is fine for big changes to an address space
> >>(such as loading modules), but one capability WINE would need for this
> >>to be truly useful is 1/2/4/8/16-byte PEEK and POKE. (Some Win32
> >>programs like to do weird things with Windows' system code--in
> >>conjunction with "fasttrap", this would allow WINE to keep such programs
> >>happy.) As I understand, ptrace already provides this, hopefully
> >>adequately.
> >It provides this; it could be made a bit faster (I've reviewed a patch
> > from another project that uses ptrace heavily, which makes it faster).
> >> -- intercept arbitrary interrupts in guest address space
> >> Explanation: Many older Windows programs (Win16 era)
> >>occasionally directly invoke various soft interrupts (these are
> >>basically DOS syscalls). The ability to intercept these is necessary,
> >>but need not be particularly efficient or fast.
> >I recall that hardware IRQ n. x is mapped to k+x, where k is fixed and
> > low; with ACPI we now have 32 IRQs, I guess (on my machine the kernel uses
> > up to 22 IRQs), so I guess int 0x21 is going to conflict somewhere.
> >That said, this could be added too for interrupts not reserved by the
> > kernel (that is CPU exceptions). But DOSEMU already runs x86 programs, so
> > WINE should be able to do it too... ah, yep, it uses vm86, while you need
> > to do that on a paged system.
> The only requirement here is to call vm86 in another address space,
> which is already doable--except on 64-bit hardware, where vm86 doesn't
> exist anyway.
Wait a moment - Windows 3.1 uses 286 protected mode, and Win16 userspace
programs can use up to 16M of RAM. You don't get that with vm86(), right?
> This is exactly it--I wanted to be sure that distinct threads can share
> an address space, while one control process can manage as many address
> spaces as are needed/wanted. There should be no addition here--this was
> mentioned for completeness.
UML will need to have this functionality debugged and working sooner or later:
when it does SMP with SKAS, it'll need exactly this (you have multiple
managed threads, corresponding to multiple virtual CPUs, and a thread and its
address space can be executed on each of those virtual CPUs).
> How about a PTRACE_SET_THREAD_RUNNABLE that takes a 1 (RUN) or 0 (STOP)
> as its argument and has immediate effects? The problem (IIRC) with
> SIGSTOP is that signals are delivered to all threads in a process,
Isn't there tkill() for this purpose (signals to a specific thread)? And if it
doesn't work, it should be fixed. Having tons of incoherent APIs is bad when
things can be done with the current ones.
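
As a rough illustration of thread-directed signals (a sketch of the general
mechanism, not of UML code), the snippet below uses Python's
signal.pthread_kill(), the POSIX counterpart of tkill(), to wake exactly one of
two waiting threads. The choice of SIGUSR1 and the worker names are assumptions
made for the demo.

```python
import signal
import threading

SIG = signal.SIGUSR1
woken = []                      # names of threads, in the order they woke up
lock = threading.Lock()

# Block the signal before starting any thread; new threads inherit the mask,
# so the signal stays pending for its target until that thread calls sigwait().
old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {SIG})


def worker(name):
    signal.sigwait({SIG})       # sleeps until *this* thread is signalled
    with lock:
        woken.append(name)


a = threading.Thread(target=worker, args=("A",))
b = threading.Thread(target=worker, args=("B",))
a.start()
b.start()

signal.pthread_kill(a.ident, SIG)   # thread-directed: only A is targeted
a.join()
b_still_blocked = b.is_alive()      # B has not been signalled yet

signal.pthread_kill(b.ident, SIG)   # now release B as well
b.join()

signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)  # restore original mask
print(woken, b_still_blocked)
```

Note that this only shows targeting with an ordinary signal. Per
pthread_kill(3), a signal whose disposition is "stop" still affects the whole
process, so a per-thread STOP within one thread group is a separate question.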
> >However, currently the idea is sys_mm_indirect , taking an fd representing
> > an mm context, a syscall number and its parameters, plus a syscall to get
> > a fd representing an mm context.
> How are address spaces manipulated? Could ioctls on the mm context's fd
> be useful?
We don't use ioctls, they are inelegant; SKAS3 uses write, which is just as
bad.
For SKAS4, instead, you'd use sys_mm_indirect(); you say:
mm_indirect(addr_space_fd, __NR_MMAP, <mmap_args>)
mm_indirect(addr_space_fd, __NR_MUNMAP, <munmap_args>)
and so on, for each syscall (excluding fork and exit, for now). To destroy an
address space you simply call close on its fd.
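
The shape of that interface can be modelled in userspace as below. This is
purely a toy model for discussion, not the SKAS4 kernel code: the class, its
method names and the integer "fds" are all hypothetical, the syscall numbers are
the i386 ones used only as labels, and the model forwards just mmap and munmap.

```python
# Toy userspace model only; none of this is the real SKAS4 kernel interface.
NR_EXIT, NR_FORK, NR_MMAP, NR_MUNMAP = 1, 2, 90, 91   # labels for the demo


class AddressSpaceTable:
    """Maps small integer handles (stand-ins for fds) to address spaces."""

    def __init__(self):
        self._spaces = {}      # handle -> {start address: length}
        self._next = 3         # pretend 0-2 are stdin/stdout/stderr

    def new_mm(self):
        """Stand-in for the 'get an fd representing an mm context' call."""
        handle = self._next
        self._next += 1
        self._spaces[handle] = {}
        return handle

    def mm_indirect(self, handle, nr, *args):
        """Run syscall `nr` against the address space behind `handle`."""
        mappings = self._spaces[handle]       # KeyError models a bad fd
        if nr == NR_MMAP:
            addr, length = args
            mappings[addr] = length
            return addr
        if nr == NR_MUNMAP:
            addr, _length = args
            mappings.pop(addr, None)
            return 0
        # fork and exit are excluded in the text; this toy rejects all others too
        raise NotImplementedError(f"syscall {nr} is not forwarded")

    def close(self, handle):
        """Closing the handle destroys the address space."""
        del self._spaces[handle]

    def mappings(self, handle):
        return dict(self._spaces[handle])


table = AddressSpaceTable()
fd = table.new_mm()
table.mm_indirect(fd, NR_MMAP, 0x10000, 4096)
table.mm_indirect(fd, NR_MMAP, 0x20000, 8192)
table.mm_indirect(fd, NR_MUNMAP, 0x10000, 4096)
snapshot = table.mappings(fd)   # only the second mapping is left
table.close(fd)                 # the address space is gone after this
print(snapshot)
```

The point of the design is that no new per-operation syscalls are needed: the
caller names an existing syscall and the address space it should act on.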
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

On Thursday 19 January 2006 20:43, Jeff Dike wrote:
> On Thu, Jan 19, 2006 at 04:01:28PM +0100, Blaisorblade wrote:
> > Gerd Knorr in his tty patch, instead, used forward declarations, like:
> >
> > struct task_struct;
> >
> > what about that?
> I don't think so. At least when you use void *, you are using a type
> that's not incorrect. In userspace code, those task_structs start
> referring to host task_structs, which is definitely very wrong.
Possibly yes, but as long as we don't dereference the pointer (and in a
prototype you're not going to do that) there's no problem.
Using a type makes the code clearer, and it doesn't hide any warning GCC may
give (behaving well is left entirely to us).
By the way (before I forget), we currently use the wrong errno in
sys-i386/ldt.c. I just wrote the fix (it adds a silly os_ptrace_ldt). I'm going
to compile and send it.
> > Those functions probably should be moved anyway because they're
> > useless there
> Yeah.
> Jeff
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

Blaisorblade wrote:
>On Thursday 19 January 2006 00:52, Jacob Bachmeyer wrote:
>
>>Blaisorblade wrote:
>>
>>>On Monday 16 January 2006 20:34, Jacob Bachmeyer wrote:
>>>
>>>>Has any thought been given to making SKAS4 suitably generic that it
>>>>could be used for more than just UML?
>>>>
>>>Not yet, thoughts welcome.
>>>
>>Let's see:
>>
>>to support HURD (which uses the Mach ABI):
>>
>> -- existing facilities plus trap lcall gates
>
>I.e. extend ptrace to trap lcall gates, right? That's another thing, could be
>done, but it relates more to the Linux-ABI project... at least this can't be
>merged in mainline since we don't support lcall gates.
Why not? And for that matter, why does ptrace not currently catch lcalls?
>>to support WINE (which follows Win32 conventions (ick!)): (x86 only)
>>
>> --existing facilities plus
>> -- trap on access to specified pages
>
>We do that: make them unmapped and trap SIGSEGV through ptrace. Doesn't work
>for accesses from kernel-space (you don't get SIGSEGV, just, likely,
>-EFAULT). And it's horribly slow. And trapping for kernelspace accesses is
>bad.
You don't have to trap kernelspace accesses; (-EFAULT there would be a
good thing--the host kernel shouldn't be looking in these pages anyway)
this is only to apply to userspace code, but SIGSEGV is slow--why should
it be fast? It's an error path.
>> Explanation: Win32 API calls are not syscalls in the normal
>>sense--rather they are made by calling into a system DLL.
>
>Yep, it then can decide whether to trap into the kernel or not (depending on
>that version's implementation).
>
>>These DLLs
>>are mapped into the process' address space on Windows and under current
>>WINE, much like shared objects in normal Linux. This idea would enable
>>WINE to not actually map these DLLs, but rather simply set the pages
>>where the DLLs would be mapped as "fasttrap".
>
>What is the reason to trap to the kernel? It's going to be slow. A page
>fault, like a syscall, is costly (and probably more so, since it's an interrupt).
>
>If there is a good reason not to map the DLLs, it may at least make sense, but
>WINE users aren't going to use special patches, and getting such a hackish
>thing into mainline may be a hard sell (unless the reason is _really_ good).
The overhead is not all that large, as most Win32 API calls ultimately
go into the kernel anyway. This also should allow WINE to work well on
platforms such as x86-64, without needing multiple WINE binaries.
(64-bit control process managing mix of 32 and 64 bit address spaces)
Also, what exactly are vsyscalls?
Executables are already demand-paged--so page faults routinely happen
anyway. The reason to trap is to allow WINE to intercept the call while
sitting in another address space. (Each Win32 process would have its
own guest address space.) The idea is to have the interfaces UML uses
be generic enough for WINE to also use.
The reason is simple--improved security by enforcing a sandbox around
WINE.
>>Then, when the program
>>attempts to access a DLL's memory image, the kernel would intercept the
>>request and quickly pass it to a userspace thread,
>
>Easily said, "quickly pass it"... signals are slow. There are faster but more
>complicated primitives (netlink comes to mind, for instance).
User DLLs (those from the program itself) would actually be mapped. The
system DLLs (kernel32, user32, etc.) that WINE itself implements on
Linux and that must trap to kernelspace on Windows would be loaded this
way. One benefit is to reduce the chance of conflict, as various
internal modules in WINE that don't exist in Windows could thus be
removed from the visible (to the Win32 app) address space. This could
have uses other than WINE, too. One possibility is as a "padded cell"
of sorts--a process is started in a guest address space under a control
program that intercepts and discards all syscalls. However, certain
pages in that address space are used as a restricted system
interface--accessing them blocks the accessing thread and causes a
(host) syscall to return in the control process. This syscall would
block until a guest thread trips a "fasttrap" page and then returns
information such as exact address accessed, read or write, and if write,
value written. This syscall need not be new--read or ioctl on an
appropriate fd (netlink socket perhaps?) would be enough. The control
thread then carries out the requested action (whatever that may be) and
permits the jailed thread to again run.
"fasttrap" may have been a poor choice of terms. The idea is to have
more or less generic kernel-in-userspace functionality with one process
as a "usermode supervisor" watching a set of other processes.
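
The blocking-read protocol described above can be sketched in userspace as
follows. This is only a model of the message flow: a socketpair stands in for
the proposed fd, a thread stands in for the control process, and the event
layout (address, read/write flag, value written) plus the placeholder "API
call" are assumptions made for illustration.

```python
import socket
import struct
import threading

# Hypothetical event layout: faulting address, 1 for write / 0 for read,
# and the value written (0 for reads).
EVENT = struct.Struct("<QBQ")
REPLY = struct.Struct("<Q")

control_end, guest_end = socket.socketpair()
log = []     # events seen by the control side


def guest_access(address, is_write, value=0):
    """Stand-in for a guest thread touching a trap page: report, then block."""
    guest_end.sendall(EVENT.pack(address, is_write, value))
    data = guest_end.recv(REPLY.size, socket.MSG_WAITALL)  # blocked until resumed
    (result,) = REPLY.unpack(data)
    return result


def control_loop(count):
    """Control side: block in a read-like call, act, then let the guest run."""
    for _ in range(count):
        data = control_end.recv(EVENT.size, socket.MSG_WAITALL)
        address, is_write, value = EVENT.unpack(data)
        log.append((address, bool(is_write), value))
        result = value * 2 if is_write else 0      # placeholder "API call"
        control_end.sendall(REPLY.pack(result))    # resumes the guest


supervisor = threading.Thread(target=control_loop, args=(2,))
supervisor.start()
r1 = guest_access(0x7FFE0000, 1, 21)   # a write to a trap page
r2 = guest_access(0x7FFE0008, 0)       # a read from a trap page
supervisor.join()
control_end.close()
guest_end.close()
print(log, r1, r2)
```

In the real proposal the kernel, not the guest, would generate the event when
the page is touched; the sketch only shows what the control process would see
and when the guest is allowed to continue.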
>>which handles the "page fault".
>>
>>The page remains set as "fasttrap", and the control
>>process modifies the address space and CPU context appropriately before
>>allowing execution to continue.
>
>"Modifies" to return the call or to map the page in? You seem to imply it
>performs the call and sets the return value in EAX, right?
>
>Also, for security reasons it's not possible to let userspace trap OS accesses
>(as the OS is more privileged - search TENEX at
>http://www.isi.edu/~faber/cs402/notes/lecture19.html to see how bad that is).
Perform the API call. It would alter the CPU context, possibly, (if the
call requires it) also changing the guest address space. There should
be no OS accesses to these pages--those would not trap, but would return
-EFAULT because the pages would not actually be allocated. (Win32
programs should not be making Linux syscalls--a version of WINE that
uses this would need to catch and ignore any Linux syscalls made.)
>> -- read/write in guest address space
>> Explanation: mmap is fine for big changes to an address space
>>(such as loading modules), but one capability WINE would need for this
>>to be truly useful is 1/2/4/8/16-byte PEEK and POKE. (Some Win32
>>programs like to do weird things with Windows' system code--in
>>conjunction with "fasttrap", this would allow WINE to keep such programs
>>happy.) As I understand, ptrace already provides this, hopefully
>>adequately.
>
>It provides this; it could be made a bit faster (I've reviewed a patch from
>another project that uses ptrace heavily, which makes it faster).
>
>> -- intercept arbitrary interrupts in guest address space
>> Explanation: Many older Windows programs (Win16 era)
>>occasionally directly invoke various soft interrupts (these are
>>basically DOS syscalls). The ability to intercept these is necessary,
>>but need not be particularly efficient or fast.
>
>I recall that hardware IRQ n. x is mapped to k+x, where k is fixed and low;
>with ACPI we now have 32 IRQs, I guess (on my machine the kernel uses up to 22
>IRQs), so I guess int 0x21 is going to conflict somewhere.
>
>That said, this could be added too for interrupts not reserved by the kernel
>(that is CPU exceptions). But DOSEMU already runs x86 programs, so WINE
>should be able to do it too... ah, yep, it uses vm86, while you need to do
>that on a paged system.
The only requirement here is to call vm86 in another address space,
which is already doable--except on 64-bit hardware, where vm86 doesn't
exist anyway.
>> -- transparently use threads in guest address spaces, if desired
>> Explanation: WINE currently uses the host's scheduler.
>>Changing it to this new API shouldn't adversely affect that ability.
>>(And on second thought, using a UML library might not be an option.)
>>
>>I shall clarify my proposal: each thread is assigned an address space,
>>
>and (you forgot to say) it can be changed through PTRACE_SWITCH_MM, you mean...
>(otherwise I don't see the addition).
>
>>while an address space can contain multiple threads.
>
>you can PTRACE_SWITCH_MM multiple threads to the same address space
This is exactly it--I wanted to be sure that distinct threads can share
an address space, while one control process can manage as many address
spaces as are needed/wanted. There should be no addition here--this was
mentioned for completeness.
>>Each thread also
>>has a STOP/RUN flag, which if set to RUN, causes the host scheduler to
>>consider that thread for execution (along with all other runnable
>>threads). This flag allows either the userspace control process to make
>>scheduling decisions itself, (by only setting one of its threads to RUN)
>>or to punt and have the kernel handle all scheduling for its threads (by
>>setting them all to RUN and using STOP only to block a thread).
>
>Hmm, sleeping like that is easy if you mean that only a thread can switch
>itself from RUN to STOP. The thread can use some mutex/semaphore thing, at
>that point.
>
>To switch a thread from RUN to STOP from the outside, you can currently kill
>it with -STOP. Beware that it may be slow, but I don't know whether it matters
>or whether it can be made much faster.
>
>The problem (I think) is that SIGSTOP will be processed not at kill() time,
>but at delivery time, i.e. after a context switch to the receiving thread,
>before returning to userspace. I've not checked for SIGSTOP and am not sure
>for the rest, but I think it's this way.
How about a PTRACE_SET_THREAD_RUNNABLE that takes a 1 (RUN) or 0 (STOP)
as its argument and has immediate effects? The problem (IIRC) with
SIGSTOP is that signals are delivered to all threads in a process, while
a userspace scheduler needs to wake up or block exactly one thread at a
time. Blocking a thread would be done from the control process, not
from the thread itself. (The call that resulted in it being blocked was
made by touching a page that triggered the control process.)
>>Could all SKAS4 APIs be multiplexed through one syscall? (Perhaps
>>simply as more ptrace functions, or as a new "skas4" syscall?)
>
>"multiplexing" like ipc(2) is a bad idea.
>
>However, currently the idea is sys_mm_indirect , taking an fd representing an
>mm context, a syscall number and its parameters, plus a syscall to get a fd
>representing an mm context.
How are address spaces manipulated? Could ioctls on the mm context's fd
be useful?
