Re: [Qemu-devel] simulated memory instead of host memory

From: Johan Rydberg
Subject: Re: [Qemu-devel] simulated memory instead of host memory
Date: Mon, 9 Jun 2003 21:37:10 +0200

On Mon, 09 Jun 2003 21:09:37 +0200
Fabrice Bellard <address@hidden> wrote:
: It would be possible. I spent a lot of time thinking about it, but I did
: not make it because of lack of time and motivation. I see three solutions:
: [...]
: 2) A faster solution is to use 4MB tables containing the addresses of
: each CPU page. One 4MB table would be used for read, one table for
: write. The tables can be seen as big TLBs. Unmapped pages would have a
: NULL entry in the tables so that a fault is generated on access to fill
: the table.
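For illustration only, here is a rough C sketch of the table idea in (2). A 32-bit guest with 4 KB pages has 2^20 pages, so one table of 4-byte host pointers is 4 MB (twice that on a 64-bit host). The names page_tables, pt_init, pt_map and pt_translate are made up for this sketch, and it tests for NULL explicitly where the scheme described above would let the host access fault.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT  12
#define PAGE_OFFSET 0xfffu
#define NUM_PAGES   (1u << 20)   /* 4 GB of guest address space in 4 KB pages */

/* Hypothetical layout: one table for reads, one for writes.
 * Each slot holds the host address of a guest page, or NULL if unmapped. */
typedef struct {
    uint8_t **read;
    uint8_t **write;
} page_tables;

/* Allocate both tables with every entry unmapped.  calloc() yields
 * all-zero bits, which is a NULL pointer on the usual hosts. */
static int pt_init(page_tables *pt)
{
    pt->read  = calloc(NUM_PAGES, sizeof *pt->read);
    pt->write = calloc(NUM_PAGES, sizeof *pt->write);
    if (pt->read == NULL || pt->write == NULL) {
        free(pt->read);
        free(pt->write);
        return -1;
    }
    return 0;
}

static void pt_free(page_tables *pt)
{
    free(pt->read);
    free(pt->write);
}

/* Fill the table entries for the page containing vaddr.  In a real
 * emulator this would be called from the fault path. */
static void pt_map(page_tables *pt, uint32_t vaddr, uint8_t *host_page,
                   int writable)
{
    assert(host_page != NULL);
    pt->read[vaddr >> PAGE_SHIFT]  = host_page;
    pt->write[vaddr >> PAGE_SHIFT] = writable ? host_page : NULL;
}

/* Translate through one table; NULL means the entry must be filled. */
static uint8_t *pt_translate(uint8_t *const *table, uint32_t vaddr)
{
    uint8_t *page = table[vaddr >> PAGE_SHIFT];
    return page ? page + (vaddr & PAGE_OFFSET) : NULL;
}
```

A read-only page is entered in the read table only, so a store to it still takes the slow path.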
In the current version of GUSS I use a similar technique. I call them
mtcaches, which stands for memory translation caches. Each one is a
direct-mapped cache indexed by the low bits of the virtual page number.
The tag is the virtual address with the page offset masked off.
The cache contains <tag, diff> tuples, where diff is the host memory
address minus the virtual address. On an mtcache hit, all that has to be
done to get the host memory address is to add the virtual address to the
diff value.

When there is an mtcache miss, the full MMU emulation code is called.
It is up to that code to add entries to the mtcache (there are separate
mtcaches for reads and writes, and for user and supervisor mode).
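The description above could look roughly like this in C. This is only a sketch, not the GUSS source: the type and function names are invented, the 256-entry size is an assumption, and the invalid-tag trick (a tag with low bits set can never match a page-aligned address) is one possible way to mark empty slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT      12
#define PAGE_OFFSET     0xfffu
#define MTCACHE_ENTRIES 256      /* assumed size; must be a power of two */
#define MTCACHE_INVALID 1u       /* low bits set: never equals a real tag */

typedef struct {
    uint32_t  tag;               /* virtual address with the offset masked */
    uintptr_t diff;              /* host address minus virtual address */
} mtcache_entry;

typedef struct {
    mtcache_entry entry[MTCACHE_ENTRIES];
} mtcache;

enum { MODE_USER, MODE_SUPERVISOR, MODE_COUNT };
enum { ACCESS_READ, ACCESS_WRITE, ACCESS_COUNT };

/* One cache per (mode, access) pair.  Must be flushed before first use,
 * since a zeroed entry would look like a valid mapping of page 0. */
static mtcache caches[MODE_COUNT][ACCESS_COUNT];

static mtcache *mtcache_select(int mode, int access)
{
    assert(mode >= 0 && mode < MODE_COUNT);
    assert(access >= 0 && access < ACCESS_COUNT);
    return &caches[mode][access];
}

static void mtcache_flush(mtcache *c)
{
    for (size_t i = 0; i < MTCACHE_ENTRIES; i++)
        c->entry[i].tag = MTCACHE_INVALID;
}

/* Direct mapped: the low bits of the virtual page number pick the slot. */
static mtcache_entry *mtcache_slot(mtcache *c, uint32_t vaddr)
{
    return &c->entry[(vaddr >> PAGE_SHIFT) & (MTCACHE_ENTRIES - 1)];
}

/* Called by the MMU emulation after a miss has been resolved. */
static void mtcache_fill(mtcache *c, uint32_t vaddr, uint8_t *host_page)
{
    uint32_t vpage = vaddr & ~PAGE_OFFSET;
    mtcache_entry *e = mtcache_slot(c, vaddr);
    e->tag  = vpage;
    e->diff = (uintptr_t)host_page - vpage;   /* unsigned wraparound is fine */
}

/* Returns the host address on a hit, NULL on a miss. */
static uint8_t *mtcache_lookup(mtcache *c, uint32_t vaddr)
{
    mtcache_entry *e = mtcache_slot(c, vaddr);
    if (e->tag != (vaddr & ~PAGE_OFFSET))
        return NULL;
    return (uint8_t *)(e->diff + vaddr);
}
```

On a NULL return the caller would run the full MMU emulation and then call mtcache_fill() for the cache that missed.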
Some early testing (booting the Linux kernel on a simulated MIPS32 4Kc)
shows that you can get a 95% hit rate or more.
On SPARC and other RISC architectures that have bit-extraction insns
and register+register addressing, the test against the mtcache can be done
in 6-8 insns. The test on IA-32 is a bit more complex (12-14 insns),
mainly due to the limited number of general-purpose registers.

This is what my code generator emits for a memory store. The value to be
stored is in %ebx and the virtual address is in %eax.
%ecx must be pushed on the stack to free a register.
  40017160: 0000005b:  push  %ecx
  40017161: 0000005c:  mov   0x805cce4,%ebp            # pointer to mtcache
  40017167: 00000062:  mov   %eax,%ecx
  40017169: 00000064:  shr   $0xc,%ecx
  4001716c: 00000067:  and   $0xff,%ecx                # 256 entries
  40017172: 0000006d:  lea   0x0(%ebp,%ecx,8),%esi     # mtcache entry at %esi
  40017176: 00000071:  mov   %eax,%ecx
  40017178: 00000073:  and   $0xfffff000,%ecx          # make tag
  4001717e: 00000079:  cmp   %ecx,0x0(%esi)            # and compare
  40017181: 0000007c:  jne   0x00000439                # miss -> slow way
  40017187: 00000082:  mov   0x4(%esi),%esi
  4001718a: 00000085:  add   %eax,%esi
  4001718c: 00000087:  mov   %ebx,0x0(%esi)            # do the store
  4001718f: 0000008a:  pop   %ecx
Can you think of a faster way to do it? Note that I generate
the code by hand (not using GCC).
: 3) An even faster solution is to use Linux memory mappings to emulate
: the MMU. The Linux MM state of the process would be considered as a TLB
: of the virtual x86 MMU state. It works only if the host has <= 4KB page
: size and if the guest OS don't do any mapping in memory >= 0xc0000000.
: With Linux as guest it would work as you can easily change the base
: address of the kernel. The restriction about mappings >= 0xc0000000
: could be suppressed with a small (but tricky) kernel patch which would
: allow to mmap() at addresses >= 0xc0000000.
Since it isn't very portable I don't think it is an option.
: I wanted to implement solution (3) to be able to simulate an unpatched
: Linux kernel (and call the project 'qplex86' !).
:
: To run any OS you would also need precise segment limits and rights
: emulation, at least for non user code.
Of course. Everything has to be simulated. That is the challenge :)
--
Johan Rydberg, Free Software Developer, Sweden
http://rtmk.sf.net | http://www.nongnu.org/guss/
Listening to Her Majesty - F.U.N.E.R.A.L.