I'm wondering how this works: you write a real mode emulator and let it run in protected mode, you tell it to execute int 10h, and somehow it magically does a VBE mode switch. Can somebody explain this magic part to me?

Also, what is the minimum set of instructions I would have to emulate in order to execute the int 10h interrupt handling code? Why are you doing this? Well, this topic interests me and sounds like a fun project (x86 Real Mode Emulator for Mode Switching).

An emulator - such as libx86emu which was specifically written for just this - executes code pretending to be a real machine. It is a closed system - if you want it to affect the real machine, then the emulated one has to forward to the real one, often by sharing I/O ports, BIOS and video card memory ranges.

The instructions you need? No two video cards are equal. Be prepared to support at least everything a 486 has, protected mode and everything.


Yup, after some deeper digging, it looks like I/O ports are the main thing used for mode setting internally. The interrupt code itself just generates the values that need to be forwarded through the ports. I also found out that protected mode support is not required and that all the code is 16-bit for compatibility reasons.
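To make the "forwarding" idea concrete, here is a minimal sketch in Python (chosen only for brevity) of how an emulator's IN/OUT handlers might hand port accesses to the real machine. The class name `PortBus` and the `host_in`/`host_out` hooks are hypothetical and not taken from libx86emu or any other library; in a kernel they would be backed by the real in/out instructions.

```python
class PortBus:
    """Routes the emulated CPU's IN/OUT instructions to host callbacks."""

    def __init__(self, host_in, host_out):
        # host_in(port, size) -> value and host_out(port, value, size) are
        # hypothetical hooks supplied by whoever embeds the emulator.
        self.host_in = host_in
        self.host_out = host_out

    def emu_out(self, port, value, size=1):
        # Mask to the operand size, then forward to the real machine.
        mask = (1 << (8 * size)) - 1
        self.host_out(port & 0xFFFF, value & mask, size)

    def emu_in(self, port, size=1):
        # Reads are forwarded the same way and masked to the operand size.
        mask = (1 << (8 * size)) - 1
        return self.host_in(port & 0xFFFF, size) & mask


# Stand-in "hardware" so the sketch can be tried outside a kernel.
log = []
bus = PortBus(host_in=lambda port, size: 0xFF,
              host_out=lambda port, value, size: log.append((port, value, size)))
bus.emu_out(0x3C8, 0x101)   # byte-sized OUT: only the low byte is forwarded
print(log)
```

The port number here is arbitrary. The point is only that the emulated BIOS code never touches hardware itself; every access goes through a hook that you control.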

zity wrote:

Hi Octacone,

I remember reading this older post, which contains a lot of explanation and useful information.

I've started working on my emulator since. The thing that bothers me is: how do I handle a single opcode that maps to multiple instructions? E.g. 0x80 can be add, adc, and, xor, or, sbb, sub, cmp. Edit: I just discovered that opcodes are not just "randomly assigned numbers by Intel", there is a whole lot going on. Prefix bytes, the ModR/M byte, SIB... a lot more than I initially thought.

Each opcode is one instruction (though it might have different mnemonics in assembly to make things more clear). Sometimes prefix + opcode is a different instruction; that's specified explicitly where necessary.

As for the 0x80 thing: the ModR/M byte following the 0x80 opcode has 3 unused bits (where normally you would specify a register operand) because the instruction doesn't need 2 register operands. For example, for "XOR r/m8, imm8" the encoding is "80 /6 ib", meaning the byte 0x80 is followed by a ModR/M byte where the 3 unused bits are assigned the value "6", and then an immediate byte. The other instructions have different values in those 3 bits. As such, these 3 bits are actually an extension of the opcode, and that's how you differentiate them.
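As a rough illustration of the opcode extension described above, the following Python sketch picks the operation for opcode 0x80 from the middle 3 bits of the ModR/M byte. The function names are made up for this example, and it handles only the register form (mod = 3); memory operands are deliberately left out.

```python
# Operations selected by the /digit (reg field) for opcode 0x80.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def split_modrm(byte):
    """Return the (mod, reg, rm) fields of a ModR/M byte."""
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7


def decode_80(code):
    """Decode '80 /digit ib' with a register operand (mod == 3) only."""
    if code[0] != 0x80:
        raise ValueError("not opcode 0x80")
    mod, reg, rm = split_modrm(code[1])
    if mod != 3:
        raise NotImplementedError("memory operands are left out of this sketch")
    # reg is the opcode extension, rm names the 8-bit register operand.
    return f"{GROUP1[reg]} {REG8[rm]}, 0x{code[2]:02x}"


print(decode_80(bytes([0x80, 0xF0, 0x12])))  # ModR/M F0 = 11 110 000, so /6
```

Here 0xF0 splits into mod = 3, reg = 6, rm = 0, which selects XOR on AL, matching the "80 /6 ib" form quoted above.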

(I wrote an x86 assembler as part of a project once, these things do get quite confusing. Trying to actually emulate these instructions is a whole new level of complex altogether)


It is a real pain, so many variations. I've been reading on this topic for days and I'm still struggling to catch up. There are just not many resources on 8086 instruction decoding.

You don't need many. Just the CPU manual (have both, Intel and AMD) and a way to experiment and check your understanding of the manual. For the latter use an assembler and a disassembler. NASM (and its NDISASM) will work perfectly here. You may also want a hex file viewer and a programmer's calculator. That's all.

Start with e.g. the add instruction. Write a bunch of different variants of it like so:

I would recommend the disassembly method mentioned above for all assembly language programmers. It helps to get a grasp of instructions and their logic. No need to spend too much time inspecting the bytes, but maybe a few hours or a day? In addition to that, trying labels and seeing what values the assembler assigns to them could be enlightening, e.g. how "mov ax, my_label" or "jmp my_label" translate to bytes and how the values change when code is modified or assembler directives are used. As a more advanced topic, check how object files handle relocations.

The A86 assembler has a file called A86MANU.TXT; the section "The 86 Instruction Set" has always been the simplest to understand in my opinion.

Additionally, sandpile.org is indispensable (specifically look at their opcode encoding and opcode groups).

+1, that file contains a metric ton of useful data, was looking for something like that.

alexfru wrote:

You don't need many. Just the CPU manual (have both, Intel and AMD) and a way to experiment and check your understanding of the manual. For the latter use an assembler and a disassembler. NASM (and its NDISASM) will work perfectly here. You may also want a hex file viewer and a programmer's calculator. That's all.

Start with e.g. the add instruction. Write a bunch of different variants of it like so:

Observe (from the assembly listing or from disassembly of the binary (e.g. "ndisasm -b 16 file.bin")) how they're encoded.

For fun try to do the reverse. Given an instruction description/encoding, try to encode it by hand and see that the disassembly of your bytes gives you the expected instruction.

Extend this to 32 bits, throw in segment override prefixes, etc.

Beware, some instructions may have alternative encodings.
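The "encode it by hand" exercise and the warning about alternative encodings can be tried with a small Python sketch like the one below. The helper names are invented for this example. It covers only register-to-register 16-bit ADD, using the two forms "ADD r/m16, r16" (opcode 01) and "ADD r16, r/m16" (opcode 03).

```python
# Register numbers as they appear in the reg and r/m fields.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def modrm(mod, reg, rm):
    """Pack the three ModR/M fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def encode_add_rr(dst, src):
    """Return both encodings of 'add dst, src' for 16-bit registers."""
    d, s = REG16[dst], REG16[src]
    form_01 = bytes([0x01, modrm(3, s, d)])  # ADD r/m16, r16: dst sits in r/m
    form_03 = bytes([0x03, modrm(3, d, s)])  # ADD r16, r/m16: dst sits in reg
    return form_01, form_03


for enc in encode_add_rr("ax", "bx"):
    print(enc.hex(" "))
```

Writing either byte pair to a file and running "ndisasm -b 16" on it should give back the same "add ax,bx", which is a handy way to check your reading of the manual.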

That's a smart idea. I didn't know ndisasm existed. Although it would be useful to have a program that could differentiate between prefixes, opcodes and other bytes, instead of having them all written together.

Antti wrote:

I would recommend the disassembly method mentioned above for all assembly language programmers. It helps to get a grasp of instructions and their logic. No need to spend too much time inspecting the bytes, but maybe a few hours or a day? In addition to that, trying labels and seeing what values the assembler assigns to them could be enlightening, e.g. how "mov ax, my_label" or "jmp my_label" translate to bytes and how the values change when code is modified or assembler directives are used. As a more advanced topic, check how object files handle relocations.

I'll definitely have to address jumps and calls sooner or later, since VBE code jumps around a lot. Getting my code to recognize the instructions is the hardest part; emulating them is easy. After all, I don't need all of them.

Near and short direct jumps and calls are encoded relative to the end of the instruction. You get an opcode and then an offset, and you first set IP to the end of the instruction and then add the sign-extended operand to get the new IP.

So for instance, the following snippet:

Code:

hltloop:
    hlt
    jmp hltloop

is encoded:

Code:

F4 EB FD

That last FD being -3 when sign-extended.
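The same calculation can be written as a short Python sketch. The function names are illustrative only, and just the short form (opcode EB with an 8-bit displacement) is handled.

```python
def sign_extend8(value):
    """Interpret an unsigned byte as a signed 8-bit displacement."""
    return value - 0x100 if value & 0x80 else value


def short_jmp_target(code, ip):
    """Return the new IP for a short jmp (EB disp8) located at offset ip."""
    if code[ip] != 0xEB:
        raise ValueError("not a short jmp")
    end_of_instruction = ip + 2  # opcode byte plus displacement byte
    # IP is 16 bits wide, so the result wraps at 64 KiB.
    return (end_of_instruction + sign_extend8(code[ip + 1])) & 0xFFFF


code = bytes([0xF4, 0xEB, 0xFD])    # hlt / jmp hltloop
print(short_jmp_target(code, 1))    # the jmp itself starts at offset 1
```

The jmp ends at offset 3, and 3 + (-3) = 0, which is the offset of the hlt.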

Far and indirect calls and jumps encode their target absolutely. So for instance, in 16-bit mode, the code bytes

Code:

FF 27

mean

Code:

jmp [bx]

And that means: Look in memory at the word BX is pointing to and copy that into IP.
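A minimal Python sketch of that lookup, assuming a flat 1 MiB bytearray for memory, a plain dict for registers, and addresses that wrap at 1 MiB (all of these are simplifications picked for the example, not requirements):

```python
def read_word(mem, seg, off):
    """Read a little-endian 16-bit word at seg:off in real mode."""
    lo = mem[((seg << 4) + off) & 0xFFFFF]
    hi = mem[((seg << 4) + ((off + 1) & 0xFFFF)) & 0xFFFFF]
    return lo | (hi << 8)


def jmp_near_indirect_bx(mem, regs):
    """Emulate 'jmp [bx]' (FF 27): IP is loaded from the word at DS:BX."""
    regs["ip"] = read_word(mem, regs["ds"], regs["bx"])


mem = bytearray(1 << 20)
regs = {"ds": 0x1000, "bx": 0x0020, "ip": 0x0000}
mem[0x10020:0x10022] = bytes([0x34, 0x12])  # target offset 0x1234 stored at DS:BX
jmp_near_indirect_bx(mem, regs)
print(hex(regs["ip"]))
```

CS is left alone because this is a near jump; only IP changes.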

VBE code might also use software interrupts. If you don't know the function called in that case, you might also just emulate that as an indirect far call that pushes flags.
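One possible shape for that, again as a Python sketch with invented names and the same simplified memory and register model (flat 1 MiB bytearray, dict of registers): push FLAGS, CS and IP, clear the interrupt and trap flags, then load CS:IP from the real mode interrupt vector table entry at vector * 4.

```python
IF_FLAG, TF_FLAG = 0x0200, 0x0100


def push16(mem, regs, value):
    """Push a 16-bit value onto the stack at SS:SP."""
    regs["sp"] = (regs["sp"] - 2) & 0xFFFF
    base = regs["ss"] << 4
    mem[(base + regs["sp"]) & 0xFFFFF] = value & 0xFF
    mem[(base + ((regs["sp"] + 1) & 0xFFFF)) & 0xFFFFF] = (value >> 8) & 0xFF


def emulate_int(mem, regs, vector):
    """Emulate 'int vector'; regs['ip'] must already point past the INT."""
    push16(mem, regs, regs["flags"])
    push16(mem, regs, regs["cs"])
    push16(mem, regs, regs["ip"])
    regs["flags"] &= ~(IF_FLAG | TF_FLAG)
    entry = vector * 4  # each IVT entry is offset (2 bytes) then segment (2 bytes)
    regs["ip"] = mem[entry] | (mem[entry + 1] << 8)
    regs["cs"] = mem[entry + 2] | (mem[entry + 3] << 8)


mem = bytearray(1 << 20)
mem[0x40:0x44] = bytes([0x34, 0x12, 0x00, 0xC0])  # made-up vector 10h -> C000:1234
regs = {"cs": 0x2000, "ip": 0x0105, "flags": 0x0202, "ss": 0x3000, "sp": 0x0100}
emulate_int(mem, regs, 0x10)
print(hex(regs["cs"]), hex(regs["ip"]), hex(regs["sp"]))
```

A matching iret would pop IP, CS and FLAGS in that order. The vector contents above are invented for the demonstration; on a real machine they come from the BIOS-initialized table.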
