Find Instructions Hidden In Your CPU

There was a time when owning a computer meant you probably knew most or all of the instructions it could execute. Your modern PC, though, has a lot of instructions, many of them meant for specialized operating system, encryption, or digital signal processing features.

There are known undocumented instructions in a lot of x86-class CPUs, too. What’s more, these days your x86 CPU might really be a virtual machine running on a different processor, or your CPU could have a defect or a bug. Maybe you want to run sandsifter, a program that searches for erroneous or undocumented instructions. Who knows what is lurking in your CPU?

If you don’t think your CPU has a lot of instructions in it, have a look at the list of what’s inside a modern Intel chip and compare it to the relatively tiny list of the original 8086 instruction set (which is still in there, too). According to the project’s website:

Sandsifter has uncovered secret processor instructions from every major vendor; ubiquitous software bugs in disassemblers, assemblers, and emulators; flaws in enterprise hypervisors; and both benign and security-critical hardware bugs in x86 chips.

You can read more in the project’s whitepaper. We were honestly surprised to read: “Typically, several million undocumented instructions on your processor will be found…” However, it appears that these millions of instructions will fall into one of only a few categories.

We aren’t sure if any end user is likely to discover new undocumented instructions in production silicon with this tool. But it could be handy for testing and especially for testing emulation code. If you want even more instructions per chip, you could always get a device with 1,000 CPUs onboard.

I discovered some of those by accident 40 odd years ago. The problem with using them is that if they are undocumented they are not safe to use unless you are doing a one time project for one particular processor and even then it is dodgy.

Not always. NASA did hundreds of hours of tests on the Z80 and found the entire instruction set, including the undocumented instructions, to be consistently reliable. With their emphasis on reliability I suspect they never actually used them as they weren’t part of the official instruction set.

Some of the undocumented instructions were called ‘edge chip instructions’ meaning that they were unreliable at the rated CPU speed due to propagation delays in the die itself. Most however were usable.

Yes… Some of the undocumented opcodes that worked on the Apple II’s 6502 didn’t work on the 6502-compatible 6510 in the Commodore 64.
I remember that 0xa1 and 0xa2 were different opcodes while 0xa3 was reserved. Turns out that it did both when you issued the 0xa3 instruction. I no longer remember the instructions, but http://www.oxyron.de/html/opcodes02.html tells me they were LDA and LDX, and it documents 0xa3 as LAX (LDAX, i.e. do both), which was officially “reserved”.

Those weren’t so much hidden instructions as incompletely decoded instructions. Some were useful, some were useless, some were duplicates of existing instructions.
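The “incompletely decoded” idea can be illustrated with a toy model. This is only a sketch under assumptions: it views a 6502 opcode as the bit fields aaabbbcc, looks only at the operation field (aaa) and the group field (cc), and ignores addressing modes and the many exceptions in the real opcode matrix. The name decode_groups is made up for illustration.

```python
# Toy model of incomplete decoding on the 6502 (deliberately simplified).
# Opcodes are viewed as aaabbbcc: aaa picks the operation, cc picks the group.
GROUP_01 = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
GROUP_10 = ["ASL", "ROL", "LSR", "ROR", "STX", "LDX", "DEC", "INC"]


def decode_groups(opcode):
    """Return the operations that 'fire' for an opcode in this toy model."""
    aaa = (opcode >> 5) & 0b111
    cc = opcode & 0b11
    fired = []
    if cc & 0b01:  # group 01 select bit is set
        fired.append(GROUP_01[aaa])
    if cc & 0b10:  # group 10 select bit is set
        fired.append(GROUP_10[aaa])
    return fired


for op in (0xA1, 0xA2, 0xA3):
    print(hex(op), decode_groups(op))
```

In this model 0xA3 has both group bits set, so both LDA and LDX fire, which is the LAX behaviour described above.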

The 8085, on the other hand, did have real hidden instructions. Intel, for whatever reason, didn’t document them. IIRC, mostly they were 16-bit register pair operations, e.g., shift a register pair one bit. Actually, rather than remember, I simply googled and this was the first hit:

The modern x86 chip isn’t really an x86 chip because its functionality is defined by microcode. While this is interesting, I would be far more interested in something that would work to discover what each microcode bit does. Still very cool, but I’m like Lars R.: I’d like a RISC-V chip, though not on an FPGA.

I designed a processor a few years back that stored all its microcode instructions in EEPROM and register loops. You could load and modify them during execution. Fast for a DIP-on-breadboard setup but, at the time, headache-inducing to program. If I do it again I’m going to incorporate hardware management of the instruction set.

The IBM System 360 mainframe ISA (Instruction Set Architecture) was defined by microcode loaded from diskette drive inside the CPU cabinet during the initial power-on sequence. Two instruction sets were available, commercial and scientific.

Not all 360 mainframes used a diskette drive. In fact, I don’t think the diskette drives were used until the 370 systems. I know the 370/135 and 370/145 had diskette drives. I debugged many 135s and 145s.

I can beat that by a few years. In the mid-eighties I was assigned to do a lab on the PDP11/60 that was bought in the mid-seventies by TU Delft (ehh, TH Delft back then). It had programmable microcode (for our assignment we left it at the normal PDP11 instruction set).

That sounds interesting. One of my interests is processor architecture and I looked at microcoded processors quite a bit. At one time I worked for Datapoint doing test and repair of their 5500 processor boards which were microcoded in EPROM. They were the forerunner of the Z80. Is your design online and do you have any links to it?

Currently my design efforts are toward a single instruction processor that I call NISC. I have a draft of the specification at https://github.com/BillBohan/NISC and welcome comments, suggestions and contributions.

Well, since the answer to that is obvious, I’m going to assume you are trolling. I assume what they mean is you have an instruction “prefix” that loads some additional bytes, and each of those counts as an instruction even though we would not normally write them as such. For example, suppose a byte-sized computer (because I’m too lazy) has a load immediate instruction, F8, and that it loads the next byte. So F800 is one instruction and F801 is another instruction… all the way to F8FF. So a tool like this could identify that as 256 instructions (F880 is load 80 to accumulator). We would say it is one instruction with an “argument.”
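To put numbers on that example, here is a minimal sketch. It assumes the hypothetical byte-sized ISA from the comment (opcode F8 followed by one operand byte); the table layout and the function name count_encodings are made up for illustration and have nothing to do with how sandsifter itself counts.

```python
# Hypothetical byte-sized ISA from the comment above: opcode F8 is
# "load immediate" and is followed by one operand byte.
# The table maps opcode -> total instruction length in bytes.
TOY_ISA = {0xF8: 2}


def count_encodings(isa):
    """Return (distinct byte sequences, distinct opcodes) for a toy ISA."""
    # Every operand byte can take 256 values, so a length-n instruction
    # has 256 ** (n - 1) distinct encodings.
    sequences = sum(256 ** (length - 1) for length in isa.values())
    return sequences, len(isa)


sequences, opcodes = count_encodings(TOY_ISA)
print(f"{sequences} byte sequences, {opcodes} opcode(s)")
```

A tool that counts byte sequences reports the first number; a person reading the manual would report the second.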

The answer to that is NOT obvious, and I am not trolling. You are guilty of grandstanding in order to push an ‘article’ which has little merit, if any.
Why would you state that “…since the answer to that is obvious“…, and then follow that immediately with “…I assume what they mean is…”
You can not have it both ways: if the answer is obvious, you can NOT make assumptions about how the answer is arrived at. Since the answer is so absolutely obvious, would you like to tell us EXACTLY what they meant?

Your patronizing attitude is a poor choice, and insulting to everyone who reads you: I know exactly how computers operate; I’ve been designing them since the DEC PDP-8 had less than ten instructions, and Ken Olsen claimed, just as you are doing, that it had more than sixty. Try harder, and with more–a lot more–rigour next time.

The next time you want to make an absolutely valid assumption, try this: assume that perhaps someone else knows more than you. You can’t go wrong, whatever field you’re in.

Well, if you have that much experience then you would know the difference between mnemonic, op-code and instruction. Furthermore, you should have realized from the stated permutations that the author was referring specifically to instructions that have a unique op-code rather than simple mnemonics.

…so, you get paid to read the author’s mind, and to speak for him, do you? When you are wrong, do you have to pay him? [I, and everyone else who reads HackaDay, just knew you’d be heard from. The smartest man in the room–any room, anywhere. ANY subject.]

“The next time you want to make an absolutely valid assumption, try this: assume that perhaps someone else knows more than you.”
You can’t go wrong, whatever field you’re in. Or think you’re in. Or think you know.

It’s not obvious. The “several million” by your logic is in fact too conservative. Example: ADD an infinite number of numbers. Vowel! An infinite number of instructions. POKEing fun there at those who say “Wah la” or “Viola” for Voilà.

It is unlikely yet plausible that some of the undocumented instructions could be the result of the RISC core being accessible somehow from ring 0 (kernel) and/or userspace. Or they could be duplicates of other instruction pairs (if so, optimization hacks may be possible this way).

I can see where you might want some opcodes to go directly to the RISC core for performance reasons, but I would expect the ‘translation bus’ to block everything else.

Unless there is a ‘bug’ in the ‘translation bus’ of course.

The other alternative is some of these genuinely are ‘official’ instructions which haven’t been documented for some reason.

The Z80’s undocumented index register instructions were mostly a side effect of ‘duplicating’ the hardware for existing features. Think of it as an early copy-paste error :)

The intended behaviour was to manipulate the IX/IY registers in the same way as the HL register pair by using the appropriate prefix byte with the existing HL opcodes, and that’s precisely what they got. Including the opcodes for manipulating the H and L registers separately.

At the other end of the ‘spectrum’, the Wang VS had a number of diagnostic opcodes documented only in engineering manuals that were not available to end users. They were pretty strict about that.
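A rough sketch of that “copy-paste” effect, assuming a tiny excerpt of the Z80 opcode map and a made-up decode helper: the prefix byte simply swaps the register that the unprefixed HL opcode refers to, so the H-only and L-only opcodes pick up IX/IY half-register variants for free.

```python
# Small excerpt of Z80 opcodes that touch HL, H or L: (mnemonic, operand).
HL_OPCODES = {
    0x23: ("INC", "HL"),
    0x24: ("INC", "H"),
    0x25: ("DEC", "H"),
    0x2C: ("INC", "L"),
    0x2D: ("DEC", "L"),
}
# Prefix bytes that redirect HL operations to an index register.
PREFIX_REGISTER = {0xDD: "IX", 0xFD: "IY"}


def decode(code):
    """Decode one instruction from the excerpt, honouring a DD/FD prefix."""
    index = PREFIX_REGISTER.get(code[0])
    opcode = code[1] if index else code[0]
    mnemonic, operand = HL_OPCODES[opcode]
    if index:
        # The same substitution applies whether the operand is HL, H or L.
        operand = {"HL": index, "H": index + "H", "L": index + "L"}[operand]
    return f"{mnemonic} {operand}"


for code in ([0x23], [0xDD, 0x23], [0xDD, 0x24], [0xFD, 0x2D]):
    print(" ".join(f"{b:02X}" for b in code), "->", decode(code))
```

DD 23 (INC IX) is a documented instruction, while DD 24 (INC IXH) falls out of the same substitution without ever having been listed in the official instruction set.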

It is just word games centering around obtuse values of “different.” What they mean is, if it does what it does in a different way than they expected, they’re going to say it isn’t a True Scotsman and is really Some Other Thing We Don’t Know What. That way it sounds more interesting than just, “All CISC processors use microcode because if you don’t have a complicated branch prediction you’d be using RISC instead.”

If you care about total throughput per dollar, you’re going to want CISC and don’t try to ask what it is “really” doing, because whenever you think you know there is another level of abstraction under that. OTOH if you use a simple enough RISC processor you might have direct access to literal registers, instead of just things that are called registers and can be represented with a block diagram of a register and accessed through instructions that have the word “register” in the description.

Why they care what is happening inside that IC is a whole additional question that might actually be more interesting. Another interesting question is, why are most of the people who claim to hate microcode people who never buy a desktop RISC-based system? The answers are funny.

“these days” there are no Transmeta CPUs being made, and most of the products using them are old and almost obsolete.
But… Transmeta technology was licenced by Intel, Nvidia, Sony, NEC and Fujitsu, so there is a possibility that the technology is still in use.

It’s not always entirely the manufacturer’s fault though. I know from personal experience just how difficult it can be to get the information you need to *package* an application out of the programmers, never mind actually documenting the thing. They’re always far too busy to spare the time it would take.

What I like about HaD is how the articles tend to concisely explain the hack itself: the problem(s) faced & the genius solution for it, with pointers to the source for more details, and discussion on the hack itself. Sadly in this case the hack itself was not described in the HaD article, and the discussion comments don’t discuss the hack itself in any way despite there being already 40 comments. Most of the comments are people showing off what they already knew… Nevertheless I am very glad this news was mentioned here, even if I had rather low expectations: I followed the link to the whitepaper anyway, there I read the actual hack, and I was not disappointed :)

TLDR:
problems:

* Combinatorial explosion: with known instruction lengths up to 15 bytes, even if we had a cooperative oracle that answers “yes (no), this n-byte sequence is (not) an instruction”, that would still leave us brute-force evaluating the oracle for 120-bit inputs

* How do we build this oracle, even when the manufacturers choose to keep certain instructions undocumented?

solution:

* Combinatorial explosion: if we can detect instruction length we can heavily prune a depth-first search (assuming that earlier bytes encode the instruction, and later bytes encode parameters like offsets, immediates, etc.), making the search feasible

* To build this oracle: arrange 2 adjacent pages of memory, both with read and write access, but with only the first executable and the second non-executable. We place the byte sequence near the end of the first page, crossing into the second, and then try to execute (jump to) the byte sequence. As long as the instruction decoder considers the instruction incomplete it fetches the next byte; if at any point the next byte is in the second page, a page fault occurs. SO A PAGEFAULT CAN LEAK INFORMATION ABOUT THE DECODER’S INTERPRETATION OF A VALID INSTRUCTION! By starting with only the first byte of the sequence in the first page, and shifting the sequence to the left whenever there is a pagefault until there is no pagefault, we can find out the length of an instruction in a sequence of bytes…
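The length-finding trick above can be simulated without touching real page tables. The following is a sketch under loud assumptions: toy_decoder is a made-up stand-in for the hardware instruction decoder with an invented three-rule ISA, and PageFault is an ordinary Python exception standing in for the real fault. It is not sandsifter’s code, and it ignores the work a real tool must do to tell a fetch fault apart from faults caused by the instruction itself.

```python
class PageFault(Exception):
    """Raised when the simulated decoder fetches from the non-executable page."""


def toy_decoder(fetch):
    """Hypothetical decoder: pulls bytes through fetch(i) as hardware would."""
    opcode = fetch(0)
    if opcode == 0x0F:        # pretend: two-byte opcode
        fetch(1)
    elif opcode == 0xB8:      # pretend: opcode plus a 4-byte immediate
        for i in range(1, 5):
            fetch(i)
    # Any other opcode is a single byte in this toy ISA.


def measure_length(code, max_len=15):
    """Shift the sequence left one byte at a time until no fault occurs."""
    code = bytes(code).ljust(max_len, b"\x00")
    for visible in range(1, max_len + 1):  # bytes inside the executable page
        def fetch(i):
            if i >= visible:
                raise PageFault(i)         # byte lives in the second page
            return code[i]
        try:
            toy_decoder(fetch)
        except PageFault:
            continue                       # decoder wanted more bytes
        return visible                     # the whole instruction fit
    return None


# The brute-force search space from the first bullet: 15 bytes = 120 bits.
assert 256 ** 15 == 2 ** 120

for sample in ([0x90], [0x0F, 0x05], [0xB8, 1, 2, 3, 4]):
    print(bytes(sample).hex(), "-> length", measure_length(sample))
```

Note that measure_length never asks the decoder for a length. It only observes whether a fault happened, which is the whole point of the oracle.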

Al Williams documented what the person did:
The person made a program to find undocumented instructions.

Al Williams also linked to “Further info”, and yes I’ve just finished reading it.
It documents the techniques they thought of and the ones they settled on to minimize the fuzzing area to just the opcodes that seem to respond.
They also incorporate fail-safes using page segmentation and undefined exception handlers (Exception IRQs GND_PF and GND_UD) as their hardware catches.
The hardware GND_PF catch is used as an instruction size/length finder.
Also by single-stepping each potential instruction and skipping mode-change instructions alongside SYSENTER, SYSCALL etc…
Said white paper documents the methods and techniques quite well.

The slide-show (presentation) PDF, however, is OK only as long as you skip the middle chunk, as it keeps repeating long after people have theoretically spotted the patterns… IMHO, just no for the slide-show…

On the other hand, mostly you are right about others just spouting random things or already common knowledge. Which, BTW, is not common to the average pleb who doesn’t know ACPI safely shuts down their PC from the front panel on/off switch, nor do they know how to set it to do so, and who exclaims, “Oh noez wy U tehrn off yawre laptop from teh power switch??? Doesn’t dah braekz yuo’er battery disk dryve???”.

Though there are such plebs who have posted here and who seem to confuse “finding” with “knowing” and go off on their own rant.

Seriously, surely you’ve been tech support to some right thick people… Haven’t you?

Don’t misunderstand me, I’m not DEMANDING to be spoonfed the actual hacks, only mentioning I LIKE to be spoonfed the actual hacks before I look at the gory details.

I understand I may not appear thankful, but I really am thankful for the article (in its current form). As you say it contained all the necessary links to read the gory details, which allowed me to appreciate the actual hack by reading the whitepaper (which I read before the slides, and contained more information in detail). I’m only saying that with some more spoonfeeding the actual hack, any readers with the requisite background would almost instantly appreciate the trick. This could have been done in a relatively simple paragraph like:

“By carefully placing candidate instructions across a memory page boundary (executable before, non-executable after) and catching the resulting page faults, [xoreaxeaxeax] was able to detect instruction lengths according to the bare metal (the instruction decoder, in fact) as opposed to the documentation: as long as part of the instruction resided in the latter page, a page fault would be generated. By shifting the same sequence of bytes one byte to the left and retrying, at some point the single instruction resides completely in the executable page and no page fault is generated. Next, [xoreaxeaxeax] combines his instruction-length detector with a cleverly pruned depth-first search to efficiently enumerate all instructions instead of having to brute-force up to 15 bytes (the longest known documented instruction length). For more details on all the gotchas along the way see …”

I speculate people would have stayed more on topic if they had actually read something describing not just the feat but also the hack, … but I could be wrong, since most articles describe the hack and people still go on off-topic rants…

Again I appreciate the article in its current form (without it I probably wouldn’t have checked it out), I’m only complaining I would have liked some ice cream with my free apple-pie :)

Well, iAMT runs on a separate ARM core, doesn’t it? It’s embedded into the whole CPU, so you’d need to first get to the ARM core and then try ARM code on that bit.
But that’s more interesting on a higher level: try to find out what that OS does. The problem there is also that it uses encryption.

And incidentally, AMD also has that shady/tricky stuff in their CPUs. Just under a different name.

One somewhat interesting use for this would be in producing changelogs for Intel microcode. Typically when they issue a new revision, it comes out as a binary blob sent directly to the CPU, and has no other errata / changelog / documentation beyond that. There is practically no way of knowing what is different between microcode revisions.

However, using this tool, you could profile a CPU before applying new microcode and then do it again afterwards. The result would show you if there was any difference in the ISA.
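As a sketch of what that comparison might look like: the code below assumes each scan has already been reduced to a dictionary mapping an instruction’s hex bytes to a (length, outcome) pair. That format, the function name diff_profiles, and the sample entries are all invented for illustration; they are not sandsifter’s actual log format or real scan results.

```python
def diff_profiles(before, after):
    """Compare two scans, each a dict: hex byte string -> (length, outcome)."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return added, removed, changed


# Invented sample data, purely to exercise the function.
before = {"aa01": (2, "executed"), "bb": (1, "illegal")}
after = {"aa01": (2, "executed"), "bb": (1, "executed"), "cc02": (2, "executed")}

added, removed, changed = diff_profiles(before, after)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Anything that shows up in changed or added after a microcode update would be a candidate for a closer look.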

That’s a good idea. If the microcode update is not a bug fix but actually adds a new instruction, that should work, but I don’t know if the instruction decoder can be reprogrammed to recognize a new instruction format. If new instructions can be added, I suspect the instruction decoder would remain fixed, and some initially reserved instruction (which sandsifter would already detect before the microcode update) would become implemented by the update (such that it would no longer throw an undefined instruction fault).

Another “use” would be control flow obfuscation: software could intentionally arrange new code pages dynamically and set some of them as non-executable in such a way that it knows a pagefault will be thrown and catch it (just like this sandsifter tool does) in a controlled way. Any static disassembly would probably fail to realize the code will at some point in the middle of an instruction enter a non-executable page, so the flow graph would be missing this “jump”…