Posted by timothy on Friday November 12, 2010 @08:06AM
from the sooper-seekrit dept.

An anonymous reader writes "A hidden (and hardware password protected, by means of required special values in processor registers) debug mode has been found in AMD processors, and documented by a reverse engineer called Czernobyl on the RCE Forums community today. It enables powerful hardware debugging features long longed for by reverse engineers, such as hardware data-aware conditional breakpoints, and direct hardware 'page guard'-style breakpoints. And the best part is, it's sitting right there in your processor already, just read the details and off you go with the debugging ninja powers!"

I wondered the same thing - if these debug features are useful to developers debugging their own software, why not market this as a feature? The only thing that occurred to me is that maybe there is some sort of security problem with this debug functionality. Does anyone know - could these debug features be used to do something like break Operating System security models, leading to privilege escalation issues, or for other nefarious purposes?

It is possible that the debug features are for their internal use and they don't quite work as intended. They may be useful as such, but if there are implementation bugs that require cumbersome work-arounds on the software side, it may be that they are waiting for a non-buggy implementation before publicly documenting the features.

It is also possible that they don't want to put resources into documenting and supporting the debug features. After all, AMD is a small company compared to Intel and not that profitable. They even had some layoffs during the worst of the recession - what if they had to lay off the guys responsible for the debug mode?

It's probably that AMD doesn't want to claim that they ever marketed the feature as such. If they did, it would put Intel up to create and release a debugging interface for their silicon. Then both would be forced into competing to produce a better debugging interface. This drives production costs up for a component that may be used by less than 1/100 of a percent of the users when they should have been putting their efforts elsewhere.

it would put Intel up to create and release a debugging interface for their silicon.

Maybe Intel already has a debugging interface on their silicon. This AMD interface has remained hidden for who knows how many years, why couldn't the same thing happen with Intel? After all, it's not as if just anyone can reverse engineer a CPU.

Intel has provided debug registers for ages. You can have up to four hardware watchpoints in pretty much any Intel - and AMD - chip. TFA is Slashdotted at the moment, but "hardware data-aware conditional breakpoints, and direct hardware 'page guard'-style breakpoints" can both be implemented on any chip since the 386.

In fact, this is a fairly small incremental improvement over the existing hardware debugging support in x86 chips. It provides some extra control codes allowing the address in DR0 (one of the four registers i386 provides for hardware watchpoints) to do some slightly more clever things. For example, a watchpoint can be triggered on a partial match, rather than an exact match, to the address - this is really nice because it lets you put a watchpoint on the whole of any data structure that fits within a page. With the i386 watchpoints, you can only watch a single word with each register (4 words in total), while this means you can watch anything smaller than a page (and you can watch things bigger than a page by marking the page as no access, trapping the access, then unprotecting, single-stepping through the load / store, and continuing the process, which is how you implement watchpoints when you run out of debug registers).
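To make the "watch a whole data structure within a page" idea concrete, here is a minimal sketch in Python. It only does the arithmetic: given a structure's address and size, it finds the smallest naturally aligned power-of-two block that contains it, and the low-bit mask that would mark those bits as ignored. The function name `covering_watch` and the 4096-byte page size are assumptions for illustration; nothing here touches real debug registers.

```python
PAGE_SIZE = 4096  # assumed x86 page size; a 12-bit mask covers at most this


def covering_watch(base: int, size: int) -> tuple[int, int]:
    """Return (aligned_address, low_bit_mask) for the smallest naturally
    aligned power-of-two block that contains [base, base + size)."""
    if size <= 0:
        raise ValueError("size must be positive")
    span = 1
    while True:
        if span > PAGE_SIZE:
            raise ValueError("structure does not fit in one aligned page")
        start = base & ~(span - 1)        # round base down to a multiple of span
        if base + size <= start + span:   # whole structure inside the block?
            return start, span - 1        # mask has 1s in the ignored low bits
        span *= 2


# A 24-byte structure at 0x1010 needs a 64-byte block starting at 0x1000.
addr, mask = covering_watch(0x1010, 24)
print(hex(addr), hex(mask))
```

Note that the block is usually larger than the structure, so a watchpoint set this way would also fire on neighbouring data inside the same block; a debugger would have to filter those hits in software.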

PAGE IN CONSTRUCTION
You can help! Yes, YOU:
- Please DO correct grammar and other English language mistakes.
- Please add formatting (bold, italics...) as needed for better accessibility.
- Please contact me over any errors, inaccuracies, or complements to the technical contents.
- If you own a 64-bit AMD processor, please DO check and report any differences.

Contents are intended to be released at a later time under liberal copyright/copyleft options. Please do NOT make the contents available elsewhere BEFORE it is READY to be unleashed. I retain the 'moral' ownership rights.

TODO mucho...

Summary: AMD processors (Athlon XP and better) have included firmware-based debugging features that expand greatly over the standard, architecturally defined capabilities of X86. For some reason, though, AMD has been tightly secretive about these features; a hint of their existence was gained by glancing at CBID's page (URL below). Herein will be uncovered what was found through direct experimentation, in the hope it may be useful to software developers, and possibly included in future debuggers or debugger plug-ins. I'll use the term "expanded" for the capabilities covered here, since the term "debug extensions" was already taken. The item is WORK IN PROGRESS, but USABLE, hence released AS IS. The author can be contacted by email, PM, or public thread on the reversing board. Note: all addresses and values are hexadecimal unless noted otherwise.

The following four new machine-specific registers (MSRs) are involved in the expanded debug facilities. All those MSRs are password protected against casual access: read/write access (RDMSR/WRMSR) to the registers is granted only if EDI holds the correct password value, viz. EDI = 9C5A203A. Otherwise, a GPF exception occurs.

All these registers' default (reset) values are zero.

Name          MSR address   Width
Control       C001_1024     8 bits
Data_Match    C001_1025     32 bits (AMD64: 64 bits)
Data_Mask     C001_1026     - ditto -
Address_Mask  C001_1027     12 bits
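The access rule above (four MSRs, zero at reset, readable and writable only when EDI holds the password) can be modelled with a small toy in Python. This is a simulation under stated assumptions, not code that talks to a processor: real RDMSR/WRMSR are ring 0 instructions. The class and method names are invented for illustration, the name-to-address pairing follows one reading of the register list above, the widths are the 32-bit case, and truncating writes to the register width is an assumption (the page does not say how out-of-range writes are treated).

```python
PASSWORD = 0x9C5A203A      # value the page says EDI must hold

MSR_WIDTH = {              # MSR address -> width in bits (32-bit processors)
    0xC0011024: 8,         # Control
    0xC0011025: 32,        # Data_Match
    0xC0011026: 32,        # Data_Mask
    0xC0011027: 12,        # Address_Mask
}


class GeneralProtectionFault(Exception):
    """Stands in for the GPF exception raised by the real processor."""


class ToyMsrFile:
    """Toy model of the password-gated MSRs (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # The page states that all four registers are zero at reset.
        self._regs = {addr: 0 for addr in MSR_WIDTH}

    @staticmethod
    def _check(edi: int) -> None:
        if edi != PASSWORD:
            raise GeneralProtectionFault("EDI does not hold the password")

    def rdmsr(self, addr: int, edi: int) -> int:
        self._check(edi)
        return self._regs[addr]

    def wrmsr(self, addr: int, value: int, edi: int) -> None:
        self._check(edi)
        # Assumption: bits beyond the register width are simply dropped.
        self._regs[addr] = value & ((1 << MSR_WIDTH[addr]) - 1)


msrs = ToyMsrFile()
msrs.wrmsr(0xC0011024, 2, edi=PASSWORD)          # set Control bit 1
print(hex(msrs.rdmsr(0xC0011024, edi=PASSWORD)))
```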

Let's start by looking at the Control MSR (DbgCtrlMSR2, C001_1024), since it is the key to the new features. The low 8 bits can be set/reset; only bit 0 and bit 1 will be examined here. The other bits had no effect in my preliminary, limited experimentation and will be left at zero. (Readers are invited to investigate further and report.)

A - When Control = 0 (default), the AMD processor's debug firmware operates as defined by X86 architectural specs. In order to switch on the expanded debug capabilities, we must set Control MSR's bit 1. IOW, set Control = 2 or 3; more on this later.

B - When Control = 2 (binary 10), the operation of any breakpoint defined in DR0 is modified as will be described. (DR1 to DR3 breakpoints aren't affected at all.) The DR0-based breakpoint is now controlled by the new registers Address_Mask, Data_Match and Data_Mask, in addition to DR7.

General notions about the Mask MSRs: both in Data_Mask and Address_Mask, a 1 bit means "don't care".

Formally, when a comparison of two values/addresses needs to occur under mask, the bits set in the Mask are ignored in both (equivalently, the Mask is ORed into both) and then the compare is done. A match occurs if the masked values are equal. A mask value of zero thus is equivalent to no mask. A mask of all ones makes a match occur on just any value (considering compare length).

Note: for Address_Mask, only those values make sense that are binary zeroes followed by zero or more binary ones. Not sure if that is enforced by the AMD firmware.

Now let's examine expanded DR0-breakpoint operation.

Instruction breakpoints (DR7 type 0): a break occurs at any address which matches the BP address in DR0 under Address_Mask. Recall Address_Mask is significant to 12 bits only. Thus the widest possibl
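The masking rule in this excerpt can be sketched in a few lines of Python, taking "a 1 bit means don't care" as the definition. The function names and the default 32-bit compare width are assumptions for illustration; this models the comparison only, not the processor.

```python
def match_under_mask(a: int, b: int, mask: int, width: int = 32) -> bool:
    """True if a and b agree on every bit NOT set in mask (1 = don't care)."""
    care = ~mask & ((1 << width) - 1)   # bits that still take part in the compare
    return (a & care) == (b & care)


def is_low_ones(mask: int) -> bool:
    """True for masks of the form 0...01...1 (zeroes followed by ones),
    the only shape the excerpt says makes sense for Address_Mask."""
    return (mask & (mask + 1)) == 0


# Low byte ignored: these two values match under mask 0xFF.
print(match_under_mask(0x12345678, 0x123456FF, 0xFF))
```

With a mask of zero the function degenerates to an exact compare, and with a mask of all ones it matches any pair of values, which is the behaviour the "don't care" rule implies.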

Exactly, it's probably a bit of a kludge, and making it into a stable, documented, supported feature is going to be expensive with a lot of support and a small user base.

I have modes like this in some of my own products, and sometimes I'm leery of even having some other people on my own team have access to the debug modes, because of the potential for disaster and a WHOLE lot of handholding from me.

It's not worth the time it would take for me to set it up for broader use, and if I did, they would break thi

It is possible that the debug features are for their internal use and they don't quite work as intended.

Ding ding ding ding... we have a winner!

I work for a processor design company. If this feature is kept secret, it's because the company does not want to put in the resources to make sure it works completely on every chip. It probably uses lots of hacks and violates the architecture in some obscure way. AMD does not want customers depending on this feature and then insisting that it works for future design wins.

Perhaps the slashdotted site answers this but I have to wonder why not just have a separate opcode to turn the debugging on?

Because there's already a whole bunch of privileged MSRs that normal user code mustn't have access to - many of which are undocumented and processor-specific - so adding a few more is no big deal. Adding new opcodes, on the other hand, requires more work and risks them clashing with Intel's opcode choices at a later date.

Any CPU debug mode worthy of the name should be able to violate OS security six ways from Sunday, and silently at that, without any difficulty. By the same token, though, any CPU debug mode worthy of shipping in commercial silicon really ought to be possible for the firmware and/or kernel to lock for the duration of operation. If userspace can kick it off, a brave and exciting new world of AMD-specific malware is about to begin...

If an OS running on real hardware can block this call coming from user-mode then a hypervisor can block it coming from a VM. And if it can't be blocked you're p0wned either way. A virtual machine makes no difference.

Not that I'm that knowledgeable about virtualization software, but I can't imagine that they would run privileged code in the virtual machine as privileged code on the host CPU, so it doesn't matter anyways.

In a VM running under hardware-based virtualization (AMD-V / VT-x), privileged code in the guest generally does run at privilege level 0, also referred to as (privileged), kernel mode, or ultimate privilege. This is required to implement a protected mode operating system; a modern guest OS

Not necessarily. Memory access can be blocked because the MMU controls what and where an application can write, transferring control to known code (the OS) on violations. Interrupts can be blocked because invoking them gives control to the OS. Privileged instructions can be blocked because non-ring0 execution gives control to the OS.

The OS can't choose to block for example the "xor" instructions in any reasonable way. It's possible by basically single-step

AFAIK they are packaged with every major Linux distro out there, and I can't but presume that Windows ships with microcode patches as well.

Microcode updates for Windows machines are distributed through Microsoft Update and are downloaded and installed automatically if automatic updates is enabled (and it is enabled by default). No BIOS update required.

An example of such an update can be found by looking at Microsoft KB936357 [microsoft.com]

Any CPU debug mode worthy of the name should be able to violate OS security six ways from Sunday...

Any security model worthy of the name would be agnostic to whether the CPU was in user mode or debug mode. While there is always the risk of a bug or a security hole, I can assure you that anything that goes into the chip goes in under the scrutiny of the security model. I know of many instances where some debug or test feature was not implemented because there was some potential threat to the security model.

Yeah, security of DRM or TPM is probably easier to compromise (instead of taking several months to break a new DRM system it would take days since reverse engineering can be done much faster)... Which also is an obvious reason they would want to hide this feature to avoid pressure from certain groups.

Does anyone know - could these debug features be used to do something like break Operating System security models, leading to privilege escalation issues, or for other nefarious purposes?

Exactly my thoughts as well. Perhaps if these features were documented, and compilers and kernels were written with these features in mind, they would be insanely helpful. This way, however, it's just a back door wide open.

I used to work for a processor emulator tools company called Applied Microsystems Corp, Redmond WA, now defunct. Up through processors of type 68040, emulation tools could be mounted external to the processor chip and performed the functions mentioned (hardware breakpoints, memory maps, all register shadowing, soft and hard breakpoints, etc.), all the things that you need to perform basic computer system development. As the complexity of the systems increased beyond those early 8/16/32 bit CPUs, all those hardware f

NSA key? Generally speaking, if a company spends money on something it expects a return. Putting those "debug" features into silicon costs money so why don't they advertise them? I only see three likely reasons: security (can't be secured), doesn't work (oops!), government backdoor.

I only see three likely reasons: security (can't be secured), doesn't work (oops!), government backdoor.

I see at least one more: intended for internal testing, not guaranteed to work the same on all processors, AMD's not interested in guaranteeing that it works the same on all processors, and AMD's not interested in dealing with users whining that it doesn't work the same on all processors.

Such debug features circumvent any lower-level security completely. Even if the debug features can only be enabled from hardware (avoiding the obvious malware risk), the existence of such features is fundamentally incompatible with the existence of such things as secure drivers (like HDMI encryption or copy checks on DVDs, SecuROM style), DRM even by use of hardware dongles, etc.

That is a very interesting point. I recall that when I did some work programming directly to a processor in Assembly, while using the debug mode I was able to break the execution and change any register to another value and then resume it. I know that this was just a school project and the processor I used isn't as complex as an AMD processor is but if the same principle applies, then your comment makes a very good point.

Anything to do with paging and interrupts could be a security vulnerability - some kernel processing has to go on in order to update process states. Being able to interrupt the process at the point just after getting kernel ring permission and before interrupts are disabled would be a dream, but probably just a theoretical one. Usually, it would be an atomic process, you couldn't do one without the other, but with these instructions, who knows...

Probably because then they'd have to fully document the features and test them thoroughly on each new chip, which would likely cost them quite a bit more than developing the features in the first place.

They would also be saddled with supporting backwards compatibility in future chips, since it becomes hard to remove publicly-accessible features in a CPU once they've been added.

One of my pals at NVIDIA was talking about this in a generic sense. Evidently, all of the big design houses have reverse engineering departments where they scrape down to the silicon and get things running. They never make any public info, but it's crazy what kind of logic blocks they find on silicon.

These exist on "all processors" as ways to test the processors and increase yield cheaply. The moment that the engineering samples go out, competitors get their hands on them, and it's only days or weeks before they figure out what's really going on. Kind of cat-and-mouse.

By the way, here's a guy who does this in his spare time [bunniestudios.com]. He may not have the $10+ million budget that the big boys have, but it should give you a little context as to what really happens in industry.

(As the original article was instantly slashdotted, I can only guess that the AMD exploit was found through software avenues.)

They never make any public info, but it's crazy what kind of logic blocks they find on silicon.

Sometimes scraping can tell simpler things, like an accurate estimate about how much profit a company is making on a chip, and thus how much money the company will have to invest in its next generation of chips.

yea, why stories continue to be posted with direct links instead of using things like coral cache is beyond me. If you KNOW the site you are going to link to can't handle a slashdot load then DON'T LINK DIRECTLY TO IT.

Of course this does not include sadistic evil people who enjoy watching websites crash and burn (probably a sizable but not large portion of the slashdot community)

Ah yes, I've seen this before... the typical way this is done is to hide the article behind at least three blog posts, thereby decreasing the chance that anyone will actually RTFA. I believe that's how it usually goes, right?

Since TFA is down by now, and I can't get the exact details... does this mean that any program running and setting the right bits in the right registers can get "processor root" access to everything the processor does, irrespective of any security constraint the OS may place on that process?
Oh dear

It's the OS responsibility to ensure that normal applications can't simply do whatever they like directly to the hardware, including the CPU.

Even though it's the OS's responsibility to ensure normal applications can't simply do whatever they want, the CPU needs to provide the necessary functionality. If the CPU allows writing to some register and provides no method of protecting that write, and that write causes anything that normally would not be allowed, then the OS can do bugger all about it.

That's true, but irrelevant. The debug mode doesn't do an end-run around the machine's entire hardware and software security stack as so many posters were implying. By the time you have a chunk of executable code on your machine trying to set specific registers to specific values, all of the security measures in place up to that point have failed. Malicious code is malicious code and it does not need special access to some obscure CPU feature in order to do damage.

Since TFA is down by now, and I can't get the exact details... does this mean that any program running and setting the right bits in the right registers can get "processor root" access to everything the processor does, irrespective of any security constraint the OS may place on that process?

Oh dear

Any program that can read and write to any processor register already has complete access to everything on your computer. The reason this is secret is not to protect your data, it's to protect AMD's secrets.

Just hypothetically, given the information we have from the summary, what's the worst case scenario?

- The debug mode is worthy of its name, i.e. can bypass any ring and OS restriction
- It cannot be turned on or off in the BIOS or with a pin, since it is undocumented
- It is on by default
- The bit combination to set resides in usual working registers and can be triggered by usual computation by native code or in any bytecode interpreter (JavaScript, Java etc.) of your choice when carefully targeting the

No, TFA said (before it went down) that some registers have to be filled with defined values through hardware means to enter debug mode. In short, you won't stumble into it. And neither will any harmful software.

It's not that I'm surprised. But you need to recall that AMD chips make up a goodly chunk of data/hosting-center cores, which run many clients on the same machine...
AMD will need a very good indemnification clause to wind their way out of that damage responsibility.

To state the obvious: Chernobyl, or Czernobyl as it is spelled in the Polish language, is a very well known nuclear disaster site. Those crafty Polish are starting to make a name for themselves in the computer industry.

I'm actually surprised to find out that everyone's surprised. I've been hacking routers and now work for a telco surrounded by disassembled set-top boxes, and both have serial and JTAG interfaces abundant. Many require soldering, so in that respect it's "hidden" from customers.
Maybe:
- It's often more expensive to engineer these things out of the test systems to ready for production
- and just maybe it's still actually useful especially as you peer deeper into the GHz to get more performance from an existing design.

I'm just guessing, as the site is still inaccessible, but it sounds like this is a set of debug functionality beyond what you'd get with the normal debug registers or with a JTAG interface. AFAIK modern desktop/server processors still have JTAG interfaces (not just SoC, embedded type processors). Sure JTAG interfaces are often 'hidden' as you say... maybe there's a footprint there but you have to solder on some flying leads or a connector.. but without knowing about these new registers you still wouldn't be

I'm not suggesting this is a JTAG interface, perhaps my title is misleading. I'm suggesting it's "hidden" in the same way these hardware debug interfaces (both standardised like JTAG or other more obscure interfaces) appear "hidden" to people who don't do hardware/firmware mods. As I mentioned in the first line, I'm surprised everyone's surprised, these are immensely complex parts that sometimes need a root-of-roots, this sounds like just the thing AMD or any other manufacturer would have designed in.

I'd say password protected and undocumented is far more hidden than an unpopulated footprint marked 'jtag' (I know I know, not all hardware debug i/faces are always that obvious either :-) But yeah, no one should be particularly surprised... these are ridiculously complex chips and would be impossible to develop and debug (the chip that is, not software for it) without extra hidden circuitry.

It's not the same thing. Virtually every microcontroller has JTAG support and nobody would be surprised to find a JTAG interface in an embedded device. It would be very well documented in the datasheets. It's no big deal to find an unpopulated serial or JTAG header in a production device. These aren't manufacturer secrets -- they are well-known debugging interfaces provided for the benefit of the device developer.

AMD's proprietary debugging features are a different story -- features not intended for the

So this could be enabled in the kernel and/or gcc/gdb in the near future? That would be convenient for debugging. Unfortunately the link is still blocked, so the details on enabling this are still hidden.

Based solely on the Google cache of the forum post describing this (linked above), there's no need to go into hysterics. For hardware and systems geeks, this is very cool. It's an extension of the existing x86 debug registers (DR0-7) that allows you to set a debug watchpoint that only fires when specific data is loaded in.

There are a lot of researchers and tool builders that would love to have this because it would allow them to take a watchpoint fault only when a specific value shows up at a specific location. For instance, let's say that every so often you get a null pointer exception at a specific address. However, if you currently go into gdb and set 'watch 0x{address}', you're going to take a breakpoint every single time that pointer is accessed. Wouldn't it be great to do something like 'watch 0x{address} NULL' and only stop your debugger whenever 0 gets written into that address?

That's what the forum posts imply, at least. "Guys, I've reversed this in part... breakpoints defined in DR0 can be made to fire only on data match (under optional mask), plus masking of any or all of 12 low address bits ! Works also for I/O break points, provided CR4_DE is set, of course !"

I would wager that this is not a large security concern. Access to DR7 is restricted to ring 0, and therefore enabling debug breakpoints must be done by the operating system. While extremely interesting (I wish I could read more!), Czernobyl appears to be describing a modification to debug breakpoints that are already enabled.

Sure, but it's much faster to do it in hardware. This is the whole reason data watchpoints exist (see, for instance, the paper "Some Requirements for Architectural Support of Software Debugging" by Mark Scott Johnson from ASPLOS-I), as you could technically have your debugger put address & data checks around every memory access, but that leads to completely unacceptable overheads. It's faster to let the hardware check the addresses in parallel with regular execution and take a fault only if you touch the watchpoint.

Similarly, if the hardware will check the value before taking a debug interrupt to the kernel and subsequently signaling/scheduling of the debugger, it will be much, much faster than performing all that and then have the debugger check the address & throw this particular interrupt away before continuing execution. That constant interrupt cycle can cause 10,000x or more slowdowns if you're constantly accessing a value & taking bad watchpoints on it.
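The difference between filtering on the value in the debugger and filtering in hardware can be illustrated with a toy count in Python. This is a sketch, not a measurement: it only counts how many times the kernel and debugger would be woken up for a given sequence of writes, and the function name and trace format are invented for illustration.

```python
def count_stops(writes, watch_addr, wanted_value, hardware_value_match):
    """Return (debug_traps, useful_stops) for a sequence of (addr, value) writes."""
    traps = stops = 0
    for addr, value in writes:
        if addr != watch_addr:
            continue              # address comparator: no trap at all
        if hardware_value_match and value != wanted_value:
            continue              # value comparator filters in hardware
        traps += 1                # kernel and debugger get involved
        if value == wanted_value:
            stops += 1            # the stop the user actually wanted
    return traps, stops


# 999 uninteresting writes to the watched pointer, then one write of 0 (NULL).
trace = [(0x1000, v) for v in range(1, 1000)] + [(0x1000, 0), (0x2000, 0)]
print(count_stops(trace, 0x1000, 0, hardware_value_match=False))
print(count_stops(trace, 0x1000, 0, hardware_value_match=True))
```

In the first case every write to the watched address costs a full trap that the debugger then throws away; in the second only the write of the wanted value does.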

If you are an application developer, I would agree with you. Any decent debugger should allow you to set a conditional breakpoint, but I am not sure if you can say that for kernel debuggers, which are typically very different animals.

Oh, and the summary's description, "hardware data-aware conditional breakpoints, and direct hardware 'page guard'-style breakpoints", matches up with the line I copied & pasted from the forum post. I previously described the "hardware data-aware conditional breakpoints", where you can make hardware take a fault if the address of a memory operation is matched && the value of the memory operation matches. Looking through my notes, embedded Power ISA (Book III-E) processors also let you set value-dependent watchpoints using the Data Address Compare (DAC) Registers. I'm not sure about other ISAs.

The second part of the summary's statement refers to 'page guard'-style breakpoints. This is referenced by Czernobyl's "masking of any or all of 12 low address bits". Again, this is a very interesting extension of the x86 debug registers, which only allow debug watchpoints of size 1, 2, 4, or 8 bytes (and the latter only in certain microarchitectures & modes). However, by masking out the low 1-12 bits of the address into don't-cares, it's possible to set watchpoints anywhere from 1-4096 bytes, limited to powers-of-two and size-alignment. This is cool from an x86 standpoint, but ARM, MIPS, and Itanium (off the top of my head) already do this.
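The "powers-of-two and size-alignment" restriction falls straight out of the arithmetic, as this small Python sketch shows. It computes the byte range that would be covered when a given number of low address bits are treated as don't-cares; the function name is made up for illustration and the 12-bit limit is taken from the forum quote.

```python
def watched_region(dr0: int, masked_bits: int) -> range:
    """Byte addresses covered when the low `masked_bits` bits of the
    watch address are ignored (0 to 12 bits, per the forum quote)."""
    if not 0 <= masked_bits <= 12:
        raise ValueError("only the 12 low address bits can be masked")
    size = 1 << masked_bits           # always a power of two
    start = dr0 & ~(size - 1)         # always aligned to its own size
    return range(start, start + size)


region = watched_region(0x401234, 12)
print(hex(region.start), len(region))
```

Whatever address is placed in the register, the covered block starts on a multiple of its size, so a structure that straddles such a boundary cannot be covered exactly.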

Suffice it to say, the stuff that Czernobyl found is very cool in relation to x86, especially if these facilities were officially released to the public at any point in the future. However, it's very unlikely to cause any kind of AMD-only viruses or other scary security concerns. These features exist on other ISAs without any kind of world-shattering problems. :)

However, if you currently go into gdb and set 'watch 0x{address}', you're going to take a breakpoint every single time that pointer is accessed. Wouldn't it be great to do something like 'watch 0x{address} NULL' and only stop your debugger whenever 0 gets written into that address?

If you can set a hardware breakpoint, then you can do the conditional in software, only slower. So this sounds like hardware acceleration, which is very useful, as it can make something go from way too slow to no speed hit at all.

Methinks this could be very useful in defeating many types of DRM. I'm thinking in particular DRM implementations similar to CSS, AACS, BD+, etc. Could this spell the end of DRM for once and for all? One can hope! Any experts care to elaborate (I'm no software developer nor a CPU engineer)?

I can think of many reasons why it might be hidden. For example, it may be hidden because the cost of supporting it would outweigh the benefits of admitting the "feature" is there. I don't just mean in terms of documenting it and releasing that info for developers, I mean in terms of testing it for security reasons. Plus, let us say that a theoretical bug is found that creates a hole someone can exploit - is it patchable? It's a whole can of worms AMD may be right to avoid opening.

I can think of many reasons why it might be hidden. For example, it may be hidden because the cost of supporting it would outweigh the benefits of admitting the "feature" is there. I don't just mean in terms of documenting it and releasing that info for developers, I mean in terms of testing it for security reasons.

...and, if it's documented as an architectural feature rather than a feature of a particular processor or line of processors, guaranteeing that it works the same in future processors, even if they have a different microarchitecture. (And even if you explicitly document it as a feature of a particular processor, if you don't implement it the same way in your next generation of processors, somebody will probably have ignored the "this is a feature specific to the Phenom 666" note in the documentation and wri

My question is, is this actually hidden? Stuff like this is usually in the data sheets. So, does anyone have access to the actual processor data sheets? I didn't find them on AMD's site, just stubs containing the first few pages.

For example they did not properly document and release the docs on the hardware RNG in their first chipsets when it came out. As a result it ended up supported only on Linux on a "friend-of-mine" basis and MSFT (on whatever basis). The other OS developers did not know about it for a while (more than half a year). I remember personally telling Theo De Raadt on BUGTRAQ at the time to stop talking rubbish that AMD does not have a hardware RNG and he was genuinely shocked. However th