
illiteratehack writes "10 years ago AMD released its first Opteron processor, the first 64-bit x86 processor. The firm's 64-bit 'extensions' allowed the chip to run existing 32-bit x86 code in a bid to avoid the problems faced by Intel's Itanium processor. However, AMD suffered from a lack of native 64-bit software support, with Microsoft's Windows XP 64-bit edition severely hampering its adoption in the workstation market."
But it worked out in the end.

Most programs still don't need to work with numbers larger than 4 billion on a regular basis, so native 32-bit ints are just as fast as native 64-bit ones. Most programs also still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.
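A quick sketch of the two limits mentioned above (the 2 GB figure assumes the common 2 GB/2 GB user/kernel split used by 32-bit Windows and similar OSes):

```python
# Illustrative arithmetic for the 32-bit limits discussed above.

UINT32_MAX = 2**32 - 1    # largest value a 32-bit unsigned int holds
INT32_MAX = 2**31 - 1     # largest signed 32-bit value

# "4 billion" is right at the unsigned 32-bit ceiling:
print(UINT32_MAX)         # 4294967295, i.e. ~4.29 billion

# A 32-bit pointer can address 4 GiB, but the OS typically reserves
# half (or more) of that for the kernel, leaving ~2 GiB per process:
GiB = 2**30
user_space = 2 * GiB      # typical user-mode share of a 32-bit process
print(user_space // GiB)  # 2
```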

Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache density of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.
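Purely as a toy illustration of the packing described above, here's the shift-and-mask arithmetic for stuffing two 32-bit values into one 64-bit word (a later reply argues this trick is rarely worth it in practice):

```python
# Pack/unpack two 32-bit values in one 64-bit word, analogous to
# stuffing two 16-bit ints into a 32-bit EAX.
MASK32 = 2**32 - 1

def pack(hi, lo):
    """Pack two 32-bit unsigned values into one 64-bit word."""
    return ((hi & MASK32) << 32) | (lo & MASK32)

def unpack(word):
    """Split a 64-bit word back into its high and low 32-bit halves."""
    return (word >> 32) & MASK32, word & MASK32

w = pack(0xDEADBEEF, 0xCAFEBABE)
hi, lo = unpack(w)
print(hex(hi), hex(lo))  # 0xdeadbeef 0xcafebabe
```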

Where 64-bit does become really valuable is working with very, very large amounts of sequential data (want to allocate a 10GB array? Can't do that on x86, no way no how). That's hardly a typical requirement right now (although I wrote a program a few weeks ago that needed to do it). However, it's getting closer. Additionally, while clever memory mapping can allow a 32-bit process to access over 4GB of RAM (just not all at the same time), there is a (small) performance impact associated with the need to be constantly re-mapping that memory.
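The "constantly re-mapping" cost can be sketched as window arithmetic: a 32-bit process reaching into a buffer bigger than its address space has to view it through a fixed-size mapping window (the window size here is an assumption, not any particular OS's):

```python
# Hypothetical sketch of windowed access to a buffer larger than a
# 32-bit address space: every access computes which window it needs,
# and changing windows is the remapping overhead described above.
WINDOW = 256 * 2**20    # assumed 256 MiB mapping window

def locate(offset):
    """Return (window_index, offset_within_window) for a byte offset."""
    return offset // WINDOW, offset % WINDOW

# Touching a byte 6 GiB into the buffer needs window 24:
idx, off = locate(6 * 2**30)
print(idx, off)  # 24 0
```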

The other area where 64-bit really helps is with security, specifically exploit mitigation. High-entropy ASLR in recent versions of Windows and some other OSes randomly places 64-bit-aware executables and their various data regions across a huge span of the 64-bit address space. This not only makes it effectively impossible to correctly guess the address of any given bit of code in memory, it also makes spraying (heap spray, JIT spray, etc.) attacks infeasible; to cover even a tenth of a percent of the address space, you'd need to spray roughly 16 million gigabytes of data. That's not only impractical at modern CPU speeds (even on a blazingly fast CPU working in parallel, it would take a week or more), it's also far more memory (physical or virtual) than any modern computer can allocate.
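The spray figure checks out, roughly; taking the full flat 64-bit space at face value (current chips actually implement fewer usable bits, as a comment below notes), 0.1% of it works out to about 17 million GiB:

```python
# Checking the heap-spray arithmetic: bytes needed to cover 0.1% of
# a flat 64-bit address space.
ADDR_SPACE = 2**64          # bytes in a full 64-bit address space
GiB = 2**30

spray = ADDR_SPACE // 1000  # 0.1% of the space
print(spray // GiB)         # ~17 million GiB
```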

And for those that want the best of both worlds, there is the x32 ABI, which uses all the good stuff from x86-64 (more registers, better floating-point performance, faster position-independent code in shared libraries, function parameters passed via registers, the faster syscall instruction... ) while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers.
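The pointer overhead x32 avoids is easy to see with a pointer-heavy structure, say a list node holding a pointer and an int. Using Python's `struct` standard sizes as a stand-in for the two layouts (note a real C compiler would typically pad the 64-bit node to 16 bytes for alignment, making the gap even bigger):

```python
# Size of a node { void *next; int value; } under 32-bit vs 64-bit
# pointers, using struct's standard (unpadded) sizes as a sketch.
import struct

node_x32 = struct.calcsize("=Ii")  # 32-bit pointer + 32-bit int
node_64 = struct.calcsize("=Qi")   # 64-bit pointer + 32-bit int
print(node_x32, node_64)           # 8 12
```

Half again as much memory per node means that many fewer nodes per cache line, which is exactly the cache-pressure argument made earlier in the thread.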

They're working on porting Linux to the new ABI... kernel and compiler support is there, not sure about all the userspace stuff.

kernel and compiler support is there, not sure about all the userspace stuff.

Just debootstrap it from Daniel Schepler's repository [debian.org]. Most of the work has since moved to official second-class repositories (AKA debian-ports), but because of the freeze, you want both. So after debootstrapping, echo "deb http://ftp.debian-ports.org/debian [debian-ports.org] unstable main" >>/etc/apt/sources.list and you're set.

And this is something people who've worked on RISC chips have known for ages. The x86 system architecture is essentially stuck in the early 80s. The 386 was just a simple extension on top of the 286 model; nothing really fundamentally changed, and you still had a limited number of registers, each with at least one specialized purpose. Maybe MMX and similar stuff fixed that, but you couldn't rely on everyone's PC having the instruction set you compiled for.

Intel was stuck supporting a very popular CPU with an instruction set that they knew was outdated, and they even tried having replacements for it that failed to gain acceptance. The reason this Opteron caught on was because it was backwards compatible with x86, not because it was the first thing to try to break out of the mold. And the 386 was designed to be compatible with the 286, which was designed to be compatible with the 8086, which was designed to be compatible with the 8085, which is compatible with the 8080, which descends from the 8008, with the 4004 before it as the first commercially available microprocessor... (and all of those retain the original accumulator A register)

Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache density of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.

You're arguing on the correct side, but what you wrote here is badly flawed. Packing multiple 32-bit values into a 64-bit register is near worthless; what is valuable is that amd64 gives you twice as many general-purpose registers (which also happen to be 64 bits wide). A far bigger gain for 64-bit on x86 was the addition of full relative addressing. Instead of data accesses in 32-bit code always using absolute addresses, in 64-bit mode software can address data relative to the instruction pointer (RIP-relative addressing). This helps a great deal with libraries, since instead of needing large relocation tables, they simply use relative references that are valid no matter what address the library is loaded at. With most processors, using 64-bit mode loses performance due to having to shuffle more data around; x86 is about the only one that gains performance.
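The no-relocations point can be sketched numerically: a PC-relative instruction encodes the distance from itself to its target, and that distance is identical at every load address, so nothing needs fixing up when the library moves:

```python
# Sketch of why PC-relative addressing removes load-time fixups: the
# encoded displacement is the same regardless of the library's base.
def displacement(insn_end, target):
    """Displacement a PC-relative instruction would encode."""
    return target - insn_end

# Library loaded at two different (hypothetical) base addresses; an
# instruction ending at base+0x105 references data at base+0x2000:
for base in (0x7F0000000000, 0x560000000000):
    d = displacement(base + 0x105, base + 0x2000)
    print(hex(d))  # same displacement both times, so no fixup needed
```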

PAE is more or less old-school segmentation. You can't say 'it has a 3% slowdown,' because it has zero slowdown if that particular page is already in memory, and if not, it has the same 'slowdown' as any other paging operation plus a fixed number of cycles. So if you're dealing with tiny amounts of 'more than 2/3GB', then the overhead is a lot higher than if you're mapping out 2GB on every window change. PAE is just another form of paging. It is slower, but you're making numbers up from nothingness.

The integer math performance of the processor has nothing to do with it being 64-bit. Most (all now?) x86-64 processors will internally process two 32-bit numbers in the same span as one 64-bit number if properly optimized, by sending the 32-bit values through together. 64-bit code using less memory than the OS max for 32-bit code is actually slower than 32-bit code, due to the increased pointer sizes wasting the processor's registers by filling them with 0s.

You really have no idea how processors work. While nothing you said is illogical, it is still in fact wrong on every count. Under the hood, processors don't work anything like they do on the surface.

Other processors also do other weird things. I have an 8-bit CPU that can handle 32-bit numbers in a single clock cycle, exactly like it does 8-bit numbers... and the neat thing: it can do two 16-bit numbers in a single clock cycle! Why? Because the processor as I see it from a software developer's perspective isn't anything like the actual hardware doing the work. Processors have translation units in front of them to provide you with one look while allowing themselves to rewire the backend in all sorts of different ways.

x64 has twice as many registers. That alone means less moving stuff in and out of memory, so that will improve speed compared to 32-bit applications. 32-bit x86 has only 4 truly general-purpose registers; x64 adds another 8 64-bit registers (R8 through R15), for 16 in total.

The annoying thing being that an x86-64 processor in long mode can, in fact, run 16-bit protected mode code (like essentially all actual Windows 3.x programs) with the same compatibility sub-mode that runs 32-bit code. It's merely that Microsoft decided they didn't want to bother supporting it.

That this can be done is easy enough to prove; take a Win16 app and run it in WINE on 64-bit Linux.

We don't even have true 64-bit x86-64 processors yet. While programmers are told to* treat pointers as 64-bit, in the current implementation (referred to as a "48-bit implementation") there are only 47 usable bits for user-mode pointers**. That is enough to map 128 terabytes to one process; afaict the most RAM you can currently get in a PC-architecture machine is 2 terabytes.

If we assume the largest available memory size doubles every 1.5 years and we want to be able to map all the memory to one process then we have 9 years until the current implementation is used up and another 24 years after that before a "full 64-bit" (with one bit used to distinguish between kernel and user mode) implementation is used up.

* Of course, just because programmers are told to do something doesn't mean they will: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642750 [debian.org]

** A 48th bit is used to differentiate kernel and user addresses. The number is then sign-extended to produce a 64-bit number.
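The sign extension in that footnote can be written out directly: bit 47 of the 48-bit address is copied into bits 48 through 63 to form the "canonical" 64-bit value, which is what splits the space into a user half and a kernel half:

```python
# Sketch of 48-bit canonical-address sign extension: copy bit 47
# into the top 16 bits of the 64-bit value.
def canonical(addr48):
    """Sign-extend a 48-bit address to a canonical 64-bit address."""
    if addr48 & (1 << 47):               # kernel half: bit 47 set
        return addr48 | (0xFFFF << 48)   # fill the top 16 bits with 1s
    return addr48                        # user half: top bits stay 0

print(hex(canonical(0x00007FFFFFFFFFFF)))  # highest user address
print(hex(canonical(0x0000800000000000)))  # lowest kernel address
# -> 0x7fffffffffff and 0xffff800000000000
```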