Big news from ARM over the past few days. The processor architecture, once strictly an embedded affair for low-power devices, is going big. Not only has ARM announced it's going 64-bit, HP has announced it's going to build servers with ARM processors. It seems all the pieces are now in place for ARM.

Some ARM SoC implementations already have hardware crypto engines present; since they are primarily embedded processors, it depends entirely on what the package was designed for in the first place.

However, when it comes to lower-power non-x86 server processors, I wonder why MIPS hasn't made more of an effort. There have been 64-bit MIPS processors out there for years, so it's a tried and tested platform with existing compiler and operating system support, where compatible hardware for development/testing is a very cheap eBay purchase away... whereas a 64-bit extension of ARM will necessitate writing all these things from scratch, and most developers won't even be able to get their hands on the kit for quite some time.

When it comes to lower-power non-x86 server processors, I wonder why MIPS hasn't made more of an effort. There have been 64-bit MIPS processors out there for years, so it's a tried and tested platform with existing compiler and operating system support.

Just wait a few years, I suppose. Loongson is MIPS-based and is apparently starting to get some adoption - and China is supposedly betting on it big time, to gain technological independence.

Just by virtue of their massive internal market, and the fact that they make... pretty much everything, Loongson adoption should "spill over" to other regions, I guess?

There are the Chinese Loongson 64-bit CPUs for both laptops and desktops. However, their frequency is low, they are built on 65nm technology, and the performance is rather poor. It seems the Chinese aren't pushing them far enough performance-wise. Since they plan to build the fastest computer using Loongson CPUs, I think they will accomplish this by cramming a very large number of CPUs together.

I am not informed as to what ARM motherboards use as their firmware, but I am sure someone more enlightened will be here.

Is it a BIOS, UEFI or something else?

Could it be that if Microsoft succeeds in their plan of limiting non-Windows OSes by (ab)using some UEFI mechanisms, we could all simply switch to ARM?

I don't think there is a mandated one, given that I've heard rumours of hardware vendors opting for UEFI at times whilst at other times using Coreboot with various payloads (UEFI support being one of those payloads). The development of ARMv8 not only adds fuel to the fire regarding Microsoft's relationship with Intel (whether things are as close as they once were), but also to the rumour of Apple moving to ARM CPUs in the future - given that we're probably at least 2-3 years away from seeing an ARMv8 product appearing in a laptop/desktop/server etc. from a tier 1 vendor:

Hardware designers will have to wait a little longer. ARM plans to release the first chip designs that support ARMv8 next year, and expects prototype systems using these designs to emerge in 2014.

So in Apple's case it might be 2015 before we see an A7/A8 chip appear in laptops. Maybe HP's involvement goes beyond just servers as the article claims, and might include desktops, laptops and so forth - a chance to recover some margins and turn around their PC division.

ARM is completely non-standardized as far as the platform goes, although a lot of applications use U-Boot.

And by "completely non-standardized", I mean compared to the 68000 in the '80s.

The Sun-1, 2, and 3, the Amiga, Macintosh, Sinclair QL, Atari ST, Sega Genesis, Apollo workstations, and HP 9000 workstations were all 68k-based, but (except for later Apollos and HP 9000s) none of them booted the same way, and none of them had hardware that was even close to similar, or even in similar places in the memory map.

I am not informed as to what ARM motherboards use as their firmware, but I am sure someone more enlightened will be here.

Is it a BIOS, UEFI or something else?

Could it be that if Microsoft succeeds in their plan of limiting non-Windows OSes by (ab)using some UEFI mechanisms, we could all simply switch to ARM?

It's not "PC BIOS" or UEFI, or OpenFirmware or anything else. It's complete chaos. Typically the manufacturer creates/installs firmware specifically for the embedded device they happen to be making at the time.

This continues to be the single largest problem with ARM - there is no standardised firmware (and no standardised way of doing hardware auto-detection for anything that isn't USB or PCI).

This means you can't create one generic OS for ARM that works on all ARM systems and continues to work for future ARM systems. Instead you end up creating a special version of the OS to suit each specific ARM system (and then hoping the manufacturer installs it, because the end user can't). It's also why Microsoft only support about 5 of the many different ARM systems; and why you'll never see "Windows for ARM" on a shelf in a computer shop where people can buy it and install it on whatever ARM system they happen to have.

For embedded systems it's probably a good thing, as it reduces hardware costs and the manufacturer doesn't want the end-user to change the software anyway. For general purpose laptop/desktop/server systems (where the manufacturer makes the hardware, and the end-user decides which OS/s to install when) it's a show stopper.

It's not the only problem though. For PC (desktop/server) hardware there's a bunch of standards relating to hardware that include motherboard form factors, power supplies, cases, etc. This means the end user can replace/upgrade any component, and also means that a computer manufacturer can get "off the shelf" parts from 20 different companies and put them together as a complete system. It's this "componentization" that makes PC-compatible (desktop/server) systems cheap and flexible (and not just for the initial purchase).

Until there are usable standards (for everything, not just firmware), ARM will never go beyond the embedded market (which includes phones and other disposable gimmicks), regardless of what the CPUs themselves are capable of.

Ironically, ARM can use "PC compatible" standards to compete against the "PC compatible" market - ARM manufacturers could adopt UEFI, ACPI, the ATX form factor, the ATX power supplies, etc to become a standardised platform fairly quickly.

I have a question: what would it take to place one or more ARM processors on an existing PCI bus infrastructure and use existing PC components for the rest of the build?

Of course the drivers would need to be recompiled or rewritten, but hardware-wise, is there a reason this would be bad? Is there something about ARM processors which rules out efficient use of existing commodity hardware like RAM/video/Ethernet/power supplies?

As for the BIOS services, it's true that we need a way to identify/probe the hardware in ways which are specific to the mainboard. However, for the most part BIOS services are only used in the bootloader, after which modern operating-system-specific drivers take over.

It seems like it would be pretty easy for ARM manufacturers to provide standard boot-loading firmware for their hardware, which end users/operating systems would be free to use or not. Just look at how trivial x86 bootloading is (just as an example; ideally it'd be a bit more sophisticated).

The PCI and PCIe buses can be used with any compatible processor architecture. I'm aware of SPARC64 and PowerPC designs that support PCI devices, as well as ARM ones.

Intel's IOP321 is an "XScale" ARM design that included a PCI-X bus, for example.

As Brendan said, if an ARM motherboard in ATX form factor were released, it could use the same conventional hardware, and even the same memory.

Most open source operating systems treat the PCI bus as platform-independent and have glue for PCI host controllers. This allows PCI device drivers to be portable, assuming the developer takes endianness, alignment and 32/64-bit portability issues into account.

So if the firmware was standardized, or at least the boot procedure.. a common "ARM PC" port for many operating systems could be quite trivial.

The x86 starts up by jumping to an address near the top of addressable memory - a ROM address containing a jump instruction to the beginning of the BIOS code.

For instance:
The 8086 jumped to 0xffff0.
The 286 jumped to 0xfffff0.
The 386 jumped to 0xfffffff0.

It would be pretty easy to write an OS loader like GRUB in this environment without any BIOS, obviously with direct I/O for things like IDE controllers.

Another idea would be to simply compile a minimal Linux kernel for this mainboard (since the device drivers are already implemented there), and then flash it at the CPU reset address. This kernel could have a basic UI and settings, but its main purpose would be to "kexec" the real OS.

This would be the ultimate solution in terms of flexibility. Much better than either BIOS/UEFI IMO if the source code is open (and it would be, since it's GPL).

I'd be fairly confident I could personally implement it in a few weeks if it were on x86, but unfortunately my knowledge falls off sharply for ARM. I don't see why ARM would be that different though. The biggest hurdle would be one of standardization, rather than implementation.

This would be an excellent opportunity to establish an open firmware standard for ARM desktop systems - before proprietary implementations take over and we end up stuck with stock firmware like in the x86 world.

"Linux as firmware has already been done, c.f. LinuxBIOS, now known as Coreboot"

Yeah, but that's x86, where unfortunately proprietary BIOS solutions have already "won" over open ones. Coreboot doesn't support any of my mainboards (I checked in the past).

Linux as firmware was attempted, and mostly failed because Linux is far too big to fit in firmware. Even if this wasn't a problem, Linux as firmware would still be a major disaster as there's no standardised driver interface.

Coreboot on its own is good for chipset initialisation, but doesn't provide a standardised environment that other software can rely on, and is therefore useless by itself. Coreboot isn't designed to be used like that. Coreboot is designed to start a "payload", where the payload is typically a standardised environment that other software can rely on - a "PC BIOS" payload, an OpenFirmware payload, a GRUB payload or a UEFI payload. You wouldn't want a "PC BIOS" payload on ARM (the "PC BIOS" standards were never intended to be portable). GRUB has no standardised environment other than Multiboot, which wasn't intended to be portable either ("Multiboot 2", which is intended to be portable, seems stillborn). That only leaves "Coreboot with OpenFirmware payload" and "Coreboot with UEFI payload". In both of these cases you'd be adopting a standard (OpenFirmware or UEFI), and how it is implemented (e.g. on top of Coreboot, or without Coreboot) wouldn't matter.

I don't know much about OpenFirmware (but it looks like it'd work fine).

Microsoft would probably push for UEFI on ARM; and I'd guess device manufacturers would rather provide just one boot-time driver in EFI byte code, rather than one in EFI byte code for 80x86 and another in FCode for OpenFirmware.

This would be an excellent opportunity to establish an open firmware standard for ARM desktop systems - before proprietary implementations take over and we end up stuck with stock firmware like in the x86 world.

I like open source, and wish some sort of open source firmware was realistic. In my experience it's not - none of the open source projects seem capable of defining a formal standard (which is completely different to implementing something and accidentally ending up with an ad hoc "standard").

Not so fast. The chipset is what needs fairly sophisticated initialization specific to the particular silicon. In addition, every kind of RAM since SDRAM needs some timing calibration after every boot-up. Without it nothing works at all.

"Not so fast. The chipset is what needs fairly sophisticated initialization specific to the particular silicon. In addition, every kind of RAM since SDRAM needs some timing calibration after every boot-up. Without it nothing works at all."

You are completely correct; however, I don't understand where you see a contradiction.

This means you can't create one generic OS for ARM that works on all ARM systems and continues to work for future ARM systems. Instead you end up creating a special version of the OS to suit each specific ARM system

Which really is not a problem, since almost no users install an OS themselves. Users use whatever the PC/device comes with; they only install applications, and sometimes even accept vendor-provided OS updates.

And the OS handles the hardware abstraction, making the hardware differences a non-issue for the applications.

It's not the only problem though. For PCs (desktop/server) hardware there's a bunch of standards relating to hardware that includes motherboard form factors, power supplies, cases, etc.

Form factors and power supplies are trivial. And the tighter integration of ARM devices, with less need for external circuitry, will make the motherboard PCB much simpler than the densely populated x86 boards.

Adding one or more standardized buses like PCI or PCIe variants is also fairly trivial. You can actually get loads of ARM boards with such slots today.

I've been wondering about this too. Like everyone's saying, currently it's a mess of hardware-specific boot loaders and kernel builds. Which, in the embedded world, isn't necessarily a bad thing. Imagine if your phone took as long to start up as a traditional BIOS-based desktop, which wastes time doing unnecessary things for the sake of compatibility.

Up until now there hasn't been a real reason to standardize. End users aren't installing operating systems on ARM devices, so it doesn't matter how the OS gets there.

But that's all changing. We're getting Windows for ARM, ARM processors are rapidly approaching general-purpose desktop performance, and with servers on the way, admins are going to want to install their Linux distros of choice.

I think, in terms of booting, we're going to see UEFI on ARM. It's already in the UEFI standard, and Microsoft seems to want it on PCs at least.

I used to be excited about the growing marketshare of ARM, thought it'd be a time to start over. But now I think we're going to see more of the same crap... or worse.

The highlights of the move to AArch64 seem to be that there are now 31 general-purpose registers and an improved exception model.

There seem, though, to be two losses when going to AArch64 compared with AArch32. The first is the loss of the "M" (load/store multiple) instructions, which let you store multiple registers with a single instruction. It makes sense that these have gone: there are now double the number of registers, so there's no longer space in a 32-bit opcode to define storing/loading of the complete register set. Given the speed at which modern CPUs operate, the depth of pipelines, and the cost of going out to RAM, the replacement "P" (load/store pair) instructions will likely make little if any difference in performance compared to the "M" versions. There's just a cost in needing more instructions to save out multiple registers.

The second loss is conditional execution. For me, the great beauty of the AArch32 instruction set was that every instruction was conditional. That meant ARM code had far fewer branches than comparable code for other architectures, and thus fewer branching penalties on execution. This change means that AArch64 code will require many more branches than AArch32 code and thus be less efficient.

I guess that change explains why ARM put a lot of effort into the branch prediction logic in the Cortex-A7: they'd need such logic for an AArch64 CPU.

The highlights of the move to AArch64 seem to be that there are now 31 general-purpose registers and an improved exception model.

There seem, though, to be two losses when going to AArch64 compared with AArch32. The first is the loss of the "M" (load/store multiple) instructions, which let you store multiple registers with a single instruction. It makes sense that these have gone: there are now double the number of registers, so there's no longer space in a 32-bit opcode to define storing/loading of the complete register set.

The second loss is conditional execution. For me, the great beauty of the AArch32 instruction set was that every instruction was conditional. That meant ARM code had far fewer branches than comparable code for other architectures, and thus fewer branching penalties on execution. This change means that AArch64 code will require many more branches than AArch32 code and thus be less efficient.

More conditional instructions mean more branches, not fewer. So does that now make ARM uglier than other architectures?

More conditional instructions mean more branches, not fewer. So does that now make ARM uglier than other architectures?

That is a confusing statement. Instruction predication generally reduces the number of branches in the code, so the parent is correct.

Predication is nice for in-order microarchitectures and can reduce code size, but it doesn't really make sense for the out-of-order microarchitectures the ARM64 ISA is presumably targeting, since the Cortex-A9 and A15 are both OoO already.

More conditional instructions mean more branches, not fewer. So does that now make ARM uglier than other architectures?

Most CPU architectures I've looked at, besides ARM, only allow conditions to be checked on branch instructions. Thus on such CPUs all conditional code requires a branch.

The 32-bit ARM instruction set lets every instruction be conditional. If the condition is not met, the instruction turns into a no-op. This means you avoid branching in a large number of cases - you only branch when you really need to.

The reasoning behind this on 32-bit ARM is that, in general, a great deal of conditional code tends to be just a few instructions long. The cost of a no-op is a single clock cycle, whilst the cost of a branch could be dozens of cycles. You're also often saving two branches rather than one, since there's no need to branch back.

Most CPU architectures I've looked at, besides ARM, only allow conditions to be checked on branch instructions. Thus on such CPUs all conditional code requires a branch.

The 32-bit ARM instruction set lets every instruction be conditional. If the condition is not met, the instruction turns into a no-op. This means you avoid branching in a large number of cases - you only branch when you really need to.

The reasoning behind this on 32-bit ARM is that, in general, a great deal of conditional code tends to be just a few instructions long. The cost of a no-op is a single clock cycle, whilst the cost of a branch could be dozens of cycles. You're also often saving two branches rather than one, since there's no need to branch back.

The issue with all instructions being conditional is that they end up modifying condition flags, which then creates dependencies between instructions and makes creating an out-of-order microarch difficult (perhaps this is why they reduced the number of predicated instructions?). The in-order arch with predicated instructions may well turn out to be slower than the out-of-order arch with branch cost amortized via a decent branch predictor and branch target buffer.

The issue with all instructions being conditional is that they end up modifying condition flags, which then creates dependencies between instructions and makes creating an out-of-order microarch difficult (perhaps this is why they reduced the number of predicated instructions?). The in-order arch with predicated instructions may well turn out to be slower than the out-of-order arch with branch cost amortized via a decent branch predictor and branch target buffer.

With the exception of direct comparison instructions, ARM instructions only update condition flags if you include the "S" flag in the instruction. When writing conditioned ARM code you bear that in mind.

You are right - using conditional instructions creates dependencies. That's inevitable. But aren't such dependencies the kind of thing that chips with multiple parallel execution units have been handling since they first appeared? Part of what branch prediction units have to do? And indeed what any out-of-order architecture has to deal with?

It should be noted that the ARM Cortex-A9 and Cortex-A15 are both multi-dispatch out-of-order chips. So whilst it's difficult, it's not impossible. The Cortex-A9 outperforms its in-order predecessor, the Cortex-A8.

I don't know the make-up of the 64-bit ARM instruction set, since it's not been published yet. My guess is that since the instructions are all 32 bits long, just like 32-bit ARM, and there are double the number of registers to deal with, the extra bit needed to specify each register simply means there aren't enough bits left to make all instructions conditional.

Thanks for pointing me towards that discussion. Some of the people there clearly know more than they are allowed to tell. ;-)

I think I now understand better the arguments involved.

Removing conditional execution still appears to me to be a loss. I understand there are reasons why it may be a good thing to remove, with a significant pay-off in simplified logic and saved power. However, even with branch prediction and multi-dispatch out-of-order execution, I'd expect a net loss of performance, since even a correctly predicted branch has an execution cost.

Thought provoking stuff.

My perspective on this is skewed by the fact that I got into ARM with the ARM2. The competition at the time was the 80386 and 68020, and of the three the ARM2 was the high-performance chip. I therefore tend to be skewed more towards performance than power consumption.

Removing conditional execution still appears to me to be a loss.
I started with the ARM7 (GBA), so I had the same thoughts as you. Now, the B3D people have almost convinced me of the v8 ISA's "goodness" =)
But the ARM ISA was very unique, and now they've lost some features and turned mainly into a "generic RISC".

It seems I am the only one unhappy to see yet another architecture adopt the "bloat" philosophy.

The ARM architecture was a marvel to behold. Then they started to add stupid instructions with the ARM9 model, got ridiculous with the ARM11 (the 32-bit mini-vectors thingie), and got messy with the Cortex-A8 (two different ways to perform a float op: the VFP one mysteriously crippled, the NEON one just unnecessary).

The launch of a 64-bit architecture was a perfect opportunity to break with all the brain-dead decisions: just put two different CPUs in the same core, one pipe for the old ARM32 and another for a clean, new architecture.
Just share the biggest, most expensive blocks, like the caches, the float muladds, the vector permutes, and maybe the exception-handling pieces. After a few years, the 32-bit part could just be emulated.

Unfortunately ARM is following the Intel/AMD path, like everybody followed the M$ way of doing things.
But remember, ARM: Apple hit M$ hard simply by recognizing that crap is crap and doing something better. And you, ARM, hit Intel/AMD hard because of a cleaner architecture.

Somebody will come along in the future and hit you just as hard, by designing something less murky than your patched-up ARM64. Hopefully...

"That best-of-both-worlds situation is exactly what the x32 ABI is trying to provide. A program compiled to this ABI will run in native 64-bit mode, but with 32-bit pointers and data values. The full register set will be available, as will other advantages of the 64-bit architecture like the faster SYSCALL64 instruction. If all goes according to plan, this ABI should be the fastest mode available on 64-bit machines for a wide range of programs; it is easy to see x32 widely displacing the 32-bit compatibility mode. "

That kind of spaghetti isn't really about word size; it's about having different instruction sets live together (32-bit x86 and 64-bit x86 really are two different ISAs). The problem is having two different ISAs living under the same OS. There are several ways to accomplish that, but none of them nice. PalmOS lived with that, and so did Apple.
The ugliest one I have known was under PalmOS, when they started using ARM (little-endian) in place of the old 68k (big-endian). The whole kernel and the apps were emulated, but you could program the ARM directly for speed. To call the OS, you were forced to work around the endianness issue, byte-swapping any argument passed to the OS call.

Thanks for pointing me to XMOS. I didn't know the architecture; it looks interesting. Not in my line of thinking, but interesting anyway.
IMHO a modern, clean ISA should be centered on better FP integration with the ALU. Floating-point types are very important today, yet in many CPU designs FP appears as an afterthought.
I am undecided about the vector programming thing. The Cell SPU ISA (for example) is powerful and clean. It could be a good reference, but few programmers make the effort to vectorize their inner loops these days. In fact, few programmers even know what an inner loop is, these days.

It seems I am the only one unhappy to see yet another architecture adopt the "bloat" philosophy.

The launch of a 64-bit architecture was a perfect opportunity to break with all the brain-dead decisions: just put two different CPUs in the same core, one pipe for the old ARM32 and another for a clean, new architecture.

So your proposal is to make a more expensive processor with no capability of running legacy 32-bit applications natively?

You didn't understand what I wrote.
There are two ways to ensure backwards compatibility: extend your old design (with the old mistakes), or include it in the same package.
Example: the Itanium included extra hardware for x86 compatibility. A revision later, x86 was entirely emulated in software, removing the need for the extra legacy crap in the design.

You seem to be living in the desktop space and rarely looking outside. Who said anything about OS X?
Apple gets their money from the iPhone/iPod/iPad set. M$ has been trying to enter this space for years, and still they are totally unable to make a dollar from it.
Here, some news for you: http://articles.businessinsider.com/2010-05-26/tech/29988890_1_ente...

About Intel trying to enter the low-power/embedded market: we will see. They are still quite far off; their designs are too inefficient and expensive, lack integration and components, and always come late to market.
All that despite Intel having the best fab technology in existence.

At some point, ARM designs will be so complex that x86 will be able to compete. At that point, a cleaner architecture should be able to blow both of them out of the water.

Although the ARM chip is now associated almost exclusively with mobile devices, it actually debuted in a desktop PC (the Acorn Archimedes) in June 1987. It became the world's first mass-produced 32-bit RISC computer, and for a year or so (1987-1988) it was arguably the world's best desktop PC, until the IBM PC/Intel caught up.

Fast-forward to April 2003 (yes, 16 years later) and we finally got 64-bit desktop (Athlon 64) and server (Opteron) chips. Intel followed just over a year later, but by this time ARM was considered strictly a low-power (in both speed and energy) CPU for use in embedded devices. Sadly, ARM didn't follow AMD and Intel into the 64-bit space back then, which in retrospect may have been a mistake.

Now we move to 2007, when ARM finally twigged that 64-bit might actually be useful, and belatedly started development on extending their architecture. They'd already lost 3-4 years on their potential rivals for 64-bit servers (and maybe eventually 64-bit desktops/laptops/netbooks/phones).

Reaching the current day, ARM finally announces that it will produce a family of 64-bit chips that it started developing four years earlier. I was fully expecting a launch date in 2012 (making five years of development and putting them up to nine years behind the opposition), but *no* - the earliest we'll see them is 2014!

It's simply too little, too late - as we all know, it's the software that makes a CPU family successful. We currently have no 64-bit software for ARM and very few server and desktop OSes even running on 32-bit ARM (most Linux distros have actually abandoned supporting even 32-bit ARM by now). They're betting the farm that Windows 8 on ARM will take off, but it won't run *any* legacy software at all (even by emulation), which was always a strong point of consecutive Windows releases.

In conclusion, I think ARM in either 32-bit or 64-bit form will remain limited to phones and maybe some netbooks. It won't get any market share on desktops or servers - AMD and Intel are just too far ahead on performance and OS/software availability for ARM to make inroads.

Fast-forward to April 2003 (yes, 16 years later) and we finally got 64-bit desktop (Athlon 64) and server (Opteron) chips. Intel followed just over a year later, but by this time ARM was considered strictly a low-power (in both speed and energy) CPU for use in embedded devices. Sadly, ARM didn't follow AMD and Intel into the 64-bit space back then, which in retrospect may have been a mistake.

Consider that there were 64-bit server chips (Alpha, PA-RISC, SPARC, POWER) before the Athlon, and they aren't viable competitors anymore. ARM introducing 64-bit at the same time as the Athlon may or may not also have been futile. Perhaps they really made a wise decision doing what they're good at, capturing the mobile market in the process - they're now more relevant than Alpha or PA-RISC...

"Consider that there were 64-bit server chips (Alpha, PA-RISC, SPARC, POWER) before the Athlon, and they aren't viable competitors anymore. ARM introducing 64-bit at the same time as the Athlon may or may not also have been futile. Perhaps they really made a wise decision doing what they're good at, capturing the mobile market in the process - they're now more relevant than Alpha or PA-RISC..."

This move to 64-bit has me a bit perplexed, given ARM's market demographic. I think ARM has a lot of catching up to do performance-wise before it will be a serious contender for the desktop (or the cluster farm), 64-bit or not. Maybe ARM is doing this for forward compatibility, so they don't have to cross that bridge later on. Frankly though, most desktop users still don't actually benefit from 64-bit registers/addressing today; the high bits are mostly just wasted space. (Yes, of course I know AMD64 introduced other significant ISA changes as well.)

That said, I can think of new OS designs which would make good use of 64-bit addressing. For example, if the 64-bit (or 48-bit, etc.) address space were actually backed by NV-RAM behind a large cache, then one could theoretically do away with disks entirely and have all files directly addressable in system RAM. It would eliminate the need to explicitly load and save contents to disk; files would just stay in NVRAM even across power cycles. Of course we'd need a very robust OS with robust protections, but it would remove a lot of the bottlenecks currently incurred by file I/O. Consider a web server/database/media player that doesn't need to do any file I/O and whose changes are persistent.

Of course virtual memory allows us to emulate this with a hard disk or flash drive, but it doesn't really eliminate the I/O bottlenecks.

That said, I can think of new OS designs which would make good use of 64-bit addressing. For example, if the 64-bit (or 48-bit, etc.) address space were actually backed by NV-RAM behind a large cache, then one could theoretically do away with disks entirely and have all files directly addressable in system RAM. It would eliminate the need to explicitly load and save contents to disk; files would just stay in NVRAM even across power cycles. Of course we'd need a very robust OS with robust protections, but it would remove a lot of the bottlenecks currently incurred by file I/O. Consider a web server/database/media player that doesn't need to do any file I/O and whose changes are persistent.

You would still need to constantly flush/sync the contents of the fast (volatile) RAM to the slower NVRAM (which is slower than an SSD). This is essentially the same as any file I/O today.

Unless of course you're saying that the NVRAM is memory mapped in the processor, which would really simplify file access but would be slower than loading the file to RAM and then saving it back to non-volatile memory.

"You would still need to constantly flush/sync the contents of the fast (volatile) RAM to the slower NVRAM (which is slower than an SSD). This is essentially the same as any file I/O today."

The problem with non-memory-mapped NVRAM is that it needs to be explicitly transferred/synced before it can be used. Memory-mapped NVRAM would be ready to use instantly. Having it memory mapped eliminates all the bottlenecks of transferring files all the way from the CPU to the south bridge. So maybe it's a hundred cycles instead of thousands.

"Unless of course you're saying that the NVRAM is memory mapped in the processor, which would really simplify file access but would be slower than loading the file to RAM and then saving it back to non-volatile memory."

I did say it could be cached. If the CPU can address the file directly on the bus, as though it were RAM, then there is no need to send multiple I/O bursts along the bus for every single file I/O operation.

Isn't everyone trying to grab a share of the smartphone and tablet market?

Also, are there now more netbooks/notebooks sold to consumers than desktops to end-users?

From what I remember of Apple's history, they have switched CPU architectures every 10 years or so. It would not be surprising to see Apple switch completely to 64bit ARM within the next couple of years; it's been almost a ten-year marriage to Intel's x86.

Sadly, ARM didn't follow AMD and Intel into the 64-bit space back then, which in retrospect may have been a mistake.

Mistake? Nobody will spend millions to design a chip that no one wants.

They've now lost 3-4 years on their potential rivals for 64-bit servers (and maybe eventually 64-bit desktops/laptops/netbooks/phones).

They lost nothing. ARM acts on the demands of its partners, and ARMv8 development started well ahead of such demand. 2013-2014 is a good time for real products _with_ software.
It took ~5 years for x86 software makers to adopt 64bit.

- Yes, but they also supported Alpha, MIPS, PowerPC and Itanium, which are all dead or dying platforms. So Windows coming to some architecture is not a guaranteed success. It may be, or it may not. It may be a success on tablets but a failure on desktops. Also consider that many popular Windows apps may or may not come to ARM on Windows. Most likely apps like Photoshop, 3DS Max, Maya, games, Visual Studio and other resource-intensive apps won't come to ARM.

ARM is coming to servers.

- Only to those servers that need heavy parallelization and where per-core performance is not important. Once you need performance, ARM is not a choice. So ARM will be present in some classes of servers but will not be a universal solution the way x86 is.

Apple will switch to ARM.

- It will not, as for the foreseeable future Intel CPUs will be orders of magnitude more powerful than ARM CPUs.

ARM is better than x86

- Better specifically in what way? They are better for phones and tablets, since until Intel's Moorestown there won't be an ultra-low-power x86 CPU. Is it better for desktop usage or servers? I fail to see why. Many people claim the ARM architecture is "superior". Again, I fail to see why. Many people like ARM because "it's not Intel." "Being not Intel" isn't a technical merit in itself.

I am not a fanboy and I like to see the right tool being used for the right job. Right now that means ARM for low-power devices such as phones and tablets, maybe netbooks and laptops, but x86 for desktops and servers.

Some people want ARM to come to desktops to see more competition in the desktop market. This is a good reason, but why not advocate for another x86 maker besides Intel and AMD to provide that competition?

VIA, SIS, Transmeta, Rise Technology, Centaur Technology, National Semiconductor, Cyrix, NexGen, IBM, UMC and NEC all made x86 CPUs at some point in time. VIA still sells x86 CPUs with limited success. I would love to see some strong competitors come to the x86 market, to boost both research & development and price cuts. AMD, on its own, seems to have trouble fighting Intel.

Or why not promote an entirely new, written-from-scratch CPU architecture aimed at desktop performance? Well, just as a philosophical point, since in practice that's just a utopia.

I mentioned earlier that I didn't think ARM was ready either (in terms of performance).

But I do think the x86 platform has a number of problems. The foremost concern on everyone's mind is the tons of legacy cruft: it increases complexity and wastes silicon, yet we still need to support it. Things like MMX and a complex FP stack that overlaps with SSE, etc. Then there are all the various modes: Real, 286 Protected, Virtual Real, 386 Protected, 64bit Long Mode, VT extensions, etc. Each mode has its own way of handling interrupts, segment registers, faults, memory addressing, etc.

Then there are scalability concerns: the x86 cache coherency protocol is a severe bottleneck in large multiprocessor systems. Back in the 90s Intel decided to make synchronization implicit, but this means every single memory access needs to be implicitly synced with other cores/processors. This is easily manageable with two processors, but as more are added, implicit synchronization becomes a bottleneck in and of itself. Explicit synchronization would mean that the cores would not need to communicate with each other at all except when threads are signaling each other.