Posted by BeauHD on Thursday December 08, 2016 @08:00AM
from the two-is-better-than-one dept.

An anonymous reader quotes a report from AnandTech: Today at Microsoft's WinHEC event in Shenzhen, China, the company announced that it's working with Qualcomm to bring the full Windows 10 experience to future devices powered by Snapdragon processors. These new Snapdragon-powered devices should support all things Microsoft, including Microsoft Office, Windows Hello, Windows Pen, and the Edge browser, alongside third-party Universal Windows Platform (UWP) apps and, most interestingly, x86 (32-bit) Win32 apps. They should even be able to play Crysis 2. This announcement fits nicely with Microsoft's "Windows Everywhere" doctrine and should come as no surprise. It's not even the first time we've seen Windows running on ARM processors. Microsoft's failed Windows RT operating system was a modified version of Windows 8 that targeted the ARMv7-A 32-bit architecture. It grew from Microsoft's MinWin effort to make Windows more modular by reorganizing the operating system and cleaning up API dependencies. The major change with today's announcement over Windows RT and UWP is that x86 apps will be able to run on Qualcomm's ARM-based SoCs, along with support for all of the peripherals that are already supported with Windows 10. This alone is a huge change from Windows RT, which would only work with a small subset of peripherals. Microsoft is also focusing on having these devices always connected through cellular, which is something that is not available for many PCs at the moment. Support will be available for eSIM to avoid having to find room in a cramped design to accommodate a physical SIM, and Microsoft is going so far as to call these "cellular PCs" meaning they are expecting broad support for this class of computer, rather than the handful available now with cellular connectivity. The ability to run x86 Win32 apps on ARM will come through emulation, and to demonstrate the performance Microsoft has released a video of an ARM PC running Photoshop.

Supposedly Apple has Mac OS (or OS X) running natively on ARM processors. The only emulation needed will be for legacy programs, which is what they did for PowerPC programs after switching to Intel processors.

But Microsoft has a horrible record on that - remember NT on RISC? Given that record, I don't trust them on this one. For the Surface Phone, I believe it only makes sense to do if the phone is x64 based, otherwise there is no reason to slay the Lumia. In fact, the Lumia would be a better phone than a phone that still has a Snapdragon but is now emulating x86, as opposed to one running native ARM apps, which is the case w/ Lumia

Intel has seen the writing on the wall for some time now, that's why they've been pushing the Atom and i3 chips to lower and lower power consumption. They need to get some level of parity with the ARM chips or those areas where ARM excels (low power, low heat) will be eating Intel's lunch.

The Intel powered phones weren't quite there in power consumption, but they certainly had processing power. With ARMs becoming more ubiquitous and general-purpose, you can bet your ass Intel is pushing to keep their har

That's the same thing that AMD has been saying for every new chip for the past decade. They haven't had a hit since the Athlon 64, and that was 13 years ago. I'll believe it when I see it; until then, I'll expect Zen to be a repeat of the Bulldozer disaster. I'd like for it to turn out to be another Athlon 64, though: it's been a very long time since there has been any competition in the x86 market.

When Intel had XScale, that was DEC's StrongARM, and DEC was noted for high performance, but never low power. The XScale was fine as an embedded CPU but not for low power applications, which is why it went nowhere.

When I've heard people talk about "ARM servers," the fine print tends to be that they're not really talking about ARM CPUs, they're talking about ARM SoCs... so however many ARM CPU cores paired with other components that tailor the SoC for specific workloads. The resulting ARM servers probably won't be general-purpose hardware for everybody to use, they will be marketed to people who know the specific thing they want to do and now they just want to hit the sweet spot on power consumption/cost/whatever.

So far in the micro server and embedded space, ARM has been particularly disappointing to me. I have a drawer full of ARM devices I've accumulated over the years: SheevaPlugs, GuruPlugs, RouterStation, etc. All are potentially useful devices, but ARM is hobbled by proprietary boot systems, differing device trees, and proprietary supporting hardware. These devices rely on customized Linux distributions, and they are often fairly hard to update to new kernels and new flash file systems. Some of these devices have good CPU performance specs, but in practice I've never had them outperform an Intel-based server, even a small low-powered one like the Atom.

And now in embedded space we have a plethora of ARM-based devices based on lots of different SoCs from companies all over the world, all with their own forks of Linux. We've got Raspberry Pis, Orange Pis, Pine64s, etc. All very interesting and probably useful, but a nightmare to do anything with in a sustainable way.

The Pi (and some of these devices) at least is easy to update since everything comes off of the sd card, with no kernel flashing required. And some of them like the Pi have a fair amount of hacker inertia behind them, so they are capable of doing cool things (maybe not as server replacements though).

With x86-based embedded systems and small servers, at least I can run more standard, off-the-shelf distros on them. I'd far rather deal with a conventional Linux server than a SheevaPlug, even if the SheevaPlug is a nice tiny thing with lots of potential.

In fact my current home office router is a small, low-power Intel-based computer running bog standard, minimal install of CentOS 7. Wifi is hung off of that using a consumer-grade access point running in bridge mode.

If ARM devices had a standard boot process like UEFI or even the BIOS, and could boot off of a variety of devices in a standardized way, including SSDs, hard drives, USB sticks, and internal flash storage, and could run stock distributions downloaded from distribution web sites, without custom kernels, then I'd say for sure x86's days are numbered. ARM is good at remaining fragmented though.

The ARM platform mess is likely one of the reasons behind the rapidly growing support for the RISC-V [riscv.org] architecture. The more obvious one being that no one wants to pay rent in perpetuity. ARM may not be as bad as intel, but they are definitely still a substantial burden on hardware developers, and a good open architecture and platform is what just about everyone wants. With Android being mostly platform neutral, I think ARM will also be in for a rude surprise, and turn out to have been a foolish investmen

Developers wanted to recompile their x86 Windows desktop applications for the desktop on Windows RT. Microsoft refused, instead decreeing that the only desktop applications on Windows RT shall be File Explorer, IE, and Office.

According to TLA, x86 compatibility is achieved through emulation. Emulating the x86 instruction set is a non-trivial exercise that almost invariably results in extremely disappointing performance. Why? The x86 instruction set is an accretion of the instruction sets of older Intel processors, beginning with the 8008. This yields a difficult (i.e., computationally expensive) instruction set to decode and execute. Over the years, Intel has implemented micro-architectures that address this problem through special purpose hardware. If you're so inclined, have a read here http://www.intel.com/content/w... [intel.com] for details. The takeaway is that simply emulating the x86 instruction set results in about a 100x slowdown for an equivalent clock rate. So, although this is an interesting technology demonstration, I seriously doubt it will prove useful outside of a small set of applications. It will certainly not be a satisfactory gaming platform.

"The takeaway is that simply emulating the x86 instruction set results in about a 100x slowdown for an equivalent clock rate. So, although this is an interesting technology demonstration, I seriously doubt it will prove useful outside of a small set of applications. It will certainly not be a satisfactory gaming platform."

Sounds like you didn't read the article or watch the embedded video, where MS show x86 Photoshop and x86 World of Tanks being emulated on Windows 10 ARM and a Snapdragon (835?) SoC - a

Don't believe everything you read/see in a press release. Apply some critical thinking.

A reasonable person that both watches the video and reads your comment would conclude that either you are mistaken, or Microsoft and Qualcomm have somewhat overcome or mitigated the issues you point out.

The takeaway is that simply emulating the x86 instruction set results in about a 100x slowdown for an equivalent clock rate.

Emulation definitely results in slowdowns, but it's generally much less than 100x. In particular since any emulator that focuses even slightly on performance uses dynamic compilation: it translates the code once from x86 to the host architecture and from then on runs this translation. The translated version will probably be less efficient than the original code, but by no means 100x slower. 2x to 5x seems more realistic on average, although there are certainly outliers (e.g. code that intensively mucks with system registers or that triggers context switches will be slower, while some straightforward calculation loops may actually become just as fast as or even faster than the original code depending on the target architecture's nature).
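To make the "translate once, then run the translation" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three-opcode guest instruction set (`movi`, `addi`, `add`), the `TranslationCache` class, and the use of generated Python source as the "host code". It is not how Microsoft's emulator or any real dynamic translator is implemented; it only shows why repeated runs of a block skip the decode step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy guest ISA: (opcode, destination register, operand).
Instr = Tuple[str, int, int]
Regs = List[int]


def interpret(block: List[Instr], regs: Regs) -> None:
    """Pure interpretation: decode every guest instruction on every run."""
    for op, dst, arg in block:
        if op == "movi":      # regs[dst] = immediate
            regs[dst] = arg
        elif op == "addi":    # regs[dst] += immediate
            regs[dst] += arg
        elif op == "add":     # regs[dst] += regs[arg]
            regs[dst] += regs[arg]
        else:
            raise ValueError(f"unknown opcode {op!r}")


def translate(block: List[Instr]) -> Callable[[Regs], None]:
    """Translate a guest block into host (Python) code exactly once."""
    templates = {
        "movi": "    r[{d}] = {a}",
        "addi": "    r[{d}] += {a}",
        "add": "    r[{d}] += r[{a}]",
    }
    lines = ["def host_block(r):"]
    for op, dst, arg in block:
        if op not in templates:
            raise ValueError(f"unknown opcode {op!r}")
        lines.append(templates[op].format(d=dst, a=arg))
    lines.append("    return None")  # keeps an empty block syntactically valid
    namespace: dict = {}
    exec("\n".join(lines), namespace)
    return namespace["host_block"]


class TranslationCache:
    """Caches translated blocks by guest address, as a dynamic translator would."""

    def __init__(self) -> None:
        self._cache: Dict[int, Callable[[Regs], None]] = {}
        self.translations = 0  # how many times we paid the translation cost

    def run(self, address: int, block: List[Instr], regs: Regs) -> None:
        host = self._cache.get(address)
        if host is None:
            host = translate(block)
            self._cache[address] = host
            self.translations += 1
        host(regs)
```

Running the same block a thousand times through `TranslationCache.run` translates it once and reuses the result, whereas `interpret` re-decodes every instruction on each pass. The sketch ignores the hard parts mentioned above, such as system registers, context switches, and self-modifying code.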

Back then it was still Dynamo. And they only managed to do that on a particular HP PA-RISC architecture, because it was very sensitive to instruction cache misses (or had a bad branch predictor?) so that creating linear traces of code was very performant. They later tried to reproduce it on x86 and failed horribly (just like I did during my master's thesis; the best I got was a 20% slowdown for gzip, I think the best they got was no performance loss with some benchmarks).

That only means you have to mark the pages containing the code you just generated read-only once you're done.

Several operating systems in wide use, such as Apple iOS and the operating systems of modern video game consoles, offer no way for third-party applications to switch a page from read-write to read-execute. When a page is allocated for data, the OS clears it first, and it stays non-executable until deallocated. Only the OS's executable loader* has the privilege to allocate pages for code, and once the loader loads a module, verifies its digital signature, and flips its pages from read-write to read-execute,

Much less than 100 times alright, but somehow most JITs/emulators (hello Transmeta) manage to incur a ridiculously high load on the system so running multiple large emulated apps simultaneously is troublesome.
Why not distribute in CIL or llvm IR format?

Having an intermediate format that you statically translate into the target architecture is definitely useful (like Android is now doing with ART), but keep in mind that LLVM IR is not architecture-independent most of the time. E.g., when LLVM IR is generated from C, then this C code will at least have been compiled based on a certain pointer size, size of long, size of long long, alignments for struct fields, etc. CIL is better in this regard.
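The layout decisions mentioned above (pointer size, size of long, struct field alignment) can be inspected on any given machine with a short Python sketch using the standard `ctypes` module. The `Record` structure and the `layout_report` helper are hypothetical names made up for this illustration; the numbers they report depend on the platform's C ABI, which is exactly the point.

```python
import ctypes


class Record(ctypes.Structure):
    """Hypothetical C struct: { char tag; long value; void *next; }."""

    _fields_ = [
        ("tag", ctypes.c_char),
        ("value", ctypes.c_long),
        ("next", ctypes.c_void_p),
    ]


def layout_report() -> dict:
    """Collect the ABI-dependent sizes and offsets a C front end bakes in."""
    return {
        "pointer_size": ctypes.sizeof(ctypes.c_void_p),
        "long_size": ctypes.sizeof(ctypes.c_long),
        "long_long_size": ctypes.sizeof(ctypes.c_longlong),
        # Padding after the 1-byte tag depends on the alignment of long.
        "value_offset": Record.value.offset,
        "record_size": ctypes.sizeof(Record),
    }


if __name__ == "__main__":
    for name, value in layout_report().items():
        print(f"{name}: {value}")
```

A 64-bit Linux build, a 64-bit Windows build, and a 32-bit ARM build will generally not all print the same values, and IR generated from C has those values already folded into its types and offsets.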

Your information is highly dated and perhaps your sources are also a bit biased. At any rate, 100x performance hit is stupid wrong.

Static translation was achieving 50-70% native performance rates (measured against clock cycles) with FX!32 on Windows NT for Alpha in the mid 90's. The problem of course has been very well studied since then particularly with the advent of virtualization and the x64 instruction set and the need to enhance the performance of x86 code running on even Intel's own platforms. Furthe

The applications that lose in this scenario are the ones that rely on raw single-thread performance. Certainly some games are in this camp, but many games which make efficient use of threads are not.
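As a rough way to reason about this, the trade-off can be sketched with an Amdahl's-law style estimate. The function below is a back-of-the-envelope model, not a measurement: `parallel_fraction`, `cores`, and `emulation_slowdown` are all hypothetical inputs, and the model assumes the emulation penalty applies uniformly to every instruction.

```python
def relative_runtime(parallel_fraction: float, cores: int,
                     emulation_slowdown: float) -> float:
    """Runtime relative to native single-core execution (1.0 = same speed).

    parallel_fraction  -- share of the work that spreads across cores (0..1)
    cores              -- number of cores available to the emulated program
    emulation_slowdown -- assumed per-instruction cost factor of emulation
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if emulation_slowdown <= 0:
        raise ValueError("emulation_slowdown must be positive")
    serial_fraction = 1.0 - parallel_fraction
    # Amdahl's law: only the parallel share benefits from extra cores.
    return emulation_slowdown * (serial_fraction + parallel_fraction / cores)
```

Under this model, a fully serial program pays the whole emulation penalty no matter how many cores exist, while a program that parallelizes well can hide part of it behind extra cores.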

It's not that simple. Games are multithreaded now, yes, but they do not have a crapload of threads which can make use of a crapload of cores. If you're taking a substantial clock rate hit and another substantial hit from translation overhead, the truth is that it's not just the high-end games which are going to suffer, nor the low-end ones, but any of them which are not very old — as defined by coming from the era when PCs had even lower clock rates. It's already true that Intel processors with less c

It depends on if they emulate it by translation and shadowing, or by interpretation. Software translation is rather fast, but not native-fast. To get native-fast, you have to go native.

I've been suggesting an accelerator chip (maybe even off-die) that decodes x86 instructions into the internal RISC instructions stored in the ICache, but people keep telling me it's impossible because... they're stupid. Modern x86, x86-64, and ARM chips all read instructions in their ISA and translate to an internal CPU

I think it's safe to say it's slower, but as I used to remind the engineers when they pushed back against releasing slow solutions when I was working as a liaison between customers and engineering: "not working is infinitely slow".

It's easy to see how Microsoft wants this for the Surface Phone since it will allow customers to run x86 applications on their phone. If your choice is between not running an application and running it slowly, most people will always choose "slowly". Most Win32 software isn't h

They are perfectly capable, and have done so. Actually Windows (the NT train on which everything current is based) is pretty cross-platform.

What Microsoft can't do is get every other vendor in the world to port all their x86 code, and rebuild all their x86 binaries, where the source is otherwise free of x86 assumptions about pointer size, etc.

You need to add in the ridiculous licensing on the version of Office shipped with the SurfaceRT (it was Home and Student version so unusable for business) and complete lack of Outlook at launch that basically killed the product.

Not all the apps are written by Microsoft, so they have no control over what gets ported. While they have written Office natively for ARM, viz. Office for Windows 10 Mobile, other apps that the phone may need may not be there.

But I still am not getting the point. The applications that one runs on desktops - not talking about email here - are different from the ones one runs on phones. For instance, things like WhatsApp, Yelp, Fandango, et al are there on Windows Phone, but not on x86: for the latter,

That's not actually that big of a downside. With Microsoft Office, for example, Microsoft still recommends most users install the 32-bit version, even though almost everybody is running a 64-bit OS these days. The exception is people who need to run crazy big Access databases (or... shudder... Excel spreadsheets).

Microsoft produced a version of Windows NT for DEC Alpha and threw in an x86 emulation layer called FX!32 so it could run existing software. It worked, but it ran like a dog, far slower than native x86 instructions. And not for want of trying, because it did machine-translate instructions to try and execute code natively.

I really don't expect the picture to be any different with x86 over ARM. I expect they'll machine translate x86 instructions into native ARM instructions in some way and cache them somewhere,

Also, FX!32 was meant to be a stopgap solution until users of major NT titles that were run under the Alpha would prove to the ISVs that it's worth recompiling their titles for the Alpha. Which never happened, due largely to Microsoft's lack of support. It won't happen here either. For some reason, Android succeeded where OS/2 failed - being first to market helped, and the fact that there was another illustrious alternative in iOS ensured that the market had what it needed

If there were lots of ARM desktop systems out there I could understand it, but as it is, ARM is almost solely still an embedded, tablet and smartphone ecosystem. Having used classic Windows apps on an 8" tablet, I can't imagine any sane person wanting to run them on a phone, or even an 8" tablet.

I've done it with a tablet and it works, but I own a notebook and have a desktop PC at work so I see little enough reason to do it. In theory I suppose the idea of using a mobile device as your primary computing device has its attractions, but this would also mean for me having a Windows device or a device capable of running the Windows software I do use, and the cost of Windows smart devices is fairly hefty.

Do these "most commonly owned computers" that you mention offer general-purpose functionality when connected to an external monitor and paired to a Bluetooth keyboard, including the ability to take one tool's output and use it as another tool's input without needing each tool to be specifically aware of the other tools?

There are computers everywhere. But the smartphone is more similar to a PC than to a microwave oven.

Do these "most commonly owned computers" that you mention offer general-purpose functionality when connected to an external monitor and paired to a Bluetooth keyboard, including the ability to take one tool's output and use it as another tool's input without needing each tool to be specifically aware of the other tools?

GE has a line of Bluetooth-enabled appliances. Not sure about external keyboard and monitor support.