Samsung Heads Off MediaTek With Octa 'HMP'

Samsung is making heterogeneous multiprocessing software available that allows all eight cores in its Exynos 5 Octa application processor to operate at the same time, when necessary.

Samsung Electronics Co. Ltd. has announced that its Exynos 5 Octa application processor, as well as being one of the best-known examples of ARM big-little processing, can now support heterogeneous multiprocessing (HMP). But a question remains outstanding: How heterogeneous is Samsung's HMP software for Exynos 5 Octa?

In a press release, Samsung stresses that with some additional software, developers working with the Exynos 5 Octa will be able to use all eight cores at the same time, with tasks assigned to any combination of cores. This will be a step up from operating in a big-little clustered migration approach where the four big Cortex-A15 cores and the four little Cortex-A7 cores cannot be powered at the same time.

The move may have been a response to marketing from MediaTek, which began an advertising campaign stressing the "true octa-ness" of a forthcoming application processor that it is expected to launch soon. MediaTek produced a YouTube video and published an article on its website that extols the virtue of being able to power all eight cores in an application processor at the same time. "MediaTek is the first adopter of true octa-core technology for mobile SoCs," the company said in the video. (See: MediaTek Starts Pushing Octa.)

However, with software soon to be available, presumably for both Exynos 5 Octa processors -- the 5410 with Imagination PowerVR graphics and the 5420 with ARM Mali graphics -- Samsung could yet beat MediaTek to the punch.

With the Exynos application processor up until now, the whole processor context is moved up to the big cores or down to the little cores based on the workload. The additional software from Samsung means that more complex global task scheduling can be done based on the load each task represents and the resources available. This can produce better optimized power efficiency. (See: ARM Benchmarks Flavors of Big-Little Multiprocessing.)
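To make the contrast concrete, here is a minimal Python sketch of the two placement policies. It is an illustration only, not Samsung's or ARM's scheduler: the `Task` class, the load figures, and the `BIG_THRESHOLD` value are all hypothetical, and real schedulers weigh far more than a single load number.

```python
from dataclasses import dataclass

# Hypothetical cut-off: tasks above this load are treated as "heavy".
BIG_THRESHOLD = 0.5


@dataclass
class Task:
    name: str
    load: float  # hypothetical load estimate between 0.0 and 1.0


def cluster_migration(tasks):
    """Cluster migration: every task runs on the same cluster.

    If any task is heavy, the whole context moves to the big cores;
    otherwise everything stays on the little cores.
    """
    cluster = "big" if any(t.load > BIG_THRESHOLD for t in tasks) else "little"
    return {t.name: cluster for t in tasks}


def global_task_scheduling(tasks):
    """Global task scheduling: each task is placed on its own merits.

    Heavy tasks go to big cores and light tasks to little cores,
    so both clusters can be in use at the same time.
    """
    return {
        t.name: ("big" if t.load > BIG_THRESHOLD else "little") for t in tasks
    }


if __name__ == "__main__":
    tasks = [Task("ui_render", 0.9), Task("mail_sync", 0.1)]
    print("cluster migration:     ", cluster_migration(tasks))
    print("global task scheduling:", global_task_scheduling(tasks))
```

In this toy model, one heavy task is enough to drag a light background task up to the big cluster under cluster migration, while global task scheduling leaves the light task on a little core. That per-task placement is where any power saving would come from.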

However, the differences between the CPU migration and the full global task scheduling flavors of big-little are marginal and do come at the cost of more complex software. Heterogeneous multiprocessing is likely to show more significant benefits when it also takes into account graphics processors and balances workloads across GPUs, multiple instruction set architectures, and hardware accelerators.

It appears that Samsung's HMP solution is, for now, restricted to the ARM instruction set. But that does raise the question: How heterogeneous is Samsung's HMP software for Exynos 5 Octa? After all, if it covers just one instruction set architecture, is it really heterogeneous?

"An eight-core processor with HMP is the truest form of the big-little technology with limitless benefits to the users of high-performance, low-power mobile products," Taehoon Kim, vice president of System LSI marketing at Samsung Electronics, was quoted saying in the press release.

Limitless benefits are the kind I like, so I look forward to the arrival of the HMP software for Samsung's Exynos 5 Octa application processors. It's due to be available to customers in the fourth quarter of 2013, Samsung said.

"It's one thing to have P-code that can be JITted on different CPUs. It's another to have it produce identical results (as Java designers discovered...). But it takes some serious heroics to migrate a running process from one CPU to another. That includes either translating heap and stack data between different CPUs or defining an identical layout for everything - including all the low level details in the JIT and VM. Having a single OS kernel run as an SMP on different CPU architectures is yet another level of complexity on top of that."

Complex yes, and probably it will never happen, but it is "possible".

" Then there is the hardware perspective - you can't just hook up a random ARM/MIPS core with a random x86/GPU core and expect them to work nicely together as an SMP! Obviously you'd have to agree on the same endianness, coherency protocols, page tables, L2/L3 cache and memory interfaces, how to deal with interrupts, IO, security etc in compatible ways across different CPUs/GPUs."

As it stands, yes, having different architectures from different vendors is not going to happen as they need to agree on all of these details which won't happen because there is nothing in it for them. But different architectures from the same provider is technically possible.

"When you consider the actual implications then it becomes clear it is an impossible dream. And the question remains, what benefit could you possibly get from seamless migration of an app from a CPU to a GPU or from CPU1 to CPU2? You can compile your app to a specific CPU or GPU and already get much better performance than even the best JIT ever could."

Good question. The benefit is that it would give you more degrees of freedom to optimise across more power/performance points.

PS. I think we are in agreement Wilco1. Perhaps I am being pedantic :-)

In theory anything is possible. Whether it is practical and achievable is a completely different matter...

It's one thing to have P-code that can be JITted on different CPUs. It's another to have it produce identical results (as Java designers discovered...). But it takes some serious heroics to migrate a running process from one CPU to another. That includes either translating heap and stack data between different CPUs or defining an identical layout for everything - including all the low level details in the JIT and VM. Having a single OS kernel run as an SMP on different CPU architectures is yet another level of complexity on top of that.

Then there is the hardware perspective - you can't just hook up a random ARM/MIPS core with a random x86/GPU core and expect them to work nicely together as an SMP! Obviously you'd have to agree on the same endianness, coherency protocols, page tables, L2/L3 cache and memory interfaces, how to deal with interrupts, IO, security etc in compatible ways across different CPUs/GPUs.

When you consider the actual implications then it becomes clear it is an impossible dream. And the question remains, what benefit could you possibly get from seamless migration of an app from a CPU to a GPU or from CPU1 to CPU2? You can compile your app to a specific CPU or GPU and already get much better performance than even the best JIT ever could.

I think that's a big.LITTLE centric definition of HMP :-) In theory, migrating a task from a CPU to a GPU or vice versa is possible (of course it is), whether that's the most efficient solution is another story. I could envisage a case where heterogeneous processors (i.e. processors with different architectures) in an SMP configuration (i.e. all sharing the same main memory) take p-code as input and translate/execute this just-in-time. Of course, we are nowhere near this in reality and that might never happen but I do not think we should redefine generic concepts depending on what is practically possible today.

But I may want to move some task or other out to the GPU or to another CPU depending on what resources are available at run time and an assessment of the workloads from various tasks -- and then move the Linux kernel down from the Cortex-A15 to a Cortex-A7 and thereby save power.

Ah I misunderstood what you were trying to say ("ARM" without qualification can mean ARM Ltd, the ARM community, the ARM architecture or the ARM instruction set). So basically what you meant was that it isn't really heterogeneous if it doesn't involve multiple ISAs.

I don't agree. An OS kernel is never going to run as SMP across multiple ISAs - remember to be SMP you have to run the same kernel image. Could a GPU run the ARM Linux kernel?

So heterogeneous computing with different ISAs is never going to be SMP. To be more precise, HMP is a generalization of SMP where the cores can have different caches and microarchitectures but still have to be based on the same ISA (like SMP). Given the time it has taken to get HMP ready, getting the kernel to support it was a non-trivial task.

Heterogeneous computing (without the SMP part) is something different - every device you can imagine already does that. A typical smartphone for example has about 10 microcontrollers and DSPs besides the main CPUs and GPUs. But nobody sane would suggest running the Linux kernel on those!

If Samsung is not supporting use of the GPU cores alongside the use of CPU cores OR the use of multiple architectures (not applicable in Exynos 5 Octa) I think they would have done better to bill this development as "true-octa" or "global task scheduling."

All I am going from is the proper use of the term heterogeneous which means distinctly non-uniform.

And the use that would be expected by the Heterogeneous System Architecture Foundation.

The HSA is looking to help create systems where software can be optimized to run on heterogeneous systems that would include multiple ISAs and multiple implementations of each ISA, mixing GPUs and CPUs.

Samsung (and ARM and MediaTek) are members of the HSA but it is NOT clear that Samsung is supporting the fullest definition of HMP here.

Technically, you are right Peter. True HMP implies tasks working across different processor architectures. I guess Wilco1 is referring to the two ISAs available in ARM processors: ARM ISA and Thumb2 ISA. What you are implying is a completely different processor architecture (not just the ISA). I would still consider what Samsung and MediaTek are doing as HMP but it's a special case or initial stab at it. When they extend that to GPUs and/or different processor architectures, e.g. from IMG and ARM, you could call that full HMP. That said, I do not think this would happen anytime soon, especially not between rival architectures.

I believe "heterogeneous" in Big Little refers only to varying cores all using the same ISA.

There is an Intel patent which discusses the Big Little concept including extending it to the case where the ISAs are heterogeneous. The lesser ISA would emulate instructions that were lacking (claims 3, 22 and 23 of a patent called "Dynamic Core Swapping" http://patents.com/us-8156351.html).

So maybe that's something you remember from there? Or something you got from Intel – that would be interesting!