Heterogeneous compute has been on the tip of the tongue for many involved in integrating compute platforms, and despite talk and demos regarding hardware conforming to HSA provisional specifications, today the HSA Foundation is officially launching the ratified 1.0 Final version of the standardized platform design. This brings together a collection of elements:

- the HSA Specification 1.0, which defines the operation of the hardware;
- the HSA Programmers’ Reference Manual, aimed at the software ecosystem, targeting tool and compiler developers;
- the HSA Runtime Specification, which defines how software should interact with HSA-capable hardware.

The specifications are designed to be hardware agnostic, as you would imagine, allowing ARM, x86, MIPS and other ISAs to take advantage of the HSA standard as long as the implementation conforms to the specifications. The HSA Foundation is currently working on conformance tests, with the first set of testing tools due to be ready within a few months. The goal is to have the HSA specification enabled in a variety of common programming languages, such as C/C++, OpenMP and Python, as well as C/Fortran wrappers for the HPC space. Companies such as MultiCoreWare are currently helping develop some of these compilers for AMD, for example.

The announcement today is part of an event being held in San Jose, CA, and the event will also see the preview of an HSA book designed to help developers in this space. There will also be a round-table panel of HSA board members discussing the release as well as taking questions. Some of the obvious key points the HSA Foundation will be pushing include video conferencing in mobile (exchanging encode cycles on the CPU/GPU for lower bandwidth requirements), video search, embedded applications and high performance computing, especially workloads with high memory requirements that can still take advantage of co-processor based compute.

As part of the pre-briefing we received, Phil Rogers, the president of the HSA Foundation and corporate fellow at AMD, explained the purpose of the announcement and answered a few of our questions. Mr Rogers explained how current Kaveri APUs rely on the HSA 1.0 Provisional specification, and Carrizo (based on Excavator) will aim for 1.0 Final compliance if the tools are ready before Carrizo's launch. Carrizo will not be held back in order to secure compliance before ramping up production, but the expectation is that it should pass and behave similarly to Kaveri, with the minor adjustments required for 1.0 Final, such as GPU context switching.

Mr Rogers also explained that the HSA 1.0 Final specifications should integrate with Aparapi for Java and Project Sumatra, both of which we described back at the Kaveri launch last year. C++ AMP is currently also a goal, as unified memory reduces the need to define restricted kernels. Ensuring that the upcoming C++17 standard is fully supported within the HSA context is also important.

With regard to profiling, the HSA Foundation has a Tools Working Group currently pursuing both a Profiling API and a Debugging API to allow low-level software developers to integrate these tools into their GUIs. We were told that this should happen within a year, but the API requires proper low-level access from the developer.

Mr Rogers was not able to comment on the implementations of other HSA Foundation members, particularly companies such as Qualcomm, Samsung, ARM, Imagination Technologies, LG and MediaTek, all of which have ‘arms’ into the smartphone space where HSA could encompass a wide variety of scenarios. We were told however that each of the members of the HSA Foundation, of which there are over 40 technology companies and 17 universities, were keen on closing the specification in order to move forward with their goals.

Heterogeneous System Architecture is in for the long haul for sure, although execution and improvement of the user experience will be the key factors in providing something tangible. There is also an element of learning to think in the HSA paradigm, something not specifically taught to young software developers entering college and university. To that extent, PCIe co-processors and multi-core programming are still low on the list of topics to teach. Nevertheless, I would imagine HSA offers a wide opportunity to those who can take advantage of it, developing their hardware and tools to use it effectively, and the ratification of the 1.0 Final specifications is a big step along that road.


26 Comments

Just an essential question: how could software "know" which different types of cores, with which types of features, are available in a heterogeneous system? How could the software or the OS choose the best-fitting types of cores for a special part of the software's code?

A method like the one used for CPUs with big.LITTLE concepts (ARM CPUs in smartphones) seems not to be suitable/efficient for complex tasks, where the scheduler of the OS "simply" sorts by priority (high -> big) and foreground/background tasks (foreground -> big): https://en.wikipedia.org/wiki/ARM_big.LITTLE#Heter...

HSA exposes the platform topology, and software makes its own choice (i.e. pick an agent, create a queue associated with the agent, submit a packet to the queue). In other words, if you want automation, you would have to wait for a library (e.g. AMD's Bolt) to target HSA.
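The agent/queue/packet flow described above can be sketched as a toy model. To be clear, the names here (Agent, Queue, Packet) are invented for illustration and the dispatch is synchronous; the real HSA Runtime exposes this flow through C calls such as hsa_iterate_agents() and hsa_queue_create(), with asynchronous AQL packet processing.

```python
# Toy model of the HSA dispatch flow: enumerate the platform topology,
# pick an agent, create a queue bound to that agent, submit a packet.
# All names are illustrative, not the real HSA Runtime API.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    device_type: str  # "CPU" or "GPU" in this toy model

@dataclass
class Packet:
    kernel: callable
    args: tuple

@dataclass
class Queue:
    agent: Agent
    packets: list = field(default_factory=list)

    def submit(self, packet):
        self.packets.append(packet)

    def drain(self):
        # A real runtime's packet processor consumes packets
        # asynchronously; here we simply run them in order.
        return [p.kernel(*p.args) for p in self.packets]

# The "topology" the runtime would report for a hypothetical APU.
topology = [Agent("host-cpu", "CPU"), Agent("integrated-gpu", "GPU")]

# Software makes its own choice: here, pick the first GPU agent.
gpu = next(a for a in topology if a.device_type == "GPU")
queue = Queue(gpu)
queue.submit(Packet(kernel=lambda x: x * 2, args=(21,)))
print(queue.drain())  # → [42]
```

The point of the model is that nothing schedules work onto the "best" core automatically: the application (or a library built on top, like Bolt) inspects the topology and decides which agent's queue receives each packet.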

"There also requires an element of learning to think in the HSA paradigm, something not specifically taught to young software developers entering college and university. To that extent, PCIe co-processors and multi-core programming are still low down on the list of to-teach."

I don't think that will change anytime soon, if ever. CS grads will know data structures, algorithms and high-level programming. Front end/back end web programming, databases and abstracted application programming are what they're ready for. Even undergrad Operating System classes have to teach at a high level with a lot of the details abstracted away, simply because it's not possible to learn everything from scratch in one semester. Hell, a proper OS class would take all four years of college on its own. Most CS grads are not prepared to understand the Intel software dev books and spec sheets, the UEFI spec, nor decode the output of Linux's lspci.

I had to learn most of what I do at work over several years (it's never-ending, really) and could hardly find any books on the subject. The only resource I've found so far to cover this type of area is http://mindshare.com/ (my company paid for a couple of classes and I've gotten other books of theirs on my own). If anyone knows of other resources, please share them.

All jobs are like that. When you go to school to get a degree, all you've done is prove that you are highly trainable in a specific skill set. That's why experience becomes the number one item listed on a resume after your first job. I've never worked anywhere that expects a recent graduate to know what they're doing. The people I manage can't operate on their own for a year after they're hired if they have previous experience, and up to three years for someone who doesn't.

Yes, but it varies in degree according to the topic. Low-level areas remain mostly a black art; relative to other areas, it's harder for someone to learn them on their own in order to gain experience in the first place. There are no O'Reilly books on the topic and it's not something that you could really learn on Stack Exchange or discussion forums. The vast majority of jobs in this field are not entry level or available to new grads.

It suffers a lot from the fact that a large percentage of the programming jobs out there are not low-level bare metal programming. What business is mostly concerned with is development costs and speed more than performance. That's why it's a heck of a lot harder to find materials on low-level programming.

I suspect this trend will continue, and the next wave is all going to be stuff like JavaScript. It's starting to happen with things like Node.js: let's write everything in JavaScript! Scary, right?

That's not necessarily a bad thing. If compilers and virtual machines do an excellent job optimizing for performance, only compiler and hypervisor coders need to do low level coding. People writing applications can focus on user experience and business logic. High level doesn't necessarily mean low performance. Granted, hand optimizing assembly language can often lead to extremely large gains in performance, but it can also result in fragile code that's not maintainable. On the other hand, putting the performance effort into the platform will improve all software that runs under that platform. That is the idea behind HSA: to provide an easier path to multithreading and massively parallel applications that are agnostic to particular hardware as long as it supports a standard.

It depends really. The computer science school at my university (Manchester) does have third-year modules on parallel and multi-core computing where they teach you the main principles of programming in a multithreaded environment as well as how thread scheduling and cache coherency are handled. In my masters year, they cover parallel and heterogeneous computing in quite a lot of detail, and they gave a decent intro to OpenCL and CUDA. OpenACC was briefly touched on as well. Operating systems, compilers, microcontrollers and hardware design are also taught really well at various levels.

Then again, it does help that the Advanced Processor Technologies group at the school consists of some seriously smart researchers, headed by none other than Steve Furber.

For what it's worth, at UCLA most freshmen in computer science and computer engineering learn assembly, MIPS, multithreading, and the basics of computer architecture and operating systems by the end of the year.