A year ago Fudzilla revealed that Samsung has been building a mobile GPU. We are still as puzzled about it now as we were then, but there are more confirmations that they should finish it in the 2017 – 2018 timeframe.

The details are slim, but the report claims Samsung wants to unite its Exynos CPU with its own GPU. The main goal is to become competitive in the Heterogeneous System Architecture (HSA) market. It is believed that this might become a big thing in the longer term. The original plan was to deliver the GPU much earlier, but it looks like the plans were significantly delayed.

Some of you might remember Intel Larrabee, that Hamlet of discrete graphics that was supposed "to bee" for years before it was declared "not to bee." Making a GPU from scratch is extremely hard as making all the drivers is the programming job from hell.

Samsung's own GPU will probably find its way to market by 2017 or 2018, which is a long time in computing. The Exynos 7420 processor uses Mali-T760 MP8 graphics and there are indications that the next generation Exynos expected in 2016 will use the Mali-T860 and T880.

Samsung needs to stay competitive as Qualcomm and MediaTek make more noise about having a GPU and CPU that work together to speed up tasks. Not even Apple has its own GPU. Apple has used PowerVR for years, while Qualcomm has its own Adreno graphics.

MediaTek started using the latest Mali-T860 for its Helio P10 (6755) mainstream / performance SoC, while the Helio X20 is using yet-to-be-named Mali-based graphics. Nvidia with its GeForce mobile seems to be the fastest for tablets and console use, while Adreno 530 looks like a top performer.

After a lot of asking around, we can give you some actual numbers about AMD's coherent fabric.

The interconnect technology already sounded very promising, but now we have the actual number. The HSA (Heterogeneous System Architecture) MCM (Multi Chip Module) that AMD is working on can be almost seven times faster than the traditional PCIe interface.

AMD will be using this technology with the next-gen Multi Chip Module that packs a Zeppelin CPU (most likely packed with a bunch of Zen cores) and a Greenland GPU that of course comes with super-fast HBM (High Bandwidth Memory). Greenland and HBM can communicate at 500 GB/s and can provide the highest-performance GPU, with 4+ teraflops.

This new MCM package based chip will also talk with DDR4 3200 memory at 100GB/s speed making it quite attractive for the HSA computation oriented customers.

We have mentioned both AMD's Zen core processors, as well as Greenland HBM memory powered graphics in the past, but now we have a few more details.

The highest end compute HSA part has up to 16 Zen x86 cores and supports 32 threads, or two threads per core. This is something we saw on Intel architectures for a while, and it seems to be working just fine. This will be the first exciting processor from the house of AMD in the server / HSA market in years, and in case AMD delivers it on time it might be a big break for the company.

Each Zen core gets 512KB of L2 cache and each cluster of four Zen cores shares 8MB of L3 cache. In case we are talking about a 16-core, 32-thread next generation Zen based x86 processor, the total amount of L2 cache gets to a whopping 8MB, backed by 32MB of L3 cache.
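For readers who want to check the cache maths, here is a minimal sketch in Python. It only multiplies out the per-core and per-cluster figures quoted above; the function name and its parameters are our own, made up for illustration, and have nothing to do with any AMD tool.

```python
KB_PER_MB = 1024

def cache_totals(cores, l2_kb_per_core=512, cores_per_cluster=4, l3_mb_per_cluster=8):
    """Return (total L2 in MB, total L3 in MB) for a given Zen core count.

    Hypothetical helper: the defaults are the figures quoted in the article
    (512KB L2 per core, 8MB L3 shared by each four-core cluster).
    """
    clusters = cores // cores_per_cluster
    l2_mb = cores * l2_kb_per_core / KB_PER_MB
    l3_mb = clusters * l3_mb_per_cluster
    return l2_mb, l3_mb

for cores in (16, 4):
    l2, l3 = cache_totals(cores)
    print(f"{cores} cores: {l2:g}MB L2, {l3}MB L3")
```

The 16-core case gives 8MB of L2 and 32MB of L3, while a four-core part would end up with 2MB of L2 and 8MB of L3.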

A theoretical quad-core would have four times 512KB of L2 cache and 8MB of L3 cache. The platform supports secure boot and AMD's crypto coprocessor, which is important for corporate and business customers, and there is a very good chance that this processor will end up in the HSA compute market.

This new APU also comes with the Greenland Graphics and Multimedia Engine that comes with HBM memory on the side. The specs we saw indicate that there can be up to 16GB of HBM memory with 512GB/s speed packed on the interposer. This is definitely a lot of memory for an APU GPU, and it also comes with 1/2 rate double precision compute, enhanced ECC and RAS and HSA support.

DDR4 Coherent fabric Zen X86 die and Greenland HBM GPU

The next generation Zen based APU also comes with PCIe Gen3 support and SATA express or SATA, 1GbE support and DDR4 memory controllers in 4x72 configuration. The 4-channel DDR4 supports ECC memory and speeds up to 3200 MHz, SODIMM, UDIMM, RDIMM, LRDIMM 2DIMMs/channel, with total capacity of 256GB per channel. That is a lot of memory.
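The roughly 100GB/s figure quoted for DDR4 3200 lines up with a simple peak bandwidth estimate for four channels. The sketch below is our own back-of-the-envelope calculation, not anything from AMD: it assumes each 72-bit channel carries 64 data bits plus 8 ECC bits, and that "3200" means 3200 mega-transfers per second. The function name is hypothetical.

```python
def ddr_peak_bandwidth_gb_s(mega_transfers_per_s, channels, data_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumption: only the 64 data bits of each 72-bit ECC channel count
    towards usable bandwidth.
    """
    bytes_per_transfer = data_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

per_channel = ddr_peak_bandwidth_gb_s(3200, channels=1)
total = ddr_peak_bandwidth_gb_s(3200, channels=4)
print(f"per channel: {per_channel:.1f} GB/s, four channels: {total:.1f} GB/s")
```

Under those assumptions each channel peaks at 25.6 GB/s and four channels at 102.4 GB/s. Real-world sustained numbers would be lower.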

The APU has 64 PCI Express Gen 3 lanes, where 16 lanes are switchable with 2 lanes of SATA Express and 14 lanes of SATA. AMD is using coherent fabrics to interconnect the Zen CPU die and Greenland graphics die, and it also uses coherent fabric for intercommunication between Zen die CPUs and caches, PSP, timers, counters, ACPI or Legacy interface, GMI Physics, Combo Physics, Host Controllers like USB, SATA or GbE and memory controllers.

GMI stands for Global Memory interconnect and this is the interface between Zen die and Greenland die, or between two chips on the same multi-chip module package.

The world is still expecting the birth of the High Bandwidth Memory (HBM) boosted Fiji GPU, and the first cards based on the new chip should launch in late June, or the end of Q2 2015 if you prefer.

AMD is looking ahead and their engineers are working hard on the company's next generation HBM card, currently codenamed Greenland. We are not sure if this is the name of the whole generation or this is simply a single GPU backed by HBM, that will end up in APUs.

Like we said, we doubt that Fiji will actually launch on the Pacific island of Fiji and that the Greenland launch event will be held on Greenland (Denmark), but we can confirm that the Greenland GPU will use HBM memory. There is still no confirmation on the manufacturing process, but we would expect that Greenland ends up on either GlobalFoundries' 14nm process or TSMC's 16nm process. Greenland will be a part of AMD's next generation K12 APU, which means that this multiple Zen core APU will get some great graphics performance. It is not clear if Greenland is a part of the Caribbean Islands (Fiji) generation or if it belongs to a successor generation.

Greenland uses HBM in 2016

At this time we cannot confirm (or deny) whether or not Greenland will launch as a desktop card, too, and we can only speculate that Greenland is a shrunk derivative of the Fiji generation architecture.

Nvidia's first HBM Pascal card is coming by early 2016. Pascal will use the 2.5D HBM approach and probably HBM 2 memory, and we expect that AMD's Fiji successor will use HBM 2 memory as well.

Details are limited, apart from the fact that Greenland can end up in the next generation APU such as K12, making the architecture quite scalable. High Bandwidth Memory combined with new K12 cores might create the fastest integrated product of all time, and let's not forget that AMD is putting a lot of emphasis on Heterogeneous System Architecture (HSA) and the compute side of things. With the help of HBM-powered Greenland that can end up with 500GB/s bandwidth, along with multiple Zen 64-bit CPU cores, you can expect quite a lot of compute performance from this new integrated chip.

The HSA Foundation has issued a new standard which can match up graphics chips, processors and other hardware to boost things like video search.

The downside is that Intel and Nvidia do not appear to have been involved in the creation of version 1.0 of the Heterogeneous System Architecture specification.

What the standard would mean is that compute, graphics and digital-signal processors will be able to directly address the same physical RAM in a more cache-coherent manner. It will mean the end of external buses and loosely linked interconnects, and allow data to be processed at the same time.

A GPU and CPU can work on the same bits of memory in an application in a multi-threaded way. The spec refers to GPUs and DSPs as "kernel agents" which sounds a bit like corporate spies for KFC.

The blueprints support 64-bit and 32-bit, and map out virtual memory, memory coherency, and message passing, programming models, and hardware requirements.

While the standard is backed by AMD, ARM, Imagination Technologies, MediaTek, Qualcomm, and Samsung, Intel and Nvidia are giving it a miss. The thought is that with these names onboard there should be enough of a critical mass of developers who will build HSA-compliant games and tools.

AMD has officially announced the successor to its Kaveri A-series APU line-up and the new chip, coming sometime in the first half of 2015, is codenamed Carrizo.

We already mentioned the Carrizo announcement yesterday, so regular readers may spot a bit of repetition, but we will go through all the newly released info nonetheless. AMD stressed that Carrizo supports Heterogeneous System Architecture (HSA), where the GPU and CPU share a unified memory and can combine serial work suitable for CPUs with parallel work suitable for GPUs. You get to benefit from unified memory, memory coherency as well as context switching between CPU and GPU data.

Mantle, DirectX 12 and Dual Graphics on a budget

The Carrizo APU shares a portion of its infrastructure with Carrizo-L, making it easier to get design wins. For example, it simply costs less to design and manufacture notebooks based on Carrizo or Carrizo-L APUs.

The new Excavator CPU core is optimised for low power notebooks or convertible form factors, which might imply that we will see some interesting designs powered by AMD in 2015. We wonder if we will get to see detachable designs.

The next generation Radeon Graphics Core Next (GCN) iGPU has support for Mantle, DirectX 12 and Dual Graphics, so it will be attractive to mainstream gamers. The Carrizo APU has a single chip Southbridge, which is not a new approach, as in the last few generations the Northbridge part of the chipset migrated to the CPU in both Intel and AMD designs.

Significant performance and battery improvements

AMD expects significant performance and battery life improvements from its first HSA 1.0 processor and we have heard the same statements from John Byrne, SVP and GM of AMD Computing & Graphics Business Unit - Carrizo is the best and most efficient APU AMD ever designed.

The AMD Secure Processor is now part of the Carrizo APU and this component is based on ARM TrustZone and integrated onto the Southbridge. The good news is that you don’t need an extra chip that will make your platform a tad more expensive.

HSA can make a difference

Some of the potential HSA applications include Natural User Interfaces with support for gestures, voice recognition, secure biometric facial and fingerprint recognition, or even augmented reality where you can superimpose digital information as a virtual overlay.

Audio video content management including searching, indexing, tagging and data mining of video and audio can also benefit from HSA approach as well as streaming media, new codecs, 3D, transcode and audio experience beyond HD. Virtual reality might be another technology that will become popular in the next few years when Oculus gets its act together.

The HSA approach is to acquire loads of HD content, e.g. 4K video, then simulate physics, structural analysis, thermal, electromagnetic or computational fluid dynamics, and in the end render a photorealistic visualization.

AMD has a bunch of people interested in HSA with members including ARM, Imagination, MediaTek, Texas Instruments, Qualcomm, Broadcom and many others.

There is no doubt that AMD is changing. The company has a new CEO and Lisa Su definitely has a solid technology background that can help the creation of some great products. We were surprised to see John Byrne, Senior Vice President and new General Manager of AMD Computing & Graphics Business Unit, deliver a YouTube preview of AMD’s upcoming APU, codenamed Carrizo.

Carrizo and Carrizo-L are the new parts in AMD’s 2015 roadmap and these APUs will end up in laptops and All-in-One systems. Byrne claims that the platform is running benchmarks and that many partners are excited by this release. The flagship Carrizo processor uses a new x86 CPU core, codenamed Excavator, with next generation AMD Radeon graphics. Carrizo is the world's first Heterogeneous System Architecture (HSA) 1.0 compliant SoC. Byrne promises better performance and better efficiency, calling Carrizo the best APU they ever developed. The silicon is running validations and benchmarks and it is right on schedule.

The runner-up is called Carrizo-L. Based on Puma+ CPU cores with AMD Radeon R-Series GCN graphics, Carrizo-L is intended for mainstream configurations. Designers have added an AMD Secure Processor in both Carrizo and Carrizo-L, enabling ARM TrustZone support across the entire family. This is what business and corporate users were missing from AMD systems in the past.

Carrizo and Carrizo-L are scheduled to ship in 1H 2015, with laptop and All-in-One systems based on the 2015 AMD Mobile APU family expected in market by mid-year 2015.

AMD has confirmed what we knew all along. Although it might announce the first Kaveri products later this year, the first desktop parts will be available on January 14 2014. Although many were hoping to see the first Kaveri chips by the end of the year, having them just two weeks into 2014 doesn’t really make much of a difference.

So what can we expect from the first batch of Kaveri parts?

One part revealed during the APU 13 presentation was the A10-7850K. It appears to be a 3.7GHz quad-core with 512 Radeon cores (R7-series GPU). The theoretical performance calculated by AMD for this particular part is 856 GFLOPs.
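The 856 GFLOPs figure can be reproduced with simple peak throughput arithmetic, if you are willing to make a couple of assumptions that AMD's slide did not spell out in the material quoted here. The sketch below assumes a 720MHz GPU clock, two floating point operations per Radeon core per cycle (one fused multiply-add) and eight per CPU core per cycle. All three numbers are our assumptions, and the helper name is made up.

```python
def peak_gflops(units, clock_ghz, flops_per_cycle):
    """Theoretical peak: units x clock (GHz) x FLOPs per unit per cycle."""
    return units * clock_ghz * flops_per_cycle

# From the article: 512 Radeon cores and a 3.7GHz quad-core CPU.
# Assumed, not from the article: 0.720GHz GPU clock, 2 and 8 FLOPs per cycle.
gpu = peak_gflops(units=512, clock_ghz=0.720, flops_per_cycle=2)
cpu = peak_gflops(units=4, clock_ghz=3.7, flops_per_cycle=8)
total = gpu + cpu

print(f"GPU {gpu:.1f} + CPU {cpu:.1f} = {total:.1f} GFLOPs")
```

With those inputs the GPU contributes about 737 GFLOPs and the CPU about 118 GFLOPs, which rounds to 856 GFLOPs combined. Keep in mind that this is a theoretical peak and not a benchmark result.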

However, the trouble with Kaveri is that we still don’t know the impact of HUMA, HSA and Mantle on actual real world performance. HUMA will let the chip share memory between the GPU and CPU, although GDDR5 support is lacking, shattering the wet dreams of many a fanboy. HSA and Mantle could unlock even more performance.

"Kaveri can perform well above its class because of these technologies," an AMD spokesman told EE Times.

So far AMD is confirming Mantle support in four upcoming games. Mantle could practically allow AMD APUs to do more with less silicon, boosting their price/performance ratio. Of course, more developers need to embrace Mantle in order to give new AMD APUs a competitive edge.

Kaveri is set to launch by the end of the year, with availability in early 2014. The initial batch will be reserved for OEMs, for desktops and notebooks to be precise.

However, later versions will also target embedded and server markets. Berlin server parts will ship by July and they will be the first APU-derived server parts to support HSA. AMD is promising to deliver an updated software developer kit, including a GCC/HSA Linux compiler, a PGI Accelerator compiler for C and C++, OpenCL Math, the ArrayFire 2.0 math library for OpenCL, and CodeXL, a tool suite for Linux and Windows.

Although some big names are still missing from the list of HSA supporters, the list itself is growing and it’s already impressive. ARM, Imagination, Qualcomm, Samsung, MediaTek, Texas Instruments, LG, Sony, Canonical, VIA, Marvell, Vivante and Broadcom are on board, to name a few. Intel and Nvidia are not.

Oracle is expected to roll out a Java update to harness the full potential of HSA. Java Lambda expressions should allow developers to accelerate parallel projects on GPU cores by 2015.

Chipmaker AMD is getting all enthusiastic over Heterogeneous System Architecture (HSA) as its cunning plan for the future.

Recently it has been talking to Ars Technica about something else dubbed "heterogeneous Uniform Memory Access" (hUMA) which is its take on HSA. HSA involves developing systems with multiple different kinds of processor, connected together and operating as peers. Normally it is CPUs and GPUs.

Armed with another set of acronyms AMD talks about splitting workloads between a CPU and a GPU, and the creation of a general purpose GPU (GPGPU). But a GPGPU is awkward for software developers, some of whom might think that GP stands for guinea pig and others are not happy that the CPU and GPU have their own pools of memory.

HUMA is AMD’s way around this problem. Using HUMA, the CPU and GPU share a single memory space and the GPU can directly access CPU memory addresses, allowing it to both read and write data that the CPU is also reading and writing. It is also cache coherent so the CPU and GPU will always see a consistent view of data in memory. If a processor makes a change then the other processor will see it.
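To make the difference concrete, here is a toy analogy in plain Python. It is not HSA code and uses no AMD API. It simply contrasts a "discrete" model, where data is copied to a separate buffer and back, with a "shared" model, where the worker operates on the very same bytes through a zero-copy view. The function names and the doubling "kernel" are invented for illustration.

```python
def run_with_copies(host):
    """Discrete-memory model: copy in, compute, copy back.

    Returns the number of bytes copied between the two pools.
    """
    device = bytearray(host)               # copy host -> "device" memory
    for i in range(len(device)):
        device[i] = (device[i] * 2) % 256  # stand-in for a GPU kernel
    host[:] = device                       # copy results device -> host
    return 2 * len(host)

def run_shared(host):
    """Shared-memory model: the worker sees the same bytes, no copies."""
    view = memoryview(host)                # zero-copy view of the same buffer
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256      # writes land directly in host
    return 0

a = bytearray(range(8))
b = bytearray(range(8))
copied_a = run_with_copies(a)
copied_b = run_shared(b)
print(list(a), copied_a)
print(list(b), copied_b)
```

Both paths end up with the same data, but the first one moves every byte twice. On real hardware the shared approach also needs the cache coherency described above, which this sketch does not model.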

We will first see HUMA in the chip codenamed Kaveri. It mixes up to three compute units using AMD's Bulldozer-derived Steamroller cores with a GPU. The GPU will have full access to system memory. It should be out in the second half of the year. It appears likely that the chip AMD is designing for the PlayStation 4 later this year will also be an HSA system.