Archive

Back in May of this year, ARM unveiled the Mali-G71 GPU for premium devices, the company’s first GPU based on the Bifrost architecture. The company has now introduced its second Bifrost GPU, Mali-G51, targeting augmented and virtual reality and the higher resolution screens to be found in mainstream devices in 2018, as well as the Mali-V61 VPU with 4K H.265 and VP9 video decode and encode capabilities, previously known under the codename “Egil”.

Mali-G51 GPU

ARM Mali-G51 will be 60% more energy efficient and have 60% more performance density than the Mali-T830 GPU, making it the most efficient ARM GPU to date. It will also be 30% smaller, and support 1080p to 4K displays.

Mali-V61 VPU

Mali-V61 can scale from 1 to 8 cores to handle anything from 1080p60 up to 4K @ 120 fps. It supports 8-/10-bit HEVC and 8-/10-bit VP9 video encoding and decoding up to 4K UHD, making it suitable for 4K video conferencing and chat, as well as 32MP multi-shot @ 20 fps.
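The two ends of that range are consistent with roughly linear scaling per core. The short calculation below is only a sketch: it assumes that “4K” means 3840x2160 (UHD), that 1080p60 corresponds to a single core, and that 4K @ 120 fps corresponds to the full eight cores, none of which is stated explicitly by ARM here.

```python
def pixel_rate(width, height, fps):
    """Return the number of pixels processed per second."""
    return width * height * fps

# Assumed mapping: 1 core -> 1080p60, 8 cores -> 4K UHD @ 120 fps
single_core = pixel_rate(1920, 1080, 60)
eight_cores = pixel_rate(3840, 2160, 120)

ratio = eight_cores / single_core
print(f"1080p60:      {single_core:,} pixels/s")
print(f"4K @ 120 fps: {eight_cores:,} pixels/s")
print(f"ratio:        {ratio:.1f}x")
```

Under those assumptions, the high end of the range needs eight times the pixel throughput of the low end, which matches the 1 to 8 core scaling.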

The company claims H.265 and VP9 video encoding quality is about the same for a given bitrate with Mali-V61 as shown in the diagram below.

VP9 vs HEVC vs H.264

Besides the ability to select 1 to 8 cores, silicon vendors can also decide whether they need the encoding or decoding block for their SoC. For example, a camera SoC may not need video decoding support, while STB SoCs might do without encoding. While Mali-V61 is a premium IP block, ARM also expects it in mainstream devices, possibly alongside Cortex A53 processor cores and the Mali-G51 GPU.

Vulkan was introduced as the successor of OpenGL ES in March 2015, promising to use fewer CPU resources and to support multiple command buffers that can be created in parallel and distributed over several cores. This comes at the cost of slightly more complex application programming, since less work is done inside the GPU drivers themselves, and app developers need to handle memory allocation and thread management.

This was just a standard at the time, so implementing Vulkan took some time, and work is still in progress, but ARM showcased the power efficiency of Vulkan over OpenGL ES in the video embedded at the end of this post.

The demo has the same graphics details and performance using both OpenGL ES and Vulkan, but since the load on the CPU in that demo can be distributed over several CPU cores with Vulkan against a single core for OpenGL ES, it’s possible to use low power cores (e.g. Cortex A53) operating at a lower frequency and voltage, hence reducing power consumption.

ARM also measured that the complete OpenGL ES demo would use 1270 joules against 1123 joules for the Vulkan demo, resulting in about 15% energy savings in this “early stage” demo.
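As a quick sanity check on those figures, the savings can be worked out from the two energy values alone. This is only arithmetic on the numbers quoted above; which baseline ARM used for its percentage is not stated, so both are shown.

```python
opengl_es_joules = 1270  # energy used by the OpenGL ES demo
vulkan_joules = 1123     # energy used by the Vulkan demo

saved = opengl_es_joules - vulkan_joules

# Savings relative to the OpenGL ES run
saving_vs_gles = saved / opengl_es_joules * 100
# Extra energy used by OpenGL ES relative to the Vulkan run
extra_vs_vulkan = saved / vulkan_joules * 100

print(f"Energy saved: {saved} J")
print(f"Vulkan uses {saving_vs_gles:.1f}% less energy than OpenGL ES")
print(f"OpenGL ES uses {extra_vs_vulkan:.1f}% more energy than Vulkan")
```

Both results land somewhat below the quoted figure, which may simply be rounded or computed on a different basis.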

ARM claims 30% “sustained” performance improvement between Cortex A72 and Cortex A73, but the GPU should be where the performance jump is more significant, as ARM promises a 50 percent increase in graphics performance and a 20 percent improvement in power efficiency with Mali-G71 compared to the previous generation (Mali-T880). Kirin 960 also integrates twice as many GPU cores as Kirin 950, and some GPU benchmarks provided by HiSilicon/Huawei confirm the theory with over 100% performance improvement in both the Manhattan 1080p offscreen and T-Rex offscreen GFXBench 4.0 benchmarks.

The first smartphone to feature Kirin 960 is likely to be the Huawei Mate 9, rumored to come with a 5.9″ 2K display, 6GB RAM, and 256GB UFS flash.

Imagination has just unveiled the successor to the MIPS I6400 64-bit Warrior core: the MIPS Warrior I-class I6500 heterogeneous CPU. It supports up to 64 clusters with up to 6 cores each (384 cores max), each core running up to 4 threads (1536 threads max), and can be combined with IOCUs (IO coherence units) and external IP such as PowerVR GPUs or other hardware accelerators.
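The maximum core and thread counts follow directly from the per-level limits. The snippet below just multiplies them out; the constant names are made up for illustration.

```python
# Limits quoted for the MIPS I6500 (names are illustrative only)
MAX_CLUSTERS = 64
MAX_CORES_PER_CLUSTER = 6
MAX_THREADS_PER_CORE = 4

max_cores = MAX_CLUSTERS * MAX_CORES_PER_CLUSTER
max_threads = max_cores * MAX_THREADS_PER_CORE

print(f"Maximum cores:   {max_cores}")
print(f"Maximum threads: {max_threads}")
```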

The main features of the MIPS I6500 processor are listed as follows:

Heterogeneous Inside – In a single cluster, designers can optimize power consumption with the ability to configure each CPU with different combinations of threads, different cache sizes, different frequencies, and even different voltage levels.

Heterogeneous Outside – The latest MIPS Coherence Manager with an AMBA ACE interface to popular ACE coherent fabric solutions such as those from Arteris and Netspeed lets designers mix on a chip configurations of processing clusters – including PowerVR GPUs or other accelerators – for high system efficiency.

Hardware virtualization (VZ) – I6500 builds on the real time hardware virtualization capability pioneered in the MIPS I6400 core. Designers can save costs by safely and securely consolidating multiple CPU cores with a single core, save power where multiple cores are required, and dynamically and deterministically allocate CPU bandwidth per application.

Designed for compute intensive, data processing and networking applications – The I6500 is designed for high-performance/high-efficiency data transfers to localized compute resources with data scratchpad memories per CPU, and features for fast path message/data passing between threads and cores.

OmniShield-ready – Imagination’s multi-domain security technology used across its processing families enables isolation of applications in trusted environments, providing a foundation for security by separation.

The processor is also based on the standard MIPS ISA, so developers will be able to leverage existing software and tools such as compilers, debuggers, operating systems, hypervisors and application software already optimized for the MIPS ISA.

The figure above shows what an SoC based on MIPS I6500 may look like, with one cluster with 4 CPU cores and 2 IOCUs, another cluster without any CPU cores but instead eight IOCUs interlinked with third party accelerators, and one PowerVR GPU.

Target applications include advanced driver assistance systems (ADAS), autonomous vehicles, networking, drones, industrial automation, security, video analytics, machine learning, and more. One of the first customers for the new processor is Mobileye, whose EyeQ5 SoC, designed for Fully Autonomous Driving (interestingly shortened as “FAD”) vehicles, will feature eight multi-threaded MIPS CPU cores coupled with eighteen cores of Mobileye’s Vision Processors (VPs). The EyeQ5 SoC should be found in vehicles as early as 2021.

MIPS I6500 CPU can be licensed now, with general availability planned for Q1 2017. You’ll find more technical details on the product page, and in the blog post for the announcement.

Parker is said to deliver up to 1.5 teraflops (native FP16 processing) of performance for “deep learning-based self-driving AI cockpit systems”.

This type of board and processor is normally only available to car and parts manufacturers, and the company claims that more than 80 carmakers, tier 1 suppliers and university research centers are now using DRIVE PX 2 systems to develop autonomous vehicles. That means the platform should find its way into cars, trucks and buses soon, including some 100 Volvo XC90 SUVs that are part of an autonomous-car pilot program in Sweden slated to start next year.

Despite it being two weeks since rc7, the final patch wasn’t all that big, and much of it is trivial one- and few-liners. There’s a couple of network drivers that got a bit more loving. Appended is the shortlog since rc7 for people who care: it’s fairly spread out, with networking and some intel Kabylake GPU fixes being the most noticeable ones. But there’s random small noise spread all over.

And obviously, this means that the merge window for 4.8 is open. Judging by the linux-next contents, that’s going to be a bigger release than the current one (4.7 really was fairly calm, I blame at least partly summer in the northern hemisphere).

Parallel directory lookups – The directory cache caches information about path names to make them quickly available for pathname lookup. This cache uses a mutex to serialize lookup of names in the same directory. The serializing mutex has been switched to a read-write semaphore in Linux 4.7, allowing for parallel pathname lookups in the same directory. Most filesystems have been converted to allow this feature.

New “schedutil” frequency governor – There are two main differences between it and the existing governors. First, it uses information provided by the scheduler directly for making its decisions. Second, it can invoke cpufreq drivers and change the frequency to adjust CPU performance right away, without having to spawn work items to be executed in process context or similar, leading to lower latency to make frequency changes.

Histograms of events in ftrace – This release adds the “hist” command, which provides the ability to build “histograms” of events by aggregating event hits. As an example, let’s say a user needs to get a list of bytes read from files by each process. This information can be obtained using hist triggers.
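As a rough illustration of such a trigger, the sketch below builds a hist trigger string in Python and provides a helper to write it to an event’s trigger file. The choice of the sys_enter_read syscall event, its count field (bytes requested by read() calls), the tracefs mount point, and the helper names are all assumptions made for illustration, not taken from the release notes.

```python
def build_hist_trigger(keys, values=None, sort=None):
    """Build a 'hist:' trigger string from key, value, and sort fields."""
    parts = ["hist:keys=" + ",".join(keys)]
    if values:
        parts.append("values=" + ",".join(values))
    if sort:
        parts.append("sort=" + sort)
    return ":".join(parts)

def install_trigger(trigger, trigger_file):
    """Write the trigger to the event's trigger file (typically needs root)."""
    with open(trigger_file, "w") as f:
        f.write(trigger)

# Assumed example: sum the 'count' argument of read() calls per process name
trigger = build_hist_trigger(
    keys=["common_pid.execname"],
    values=["count"],
    sort="count.descending",
)
print(trigger)

# Assumed tracefs location; not written to automatically in this sketch
TRIGGER_FILE = "/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger"
```

Calling install_trigger(trigger, TRIGGER_FILE) on a kernel built with hist trigger support would then enable the histogram, which can be read back from the hist file in the same event directory.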

EFI ‘Capsule’ firmware updates – The EFI Capsule mechanism allows data blobs to be passed to the EFI firmware. The firmware then parses them and makes some decision based upon their contents. The most common use case is to bundle a flashable firmware image into a capsule that the firmware can use on the next boot to upgrade the existing version in flash. Users can upload a capsule by writing the firmware to the /dev/efi_capsule_loader device.
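Uploading a capsule therefore amounts to copying a file into that device node. A minimal sketch in Python might look like the following; the function name and the capsule file name in the usage note are hypothetical, and the device path is the one mentioned above.

```python
def upload_capsule(capsule_path, device="/dev/efi_capsule_loader"):
    """Copy a capsule image into the loader device and return bytes written."""
    with open(capsule_path, "rb") as src:
        data = src.read()
    # The upload completes when the device file is closed
    with open(device, "wb") as dst:
        dst.write(data)
    return len(data)
```

On a system that exposes the device, something like upload_capsule("firmware.cap") run as root would hand the capsule to the firmware; whether and when it is applied depends on the firmware itself.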

Support for creating virtual USB Device Controllers in USB/IP – USB/IP allows real USB devices to be shared over the network. Linux 4.7 brings the ability to create virtual USB Device Controllers without needing any physical USB device, using the USB gadget subsystem. For what purpose? For example, for improving phone emulation in development environments, for testing USB and for educational purposes.

Fix s5p-mfc driver probe on Exynos542x Peach boards (need to provide MFC memory banks). On these boards this was broken for long time but apparently no one enabled this driver till now.

Fix creation of debugfs entries for one regulator on Exynos4210 Trats board.

Fix probing of max8997 MFD driver (and its children) because of missing interrupt. Actually the current version of the driver probes (just without interrupts) but after switching to regmap and regmap-irq, the interrupt will be mandatory.

For even more details, you can check out the Linux 4.7 changelog, with comments only, generated using git log v4.6..v4.7 --stat. Alternatively, and much easier to read, you can head to kernelnewbies.org to learn more about Linux 4.7 changes.

Today ARM has revealed the first details of its latest mobile processor and GPU, both said to be optimized for VR (Virtual Reality) and AR (Augmented Reality) applications.

Starting with the ARM Cortex-A73, we’re looking at an evolution of the current Cortex-A72 with ARM claiming 30 percent “sustained” performance over the Cortex-A72 and over twice the performance over the Cortex-A57. ARM is already talking about clock speeds of up to 2.8GHz in mobile devices. Other improvements include an increase up to 64k L1 instruction and data cache, up from 48 and 32k respectively for the Cortex-A72, as well as up to 8MB of L2 cache.

The Cortex-A73 continues to support ARM’s big.LITTLE CPU design in combination with the Cortex-A53 or the Cortex-A35. It’s also the first ARM core to have been designed to be built using 10nm FinFET technology and it should be an extremely small CPU at around 0.65 square millimeters per core, or a 46 percent shrink from the Cortex-A72. By moving to 10nm and FinFET, ARM is also promising power efficiency gains of up to 20 percent over the Cortex-A72.

Cortex A53 vs A72 vs A73

The Mali-G71 GPU takes things even further, as ARM is promising a 50 percent increase in graphics performance, a 20 percent improvement in power efficiency and 40 percent more performance per square millimeter over its previous generation of GPUs. To accomplish this, ARM has designed the Mali-G71 to support up to 32 shader cores, which is twice as many as the Mali-T880, and ARM claims that this will enable the Mali-G71 to beat “many discrete GPUs found in today’s mid-range laptops”. We’d take this statement with a grain of salt, as it takes more than raw computing performance to make a good GPU, and that’s why there are so few companies still designing their own GPUs. As with the Cortex-A73, the Mali-G71 is optimized for 10nm FinFET manufacturing technology.

As always with ARM based GPUs, it depends on the partner implementation, and the Mali-G71 supports designs with as few as one shader core. Looking at most current mobile GPU implementations, we’d expect most of ARM’s partners to go with a 4-8 shader implementation to keep their silicon cost at a manageable level. That said, we might get to see one or two higher-end implementations, as ARM has already gotten the likes of Samsung, MediaTek, Marvell and HiSilicon interested in its latest GPU.

With a big move towards VR and AR, it’s also likely that the ARM partners are going to have to move to a more powerful GPU to be able to deliver the kind of content that will be expected from these market spaces. According to the press release, it looks like ARM has already gotten Epic Games and Unity Technologies interested in supporting its latest GPU.

Devices using the new ARM Cortex-A73 and Mali-G71 are expected sometime in 2017, so there’s quite a gap between the announcement and the availability of actual silicon, but with HiSilicon, Marvell, MediaTek, Samsung Electronics and others having already licensed Cortex A73 IP, at least it means we have something to look forward to next year. You can find more details on the ARM Cortex A73 and Mali-G71 pages, as well as on the ARM community’s blog.

Imagination Technologies introduced the PowerVR Series7XT GPU family with up to 512 cores at the end of 2014, and at CES 2016 the company announced the Series7XT Plus family with the GT7200 Plus and GT7400 Plus GPUs. They share many of the features of the Series7XT family, and add OpenCL 2.0 API support as well as improvements for computer vision: a new Image Processing Data Master, and support for 8-bit and 16-bit integer data paths instead of just 32-bit in the previous generation. This can lead to up to 4 times more performance for applications, such as deep learning, that leverage the OpenVX computer vision API.

Block Diagram

GT7200 Plus GPU features 64 ALU cores in two clusters, and GT7400 Plus 128 ALU cores in a quad-cluster configuration. Besides OpenCL 2.0 and the computer vision improvements, they still support OpenGL ES 3.2, Vulkan, hardware virtualization, advanced security, and more. The company has also made some microarchitectural enhancements to improve performance and reduce power consumption:

Support for the latest bus interface features including requestor priority support