Features maybe, performance never. Desktop cards drive large resolutions or multiple screens, and provide anti-aliasing, filtering and high-resolution textures. None of those are needed on a single small screen.

The features of modern mobile GPUs are very similar to high-end desktop cards, but obviously the power consumption and performance levels are still different. The environments that our GPUs go into are very different – power is always in short supply, and fans and exotic cooling systems are not appropriate!

Do your GPUs have OpenGL 4.x support? And where are the devices with Mali T760MP16? Right now there is no mobile GPU that can beat the performance of Tegra K1; only the T760MP16, and maybe the PowerVR GX6650, could do that.

In most of the markets our Partners are interested in, desktop OpenGL is not a requirement, so we haven’t bothered implementing it.

It’s not appropriate for me to pre-announce our Partners’ product plans, so I’m sorry I cannot talk about Mali-T760 design wins that haven’t made it into the public domain yet. However, you will see higher core-count designs coming in the future. You have to bear in mind that we only released the IP (RTL) of Mali-T760 at the end of October last year, so we are only just starting to see silicon chips based on it being publicly announced, such as the Rockchip RK3288 (MP4), present in the Teclast P90 HD tablet.

Many GPU vendors are investing in unified memory and eventually fully cache-coherent interfaces between the CPU and GPU. It is clear that such work is immensely beneficial for GPU compute, but is it useful for graphics workloads, either now or in some graphics pipeline of the future?

Unified memory has been the common system design in mobile devices for years, whilst the desktop machines are only recently trying to move towards this. ARM’s Mali GPUs have been designed from the start for unified memory. The Midgard family of GPUs (from Mali-T604 onwards) supports I/O coherency whereby the GPU can snoop into the CPU’s caches, with support for full coherency coming.

Graphics workloads might initially be thought to be essentially one-way in nature – that the CPU creates some data structures to be consumed by the GPU, to be worked on and then rendered to a frame buffer elsewhere. However, there are cases where cache flushing operations can be avoided by the use of a coherent memory system, with consequent power savings, so yes, it can be a benefit, even with graphics.

I asked the following questions to Peter last time; but, as he pointed out then, he was a "CPU" guy. Thus, sadly the questions went without answer. Maybe this time is different :) In any case, I highly appreciate the time ARM experts are taking to address the community. Thank you! (and sorry for the long text following; it's a copy-paste from several questions-answers process of last time).

I would love to know ARM position towards open source GPU drivers now that Intel is putting a big amount of money and effort into developing theirs.

It seems to me that not taking the open source road for GPU drivers (as ARM is doing) is a big mistake, all the more so when the primary OS running on their hardware is Android.

ARM could create much better binaries for free with the work of other talented developers and get better integration with Android in the process. That would be a great selling point for any manufacturer and potential client! The Lima project currently has better performance (5% more) than the closed driver distributed by ARM, and it is being coded in the developers' free time! It would be great to see both teams working together.

What are the reasons the driver is closed source?
- Is it that they would reveal a lot of Mali's internal core?
- Is ARM afraid of IP suits? Will having closed source deter patent trolls?
Intel doesn't seem to have those problems.

PS: The question is not about whether the drivers should be open or not just because it is morally right or wrong. Obviously it would be nice for the clients and a good selling point, but I was wondering how ARM management sees one of their biggest competitors embracing open source when developing GPU drivers.

I have to wonder how long it will take ARM's legal department to review the driver's code and assess risks. Even if they have part of the blob which is legally binding, they can still open up a big chunk and start chipping in on the advantages of open source developing. There are already developers knowledgeable on their architecture.

ARM and other chip producers know how much of a pain it is to support a badly integrated blob of code. When an ARM customer has a problem with the driver, they have to communicate with ARM, wait till they receive an answer, wait till the problem is solved (if it is at all), and then integrate the new blob with their software. So many steps where you can fail! It takes so much time! Mobile products have a really short time-to-market. It would be so easy for everyone if they let their customers help out with the development. Plus, it is free!!

Samsung (ARM's biggest customer) has been testing Intel's products for a while and I am pretty sure that by now they have some developers who know Intel's driver architecture. Don't you think support and time-to-market would be among the reasons when deciding which platform to choose?

Thanks for the feedback. ARM has been a huge supporter of Open Source projects over many years, including a key role in the formation of Linaro. We continue to monitor the situation regarding Open Source graphics drivers, and we continually review demand for them from the Partners who license our technology.

ARM would be a lot more friendly to FLOSS if there was at least one performant GPU that was open. That's a strong reason to open the Mali architectures. Not that it's good just for Mali, it's good for ARM.

I'm surprised to find that x86 works better for me than ARM. For just this reason.

The strongest reason is the fact that mali is out on many SoCs, and that integration with other engines is absolutely paramount for good performance and power usage. DMA-Buf was pushed hard by linaro, and it does indeed solve a lot of problems, but cynical old me thinks that Jem Davies is very happy with part of this integration getting solved by dma-buf, and that the remaining integration will be swept under the rug.

So basically, your answer to the underlying elephant question in the room that this feedback is getting at is "unless Samsung make a request for open source drivers for their Galaxy phones that incorporate Mali GPU's, developers on XDA will continue to be banging their heads against the wall trying to develop up to date Android builds for phones Samsung no longer cares about"

I really do understand your frustration and I’m sorry that this makes life harder for you and similar developers. We are genuinely not against Open Source, as I hope I’ve tried to explain. I myself spent a long time working on the Linux kernel in the past and I wish I could give you a simple answer. Unfortunately, it is a genuinely complex problem, with a lot of trade-offs and judgements to be made as well as economic and legal issues. Ultimately I cannot easily reduce this to an answer here, and probably not to one that will satisfy you. Rest assured that you are not being ignored. However, as a relatively small company with a business model that is Partner driven, we need to apply the resources that we have to projects in ways that meet Partner requirements.

Hi Jem, I just grepped a kernel tree (which does not include the commit history from the bitkeeper days), and the commit log, and i couldn't find you. I did find David, Liam, Tim, Steve, Jonathan, and even Craig Davis. Can you point out what linux kernel subsystems you worked on and in what timeframe?

About resources, I believe that we two communicated about this before. It included a pretty solid project outline and reasonably delineated resources, especially given the massive commercial success that ARM is enjoying these last "few" years.

Jem, why not have redistributable binary-only drivers like Nvidia's linux-for-tegra kit? It would be a major step forward. Currently one cannot seriously consider Mali for GPU compute projects when the availability of non-Android Mali drivers depends on the goodwill of ARM, the SoC vendor and the product vendor.

happosai - thanks for the suggestion, which is a good idea. In fact, it’s such a good idea that we do it! The Open Source organisation Linaro, which we are a member of, hosts binary drivers, e.g. for the popular Arndale development board.

If you go to the Linaro Engineering Builds (LEBs), the Arndale, Nexus 10 and now as of today the Juno BSPs are available with the Mali driver user side binaries. (breaking news, as the public announcement for Juno has only just gone out).

I should also mention that the Mali kernel driver is now part of the Linaro Stable Kernel (LSK) regardless of the user side binaries being included in the LEBs or not.

Thanks. But these downloads seem to have a rather discouraging disclaimer - "Note: Ubuntu flavoured images released as Linaro engineering builds are not for production use, modification or redistribution, but are to be used for evaluation purposes only". It does not look like they are really redistributable yet. And this prevents the end users from having usable graphics drivers in the popular Linux distributions out of the box.

The Mali kernel module is not a problem, because it has been GPL-licensed free software for a long time. The Linaro Stable Kernel (LSK) is nice, but not really necessary.

Just wanted to thank you for addressing this question. I've done quite a bit of work on various Linux/Linaro project boards and the lack of video drivers has got to be the #1 problem. At one point I found the hardware vendor was having to modify the kernel to get the driver in (placing kernel support on the HW vendor instead of the community). Not good.

I don't personally care about source code for video drivers, but we certainly need available binaries.

At the sunxi project, we have some mali-400 binaries whose origins are a bit grey, and we provided some framework, X driver integration and packaging around them. For many people trying to use GNU/Linux (as opposed to Android), this is the canonical source for useful binary drivers for mali-400 today.

Jem, as a key member of the open source community around the Allwinner SoCs, I commented (on IRC and our mailing list) on Allwinner joining linaro on some TV committee thingie back in March. Allwinner had at that time started to more proactively work with the sunxi community, and we were hopeful for getting GPL violations solved and more documentation and code available. My cynical view was that this was all about to end with Allwinner joining (a part of) linaro. And it did. Linaro ticked the "Open Source" box for allwinner marketing, and no further attempts to appropriately participate with open source communities were undertaken.

I am glad to see that you too are hiding under the linaro umbrella here.

San_dehesa, there seems to be little point to this, as our dear guest is squarely against open sourcing the mali. Please check out the end of my lima driver talk at FOSDEM this year for more details on this, there is even a whole section in a blog entry of mine (libv.livejournal.com/24735.html) :)

Hello, Jem! It seems we have Xbox 360/PS 3 like graphical performance in some mobile GPUs. What do you think:
1) What prevents developers from making comparable games on mobile platforms?
2) When shall we see these games on smartphones and tablets?
3) Is x86 with low energy consumption a serious competitor for ARM?

Given the levels of performance now available, I don’t see developers being prevented from making great games on mobile platforms: both smartphones and tablets. The games companies are making some fantastic games available on these platforms right now. Of course, the performance will always be less than something plugged into the mains, liquid cooled, consuming more than 300 Watts of power, but often the games developers are producing multiple versions of games targeted at platforms of different capabilities.

Delivering the best performance in the lowest power envelope possible continues to be a guiding design principle for ARM. While we take all competition seriously, ARM does not differentiate solely on power, but also a business model that enables an open ecosystem and fuels a rapid pace of innovation among our partners.

Does your company have any plans to support audio acceleration? Obviously some workloads such as FFT, reverb, EQ, etc. are highly amenable to parallel computation. There were some efforts to run reverbs on GPUs but they never really went anywhere in terms of commercial products. I suppose the dream scenario would be to have access to the GPU in such a way as to have the equivalent of a Nord G2 or Symbolic Sound Pacarana right there within the tablet, without having to program in OpenCL (painful).

I do experimental sound/graphic design in different programming languages such as Max/MSP, Touch Designer, etc. and we are always looking for more firepower and hope not to be shut out of the fun this round. The computations we need to perform are very similar to what the Financial Engineering/Data Science community is looking for. We make use of all the same mathematics in algorithmic/generative/procedural composition and visualization. Not so much a question, just wanted to raise my hand and say: yes, we are out here in droves and want what you're selling!

At ARM, we’re always looking at ways of doing things in a more efficient (lower power or energy consumption) way. It is a digital world, and most problems can be expressed in terms of 1s, 0s and algorithms. Our CPUs are designed to be as efficient as possible at the vast range of applications that get thrown at them, but some applications are sufficiently different that we think there is a market for a domain-specific processor. A GPU is one example of that (as graphics is specialised) but we also produce video processors to do decode and encode of digital video.

GPU Compute is useful for a number of applications that can be expressed as massively parallel algorithms, and there are already use cases in mobile devices where our GPUs are being used for acceleration. I do believe some people are already doing research into using GPUs for audio processing, and you’re not being shut out at all – quite the contrary. The commonly-used languages for this are currently OpenCL and Google’s RenderScript.

Could you expand on what use cases the GPUs are being used to accelerate? For example, are they being used to accelerate web browsing, or what use cases do you see them being used to accelerate in the future?

Thank you very much for doing this. Here are my questions.
1) What is the die area of a Mali T760 core at 28nm?
2) What is the peak power consumption of a Mali T760 core at 695MHz and 28nm?
3) Why were SIMD 128-bit vector ALUs chosen over a series of scalar ALUs?
4) What are your plans regarding ray-tracing in hardware?
5) Will the Mali T760 support the Android Extension Pack?
6) Can the Mali GPUs (and available APIs) support developer accessible on-chip memory?

Thanks for your questions. I cannot address your first two questions as those are numbers we typically do not make public.

As far as your other questions go, happy to respond. As I wrote in a blog I posted last year - http://community.arm.com/groups/arm-mali-graphics/... - graphics is a really computationally intensive problem. A lot of that graphics-specific computation consists of vector-intensive arithmetic, in particular 3*3 and 4*4 vectors and arrays of floating-point (increasingly 32-bit) arithmetic.
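To make the 4*4 arithmetic mentioned above concrete, here is a minimal sketch of a single vertex transform in Python. The name transform_vertex and the sample translation matrix are assumptions made up for illustration; this is not ARM code, just the kind of 4-wide multiply-and-add work that a graphics pipeline repeats for every vertex.

```python
from typing import List, Sequence

Vec4 = Sequence[float]
Mat4 = Sequence[Sequence[float]]


def transform_vertex(m: Mat4, v: Vec4) -> List[float]:
    """Multiply a 4x4 matrix by a 4-component column vector."""
    # Each output component is a 4-wide dot product: 4 multiplies and 3 adds.
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]


# Hypothetical example: translate the point (1, 2, 3) by (10, 20, 30).
translate = [
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0, 20.0],
    [0.0, 0.0, 1.0, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]
moved = transform_vertex(translate, [1.0, 2.0, 3.0, 1.0])
```

Every row of the result is independent of the others, which is why this kind of maths maps naturally onto vector hardware.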

The Midgard architecture (our GPUs from the Mali-T600 series onwards) has some ALUs that work naturally on 128-bit vectors, which can be divided up as 2 64-bit, 4 32-bit or 8 16-bit floats, and as 64, 32, 16 and 8 bit integers (we also have some scalar ALUs). All architectures chase the target of unit utilisation, and there are several approaches to this. We have taken the vector SIMD approach to try to gain good utilisation of those units. We have a heavily clock-gated arithmetic unit, so only active SIMD lanes have their clock enabled, and only active functional units have their clock enabled; hence a scalar-operation-only instruction word will not enable the vector units.
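The lane counts quoted above follow directly from dividing the 128-bit register width by the element width. A small Python sketch of that arithmetic (the names VECTOR_BITS and lanes are illustrative assumptions, not anything from ARM's tools):

```python
VECTOR_BITS = 128  # register width stated in the answer above


def lanes(element_bits: int) -> int:
    """Number of elements of a given width that fit in one 128-bit vector."""
    if element_bits <= 0 or VECTOR_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    return VECTOR_BITS // element_bits


# Element widths mentioned in the answer: 64, 32, 16 and 8 bits.
lane_table = {bits: lanes(bits) for bits in (64, 32, 16, 8)}
```

So a 32-bit float operation fills four lanes, and an 8-bit integer operation fills sixteen.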

Scalar warp architectures can perform better on code that does not have any vector code (either naturally vector or through a vectorising compiler). However, the big disadvantage of scalar warp architectures is that of divergent code. Whereas we effectively have a zero branch overhead in Midgard, warps struggle with branching code, as they cannot keep all lanes of the warp system occupied at the same time if not all warps are executing the same branch of divergent code (they effectively have to run each piece of code twice – once for each side of the branch). Other disadvantages are if the warps are accessing data that is not in the same cache line, and in the end, optimising for warp code can mean optimising with detailed understanding of the memory system, which can be more complex to get one’s head around than optimising for a more obvious construct like a vector architecture. In the end both types of architecture have advantages and disadvantages.
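The divergence cost described above can be sketched with a toy model in Python. It assumes a warp that executes the "then" side and the "else" side one after the other whenever its lanes disagree on the branch, with idle lanes masked off. The function name and the cycle counts are hypothetical and do not describe any particular GPU.

```python
from typing import Sequence


def warp_branch_cycles(predicates: Sequence[bool],
                       then_cycles: int,
                       else_cycles: int) -> int:
    """Cycles a lock-step warp spends on an if/else (toy model)."""
    takes_then = any(predicates)      # at least one lane wants the 'then' side
    takes_else = not all(predicates)  # at least one lane wants the 'else' side
    # A side is executed by the whole warp if any lane needs it.
    return (then_cycles if takes_then else 0) + (else_cycles if takes_else else 0)


uniform = warp_branch_cycles([True, True, True, True], 10, 6)
divergent = warp_branch_cycles([True, False, True, False], 10, 6)
```

In this model the uniform warp pays for one side only, while the divergent warp pays for both, which is the "run each piece of code twice" effect the answer refers to.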

Looking at compute code (or at least that used in GPU Computing), if code can be vectorised, either by having naturally vector-style maths, or by unrolling loops and executing multiple iterations of the loop at the same time in vector units, then it should be very efficient on vector architectures such as Midgard. We tend to find that image processing contains a lot of code that can be parallelised through vectorisation.
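As a rough illustration of vectorising by unrolling, the Python sketch below computes a*x + y first one element at a time and then four elements per loop iteration, standing in for one 4-lane vector operation on 32-bit values. The function names and the choice of this particular operation are assumptions for the example; real vectorisation would be done by a compiler or with vector types in a language such as OpenCL C.

```python
from typing import List, Sequence


def saxpy_scalar(a: float, x: Sequence[float], y: Sequence[float]) -> List[float]:
    """One element per iteration."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_unrolled4(a: float, x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Four elements per iteration, mimicking one 4-lane vector operation."""
    n = len(x)
    out = [0.0] * n
    main = n - n % 4
    for i in range(0, main, 4):
        # On vector hardware these four multiply-adds would issue together.
        out[i:i + 4] = [a * x[i + k] + y[i + k] for k in range(4)]
    for i in range(main, n):
        # Leftover elements that do not fill a whole vector.
        out[i] = a * x[i] + y[i]
    return out


xs = [float(i) for i in range(10)]
ys = [1.0] * 10
same = saxpy_scalar(2.0, xs, ys) == saxpy_unrolled4(2.0, xs, ys)
```

Both versions produce the same values; the unrolled form simply exposes four independent operations per step for the vector unit to use.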

Ray-tracing is an interesting question and a very fashionable one. When deployed in non-real-time renderers such as are used to generate Hollywood CGI-style graphics, the results can be awesome. However, the guys doing that don’t measure performance in frames per second, they measure in hours per frame (on supercomputers). Ray-tracing is a naturally very computationally intensive method of generating pictures. For a given computational budget (a given power/energy budget) you will usually get better-looking pictures using triangle/point/line rasterization methods. To be clear, we have no problem with ray-tracing, but we don’t see its adoption in real-time mobile graphics in the near future. There is also the problem that there are no real standards in this area, which means developer adoption is further hampered.

We will be supporting the Android Extension Pack on all our Midgard GPUs.

As a tile-based renderer, we have tile-buffer memory inside the GPU (on-chip) and we are providing API extensions to access this and working with the industry to get standards agreed for this that are non-vendor-specific so that developers can adopt them with confidence. We have presented papers on this at conferences such as SIGGRAPH.

Recently attention has been called to ARM's poor OpenGL drivers on mobile devices (ranked "Bad" by Dolphin devs), even when compared to Nvidia/AMD/Intel on Windows where OpenGL support is not a main focus of driver development.

Is there anything being done about these unsatisfactory OpenGL drivers?

I would also be really interested to hear more about this. It seems like GPU drivers on ARM devices are a serious source of frustration for developers, and given that they can do things like reproducibly crash devices from user mode, probably a fairly serious security risk as well.

In a post-PC world, in the sort of devices that our GPUs are built into, desktop OpenGL is of very limited relevance, and so we choose not to support it. We support OpenGL ES, OpenCL, Google’s RenderScript, and Microsoft’s DirectX.

Thanks for the comments. We thought that the headlines on the Dolphin post were mainly complaining about missing desktop OpenGL drivers and open source drivers/documentation, which I have addressed already. Whilst I am sure that all comments about bugs in drivers reflect real frustration on the part of developers, we have also received plaudits from our Partners for the support we give and the quality of our drivers. We scored highest in an analysis of our OpenGL ES compiler, for example. If you look at the conversations on our developer community website (http://community.arm.com/groups/arm-mali-graphics)... you can see many people getting their comments and questions addressed, so I question whether the perception that you describe is completely widespread. However we do take it seriously.

One thing we are working on to improve the experience for developers is to improve the speed with which new, improved drivers are rolled out through our Partners. We understand there is an issue there and delays in that process do cause irritation to developers and we are working closely with our Partners on this issue.

... And that is exactly where not having open source drivers hurts you, as it has resulted in loads of unsupported and thus broken devices on the market and slower than needed update cycles. It is a shame ARM still doesn't get that innovation and time to market advantages of open source are important for its ecosystem and the only thing that might give it a chance against Intel.

As I've heard and seen these arguments in several industries before, let me take a stab at what I guess are some of the arguments against making the drivers open source at ARM:

* "It gives away our intellectual property"
That's not a concept. Copyright and patents are. You don't give either away. They are actually extremely well protected: if somebody were to try to patent something you're doing, the code can prove you did it before. Pointing to an online repository with the check-in date of the code works far better than proving you had it somewhere internally. You think you give away some unique, special coding tricks to your competitors? News flash: you're not that unique or smart. There's about a 0% chance you do something super-special that is worth protecting. Hope I didn't crush an ego or two.

* Our competitors will copy our code and use it
Still thinking you are special and smart? Guess how often this argument comes up in these discussions and guess how often it happens in practice. Answering 100% for the first and 0% for the second will give you bonus points. The reason for both: engineers always think they are smarter than other engineers. And they are always wrong. Collectively, ARM engineers are not smarter than the collective intelligence of people who do NOT work for ARM. This, generalized, is why open source always wins. You really think that going over all your code, hoping to extract a super-brilliant bit that just happens to fit exactly with their GPU architecture, is something a competitor is going to throw resources at? If they did, you should be happy about it, as it would be wasting their time and money.

* It is a big investment to open source things, you have to go over every line of code
Dude. Really? Who told you that? Your lawyers? Like they are worth listening to. Lawyers always take the easy road out. Everything closed and ideally secured against alien invasion. That is how you kill companies. Reality might not be fluffy bunnies but it isn't what the lawyers say either. Just hire some people who actually know what they are talking about. Heck, I bet you already have those. Some auditing will be needed, a few lines of code from a supplier will be accidentally open sourced (which will prove to be utterly uninteresting) and nobody will get hurt.

* Our code is ugly
So? Nobody cares, and if they do, tell them to (help) fix it.

* It is a security risk
Only if you still live in the 80's and follow its security philosophy. It's 2014, so repeat after me: "security through obscurity DOES NOT WORK".

* It is hard work to work in the open and it slows us down
It will take a little getting used to and you'll have a few project managers who still don't get it barking at you, but the reality is: open source is FASTER. Very much so. Linux kernel anybody?

* We don't need it
Yeah, that's what Google said when they effectively forked the Linux Kernel with their first Android version. They don't have the excuse to claim they are 'just a small company', yet even they could not keep up with development and had to come back to the kernel community with their tail between their legs. You really think you can do better? There's that massive ego again!

I could go on for a while, but I'm hoping you have some competent people who already explained all these things to the naysayers - and the latter just don't have the insight to get it, claiming things are 'more complicated' and all that. I've seen it before. And I get the conservatism - better do it properly than half-way (look at how sun screwed up). But mark my words: either ARM opens up, or it goes down. It really is 2014 and especially small players like ARM have to collaborate and stand on the shoulders of giants to be able to really compete in the long run. Especially with players like Intel around - which, for its size, is a surprisingly agile and smart company.

While it's nice to hear that ARM will be improving its DirectX drivers should Windows on ARM ever take off, that is likely to be of limited consolation to the Dolphin project, the various game developers I see on reddit complaining about ARM's poor driver quality on Android, or those Google developers linked above :)

What I am curious about is what ARM will be doing in the meantime to address the perception that its GPUs are difficult to develop for due to poor driver performance, limited documentation, and buggy implementation of many features relative to Nvidia/AMD/Intel? Does ARM believe that there is a need for improvement, or are these developer concerns overblown? And if so, will they be taking steps to improve the situation? Or at least to improve communications with developers?

Imagination views OpenCL full profile and features like fp64 support and precise rounding modes as unnecessary and an inefficient use of transistors on mobile. Do you have examples of common mobile use cases that OpenCL full profile would enable over embedded profile?

OpenGL ES 3.x explicitly did not include geometry shader and tessellation support presumably due to die area, power consumption, and market need concerns. However, many mobile GPUs are implementing full desktop DX11 support in hardware anyways. Is geometry shader and tessellation support die area intensive? Ideally we'd get both of course, but based on current and near future market needs, are game developers better served by devoting transistors on extra features like geometry shaders and tessellation or would those transistors be better used to add more ALUs for more raw performance?

Are Mali GPU owners getting the latest graphics drivers effectively? If not, is this a major impediment to game developers and users and how can this be improved?

Low-level graphics APIs are all the rage now. What is ARM's take on this? Can OpenGL ES be improved to increase efficiency and reduce CPU overhead through just writing better drivers, new extensions, or standard API evolution as part of OpenGL ES 4.0? Or is a new cross-vendor, cross-platform API needed? And with most mobile GPUs implementing the latest desktop feature-set is it necessary any more to have dedicated desktop and mobile graphics APIs or should they be streamlined and merged?

We don’t regard tessellation support as area-intensive. The Midgard architecture is naturally flexible and supports tessellation without the need for additional tessellation units.

Low-level (or “low-overhead”) APIs are indeed very fashionable right now. These APIs do away with a lot of state-tracking and error-checking so that black-belt Ninja programmers can (in some cases) gain greater performance with code they can trust to be correct. Of course this only applies to code that is limited in terms of performance by the performance of the driver rather than by the GPU itself. Well-structured code usually drives the GPU flat out and is not limited by the performance of the driver.
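The point that a lower-overhead API only helps driver-limited code can be sketched with a simple model in Python. It assumes CPU-side submission and GPU rendering overlap, so the slower of the two sets the frame time; the function name and all the numbers are hypothetical and chosen only to show the shape of the effect.

```python
def frame_time_ms(draw_calls: int, cpu_us_per_call: float, gpu_ms: float) -> float:
    """Frame time when CPU submission and GPU rendering overlap (assumed model)."""
    cpu_ms = draw_calls * cpu_us_per_call / 1000.0
    # With the two stages pipelined, the slower one sets the frame time.
    return max(cpu_ms, gpu_ms)


# GPU-limited frame: halving the per-call driver cost changes nothing.
gpu_bound_before = frame_time_ms(1000, 10.0, 12.0)
gpu_bound_after = frame_time_ms(1000, 5.0, 12.0)

# Driver-limited frame: the same halving cuts the frame time.
cpu_bound_before = frame_time_ms(3000, 10.0, 12.0)
cpu_bound_after = frame_time_ms(3000, 5.0, 12.0)
```

In this sketch only the second pair of frames gets faster, which matches the observation that well-structured code already running the GPU flat out gains little from a thinner driver.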

Obviously I cannot comment on what is going on in the confidential discussion inside Khronos, but I would say that there is a great deal of interest across the industry in a modern cross-platform API and ARM is playing a central role in these discussions. It needs to be cross-platform as developers want to gain maximum return on their investment of writing code. They would obviously prefer to write it once rather than write it multiple times for multiple platforms.

As to merging desktop OpenGL and OpenGL ES, the same pressures exist – that of developers wanting common APIs.

How well do the Mali 450 and 760 scale with the number of cores? Are there any 760-8 and 760-16 chips underway out there? Last one: how much thermal throttling is to be expected in real-life situations for either a 450MP8 or a 760MP8, e.g. in a waterproof phablet/tablet?

Whilst we cannot comment on the chips our Partners are building until they have announced them, you will certainly find chips coming with the higher numbers of cores. Graphics is an easily-parallelisable problem, so the scaling that tends to be seen is fairly linear up to those sorts of numbers, providing that the memory system can supply the necessary memory bandwidth.
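The scaling behaviour described above, roughly linear until memory bandwidth runs out, can be sketched as a simple capped model in Python. The function name, the units and the numbers are assumptions for illustration only and are not measurements of any Mali configuration.

```python
def scaled_throughput(cores: int, per_core: float, bandwidth_cap: float) -> float:
    """Throughput that grows linearly with cores until memory bandwidth caps it."""
    return min(cores * per_core, bandwidth_cap)


# Hypothetical units: each core contributes 1.0, the memory system sustains 10.0.
curve = {n: scaled_throughput(n, 1.0, 10.0) for n in (1, 2, 4, 8, 16)}
```

With these made-up figures the curve is linear up to eight cores and flat at sixteen, which is the "providing that the memory system can supply the necessary memory bandwidth" caveat in numeric form.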

Any thermal throttling is so bound up with the Partners’ implementations (what silicon process, style of implementation, numbers of cores, target frequency, thermal controls, which CPUs are paired with the GPU, case design, etc.) that it is really hard for me to give a useful answer. Sorry!

As I mentioned to bengildenstein previously, ray-tracing is a great technology that can produce fantastic-looking images when rendered on supercomputer render farms. However, it is a long way from being deployable as real-time in mobile devices: the power requirements are too high as it remains a bit of a brute-force method. However, it is easy to confuse ray-tracing (a solution) with Global Illumination (the problem). Global Illumination (GI) is the problem of trying to add more realistic lighting to 3D scenes by taking into account direct and indirect lighting. This is a really important issue in providing more compelling, realistic images in graphics content.

In order to address GI in mobile, towards the end of 2013, ARM acquired Geomerics - another Cambridge-based company that have created some fantastic technology to address GI. Their product Enlighten is the industry’s most advanced dynamic lighting technology. It is the only product with proven ability to deliver real-time global illumination on today’s and tomorrow’s consoles and gaming platforms. Enlighten is behind the lighting in best-selling titles including Battlefield 3, Need for Speed: The Run, Eve Online and Quantum Conundrum and Enlighten has been licensed by many of the top developers in the industry, including EA DICE, EA Bioware, THQ, Take 2 and Square Enix. Their technology is very lightweight, and is optimised for all games platforms, including mobile. We are very happy to have them on board, and they are starting to influence our future technology roadmap already.

I hope I'm not too late, or too direct. I would like to know if you guys are coming out with an HSA product anytime soon? You cover such a wide market, a product from you might get the wheels rolling on the software side.

At this time I cannot provide any information on ARM’s GPU roadmap beyond what's on the chart Anand posted to kick off this thread.

However, ARM has been at the heart of the HSA Foundation: we helped build the Foundation and bring in some of the members. We’ve also been very involved, putting our best experts (CPU, GPU, memory system, runtime APIs etc.) into the Working Groups to help define the specifications that are being announced.Reply

Adapting Mali GPUs to be more suited to other markets is something we look at all the time, such as adding high-reliability features to make it more suitable for servers, High Performance Computers (HPC, or so-called supercomputers) and industrial. There’s no technical reason we couldn’t decide to go there, it’s just a decision to be based on cost and reward. Adding ECC/parity to memories within the GPU is a pretty well-trodden path – the guys in the CPU group have already done this for their CPUs.Reply

Why is it that the large players in desktop-class GPUs (Nvidia, AMD) have not been able to differentiate their graphics on the mobile platform? Is it because graphics technology is largely a commodity today? Or is their IP somehow specifically focused away from the mobile market?Reply

GPU computing seems to be both capable of providing huge boosts where it applies, and pretty specific so far in the areas to which it applies. Are there any particular use cases where you see the GPUs on the Mali roadmap or any other future mobile GPUs providing a big boost?

At least one other company's talking a lot about making GPU and CPU address a single pool of memory and generally work together more closely. Interesting direction to ARM or no?Reply

GPU Computing provides good boosts in performance efficiency for certain types of workload. Usually that workload will be typified by lots of computation across very large datasets, and where the computation can be highly parallelised. In other words we can exploit very large levels of data parallelism through thread parallelism (we have the capability of issuing thousands of threads). Typically, the code also exploits the high levels of floating-point capability found in modern GPUs, though actually for our case, that’s not always necessary in order to gain a performance efficiency advantage.

It is possible to imagine a very wide range of possible applications that would probably benefit from GPU Computing. The ones that we most often see in our ecosystem are image processing in its many forms: computational photography, computer vision, image clean-up such as sharpening, denoise, filtering, beautification etc. etc.
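To make the data-parallel idea concrete, here is a minimal sketch in plain Python (not OpenCL, and not ARM code) of a 3x3 sharpening filter. The function names, the kernel weights and the tiny test image are all assumptions chosen for illustration. The point is that `sharpen_pixel` only reads the source image, so every output pixel could in principle be handed to its own GPU thread.

```python
# Illustrative only: a 3x3 sharpen written so that each output pixel is
# computed independently, the way one GPU work-item per pixel would be.

# Assumed kernel weights (they sum to 1, so flat regions are unchanged).
SHARPEN = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def clamp(value, lo=0, hi=255):
    """Limit value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def sharpen_pixel(src, x, y):
    """Compute one output pixel; reads src only, writes nothing shared."""
    height, width = len(src), len(src[0])
    acc = 0
    for ky in range(3):
        for kx in range(3):
            # Clamp-to-edge addressing for pixels on the border.
            sx = clamp(x + kx - 1, 0, width - 1)
            sy = clamp(y + ky - 1, 0, height - 1)
            acc += SHARPEN[ky][kx] * src[sy][sx]
    return clamp(acc)


def sharpen_image(src):
    """Serial stand-in for the (x, y) index space a GPU would run in parallel."""
    height, width = len(src), len(src[0])
    return [[sharpen_pixel(src, x, y) for x in range(width)]
            for y in range(height)]


flat = [[100] * 4 for _ in range(4)]
spike = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
flat_out = sharpen_image(flat)
spike_out = sharpen_image(spike)
```

Because no pixel depends on another pixel's result, the two loops in `sharpen_image` can be executed in any order or all at once, which is the property that lets a GPU issue thousands of threads against this kind of workload.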

Another use case that has been used in products being announced by many of our partners is using GPU Computing to accelerate software codecs for video, e.g. where a new standard such as HEVC or VP9 comes out, but where the platform does not support that particular standard in hardware (yet). I think this is a useful example that shows the framework of decision making. As I said in a previous answer, it *is* a digital world: it’s all 1s and 0s, and the problems to be solved can be expressed as an algorithm which can be expressed in C code, Matlab, OpenCL, RTL or in many forms. You can write C code and run it on a CPU. You can write code in various forms (possibly low-level) and run it on a DSP. You can write it in a parallel programming language like OpenCL and run it on a compute-capable GPU. You can design your own special-purpose hardware, write the RTL for it and get your chip manufactured. Because we work in many of these fields, we can take a fairly balanced view of these options. All come at various levels of difficulty, levels of efficiency and quality of development environment, timescales, development costs and ultimately power efficiency. What will be “best” for one Partner in one environment will not be best for all. Some won’t want to dedicate hardware to a rarely-performed task. Some will need to spend hardware (silicon area = chip cost) on something because they need the power efficiency. Some want the Time To Market advantage of doing it Now in software so that it can be deployed on today’s platforms while the hardware catches up later.

As to making CPU and GPU addresses unified into a single address space we have been doing this from our very first GPU. We are also working towards a shared virtual memory environment as well, where the MMUs in the CPU and the GPU can share the same set(s) of page tables.Reply

ARM processors are becoming a great way for home hobbyists to create their own NAS, media server, or energy efficient desktop, but while I see new video codecs being implemented, some of the old ones like DIVX seem to be ignored. Any chance on implementing some of the older codecs? It would also be nice to see better Linux support as well.Reply

My two cents?

1) Why are we still waiting so long for a beefy desktop ARM processor (with a TDP of 30W or 60W), and will it even appear?

2) Will this beefy part have a standard number of cores (4, 8), or would it be a cluster of relatively low-performance cores (32, 64+)?

3) Would it be possible to use existing PCI-E PC cards together with an ARM processor, or do we need brand new hardware for desktop ARM?

4) Have you tried any experiments with x86 virtualization or emulation on ARM? Sooner or later we will need it. I don't believe that every piece of software will be recompiled for ARM; there is a huge x86 legacy.Reply

Also, what are the chances of a bigger chip with a bigger GPU? Probably not the right person to contact (but no one is! :-( ) but has ARM considered a Windows RT laptop, PC (think NUC), or set-top box? Everyone's screaming Android to the rooftops but I'm still clamoring over the potential of an RT laptop to get ~20 hrs of battery life.

Clincher: could such a device support Steam streaming? :D :D :D

Man I worked hard to work in legitimate questions :P Thank you for putting up with us!Reply

I’m sorry, perhaps I misunderstand the question. Mali-V500 (released last year) supports up to 4k resolutions and up to 120 frames per second (4kp120). We have a multicore architecture. A single core can do up to 1080p60 (depending on silicon process), and by laying down multiple cores (up to 8), the higher frame rates and resolutions can be achieved.Reply
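As a rough check on the scaling described in that answer, the arithmetic can be sketched as below. It assumes that "4k" means UHD 3840x2160 (the answer does not give exact dimensions) and that throughput scales linearly with both pixel rate and core count, which is a simplification.

```python
def pixel_rate(width, height, fps):
    """Pixels per second for a given resolution and frame rate."""
    return width * height * fps


# One core is stated to handle up to 1080p60.
single_core = pixel_rate(1920, 1080, 60)

# Assumption: "4k" is taken to be UHD, 3840x2160.
target = pixel_rate(3840, 2160, 120)

# Ceiling division: the smallest whole number of cores covering the target.
cores_needed = -(-target // single_core)
print(cores_needed)
```

Under those assumptions the target has four times the pixels and twice the frame rate of 1080p60, which lines up with the eight-core maximum mentioned in the answer.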

Khronos internal discussions remain confidential (Google is a Khronos member), but no, I don’t think so. I don’t believe I am breaking any confidences by saying that the feature list for OpenGL ES 3.1 was agreed by the parties in Khronos and was obviously the usual compromise between features and schedule. The OpenGL ES 3.1 API was announced as ratified back in March and I don’t think that Google have announced a firm date for the inclusion of the Android Extension Packs yet. All Mali Midgard GPUs will support both Extension Packs.Reply

Thanks for your comments – it’s always good to hear other viewpoints. We believe the selling point of ARM GPUs is to support the wide range of silicon Partners across the industry, not just a single customer (we have licensed over 55 Partners). We support the features required by our Partners in a power, energy and silicon area-efficient fashion.

ARM have already announced the availability of OpenGL ES 3.1 drivers (they have already passed conformance). Obviously I cannot commit to when our Partners will release them, but I assume they will be competing to get them out there as soon as possible.

The aim of OpenGL ES 3.1 was not to require any new technical features over OpenGL ES 3.0, so from a technical standpoint, any GPU that supports OpenGL ES 3.0 should be able to support OpenGL ES 3.1. All ARM Midgard GPUs will support OpenGL ES 3.1.Reply

For GPUs, we have a two-pronged roadmap (so far). The two approaches reflect the requirements of our Partners. Those producing so-called superphones are almost inevitably thermally constrained. The actual thermal limit will vary from licensee to licensee depending on their implementation and silicon process, and will vary from OEM to OEM depending on the ability of the case design to dissipate heat (e.g. whether it is an aluminium or plastic case), but for most of these partners, they are looking for the maximum performance within a given power limit (frames per second per Watt).

The other approach is for a large majority of our Partners who are making more mid-range mass-market devices (not just phones, but also tablets, DTV and STB). They care most about cost (equals silicon area). So they want the maximum performance within a silicon area allocation (frames per second per square millimetre).
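The two figures of merit named above can be sketched as follows. Every number here is made up purely for illustration, and the `GpuConfig` class and configuration names are hypothetical; nothing below describes a real Mali product.

```python
from dataclasses import dataclass


@dataclass
class GpuConfig:
    """Hypothetical GPU configuration measured on some fixed benchmark."""
    name: str
    fps: float       # frames per second
    watts: float     # power drawn while running the benchmark
    area_mm2: float  # silicon area

    def fps_per_watt(self):
        return self.fps / self.watts

    def fps_per_mm2(self):
        return self.fps / self.area_mm2


# Invented numbers, chosen only so that the two metrics disagree.
configs = [
    GpuConfig("config_a", fps=60.0, watts=2.0, area_mm2=20.0),
    GpuConfig("config_b", fps=40.0, watts=1.0, area_mm2=16.0),
]

best_for_power = max(configs, key=GpuConfig.fps_per_watt)
best_for_area = max(configs, key=GpuConfig.fps_per_mm2)
print(best_for_power.name, best_for_area.name)
```

With these invented figures the thermally constrained Partner would pick `config_b` and the cost-constrained Partner would pick `config_a`, which is one way to see why a single design point may not serve both groups.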

Of course, everybody cares about cost and everybody cares about power consumption, so it’s not a complete either-or situation. We always try to produce GPUs that are good in both areas, but the two prongs reflect the biggest careabouts for our Partners.Reply

Today, our work is focused on producing ready-made designs that our Partners then configure to suit their needs from a range of options (essentially the same as licensing CPU designs). There is nothing inherent within our architecture that would prevent us from pursuing architectural licenses but it’s not a focus for us in the near term. If this were something we chose to pursue, a GPU architectural license would fundamentally differ from a straight CPU license due to the large software component. Reply

1. Does it make sense to have DDR5 or somewhat faster memory bandwidth on high-end tablets/phones? Here I speak as a real novice in this field.

2. Without asking about design wins, do you think it would be possible to make an entry-level laptop using just ARM technology? (Please give even a PR-like answer, but as for me I've seen just locked-in laptops for now: Chromebook/Windows RT-Tegra tablets/laptops)

3. Is the ARM GPU emulation in Android's VMs accurate? How would you recommend testing and fixing GPU driver bugs when they are reported by someone using a Mali GPU?Reply

Memory technology is moving on quite quickly. Most of our partners are using LPDDRx, as it is lower power, but some do use DDR as well, particularly in tablets. Graphics uses large amounts of memory bandwidth and while we design our GPUs to be as efficient as possible in terms of their bandwidth usage, if the content displays complex images at high resolution and high frame rate, then the bandwidth used can be very significant indeed. As a consequence, GPU performance can be throttled if it doesn’t get the bandwidth it needs.
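To give a feel for the scale involved, here is a back-of-the-envelope sketch. The resolution, the 4 bytes per pixel (RGBA8888) and the `passes` parameter are assumptions for illustration; real content also reads textures and geometry, so this only counts the traffic needed to touch each screen pixel a given number of times per frame.

```python
def framebuffer_bandwidth(width, height, fps, bytes_per_pixel=4, passes=1):
    """Bytes per second needed to touch every pixel `passes` times per frame."""
    return width * height * bytes_per_pixel * fps * passes


# Assumed example: a 2560x1600 tablet panel at 60 frames per second.
one_pass = framebuffer_bandwidth(2560, 1600, 60)

# Assumed example: the same panel if each pixel is touched three times per frame.
three_passes = framebuffer_bandwidth(2560, 1600, 60, passes=3)

print(one_pass / 1e9, three_passes / 1e9)  # in GB/s
```

Even this simplified count reaches roughly a gigabyte per second for a single pass, which illustrates why resolution and frame rate bear so directly on the bandwidth a GPU needs.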

I’m not sure what you mean by locked-in. ARM-based laptops running Linux have been around for many years, and there are ARM-based Chromebooks available today. I believe the one based on ARM CPU and ARM Mali GPU is still one of the most successful to-date…

Mr. Jem, with your position in the mobile GPU field, you should feel a special privilege since mobile GPU is the fastest advancing technology in history. I have calculated that if the current rates of performance growth of mobile and desktop graphics remained unchanged, the mobile GPU would actually catch up with desktop by 2025, and make it ridiculously obsolete in 2026.

Now, if we agree that this theory can never actually happen, the obvious question arises: for how long do you think this phenomenal advancement can continue? How much room do you think is left for development? Do you see mobile GPUs soon facing limitations similar to those we see in the desktop space? Reply

Thanks for the comments. Yes, I think I am really lucky: this is a truly great job. I get to work on some really cool technology and I have the privilege of working with some of the best people in the industry.

Actually, I don’t think it’s that useful to compare mobile GPUs with desktop GPUs. We’re all engineers: we get a problem, we explore round the problem, we see what the design constraints are, we understand the technology available to help us solve that problem, we invent some new stuff, and we attempt to solve the problem. That’s what we do. The desktop guys were given a different problem to solve: the best possible graphics within 300-400 Watts of power and lots of silicon. We’ve picked a different problem: a different power budget, and rather than making consumer products, we’re making component technologies to enable our Partners to make different, great consumer products. Obviously, I think our problem is the most interesting problem to solve (that’s why I am here), but there are great and interesting challenges ahead for the guys in the desktop area as well, and there is no way that they are standing still: they continue to innovate every bit as hard as we do.

Roughly, roughly, the “customers” (very approximate usage of the term) want a doubling of “performance” (again, approximate) every year. I know, incredible but true. Will that end? I think not for some time. We do not have visual realism yet in graphics. If you watch a CGI movie, you are fooled. It looks undeniably real, but if you ever freeze frame and look closely, you can usually see that it isn’t real within a few seconds of examining the screen. We just don’t have the capability to produce genuine realism yet (and those frames of images probably took 8 hours or so each to render on a state-of-the-art render server farm, with immense compute power). So, we have a hard technical problem to solve, and continuing consumer demand to have that problem solved. Consumers (rightly) don’t give a stuff that we don’t have the same compute horsepower available in mobile devices or televisions: “Not my problem” they say. Consumers have seen CGI movies; they have played high-end games on consoles; they have seen great-looking naturally compelling and fluid user interfaces. Most don’t know that they want higher resolutions so they can’t actually make out the individual pixels, or higher frame rates so that the visual effects seem “smooth and buttery”. Most don’t know anything about high dynamic range, or color gamut, but if you show them one device that has some of these new technologies and one that does not, that will be a quick and easy choice for them. So, we have a clear and pressing roadmap of demand from customers that will continue for some time.

Advancements come in many different areas. For some years, we have had a bit of a free ride from Moore’s law, whereby (wildly paraphrasing, and you really should read up on what he actually said) every year we can have more frequency (performance) and more transistors for the same amount of money spent by the silicon Partner. There has been much speculation about the end of Moore’s law, and it would take a whole blog entry for me to write about that, so let’s just agree that the effect is *changing* and that we won’t get as much of a free ride in future. APIs are changing as well. There have been some questions here about the impact of Mantle, and Metal, and the way that this may change the world of graphics and the way that the content is changing. Display standards are changing as well. We have new physical interfaces, and we have Display Stream Compression, but we largely still pretend that we’re talking to a CRT and shift data out serially in sequence: that will change as well.

One of the technical trends that interests me is the cost of computing a bit versus the cost of transmitting a bit. What do I mean by that? Well, if we look at video standards like HEVC and VP9, they are becoming computationally more complex to achieve better compression ratios (or better quality for the same bit-rate - it’s the same thing). We have traded extra compute requirement (at each end of the “wire”) for lower transmission and storage costs. It’s the right thing to do because FLOPS are trending towards free, whereas storage and transmission are not reducing in anything like the same way. One of our memory experts gave me the “10,000 FLOPS for the energy cost of loading a cache line” catch-phrase the other day. Of course there are more caveats in that than you can imagine: it depends entirely on the silicon process, the size of the cache line (if you ever want a really good argument, get a bunch of processor and memory system engineers in a room and ask them what the ideal cache line size is), the type of RAM and a whole host of other things, but as my boss used to say “It’s roughly right, even if exactly wrong”. So, we have some really interesting technology trends that are affecting the way in which we design the graphics, video and display processors. I think there is plenty of room left for development.Reply
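The compute-versus-transmit trade-off can be turned into a toy energy model. This is only a sketch built on the "10,000 FLOPS per cache line" catch-phrase quoted above, which the answer itself calls roughly right at best; the 64-byte cache line size and the function names are assumptions introduced here.

```python
FLOPS_PER_CACHE_LINE = 10_000  # catch-phrase figure quoted in the answer
CACHE_LINE_BYTES = 64          # assumption: the answer gives no line size


def break_even_flops_per_byte():
    """FLOPs that cost about as much energy as moving one byte."""
    return FLOPS_PER_CACHE_LINE / CACHE_LINE_BYTES


def cheaper_to_recompute(flops_needed, bytes_avoided):
    """True if recomputing costs less energy than loading the data."""
    lines_avoided = -(-bytes_avoided // CACHE_LINE_BYTES)  # ceiling division
    return flops_needed < lines_avoided * FLOPS_PER_CACHE_LINE


ratio = break_even_flops_per_byte()
small_job = cheaper_to_recompute(5_000, 64)
large_job = cheaper_to_recompute(50_000, 128)
print(ratio, small_job, large_job)
```

Under these assumed constants, spending a few thousand FLOPs to avoid fetching a cache line comes out ahead, which is the same reasoning that favours computationally heavier codecs in exchange for fewer bits moved.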

Have many of the original Falanx employees stayed with ARM Norway? Does the current (core) staff in any way resemble the old one? How does the communication happen with ARM in the UK? Who decides on the future path of Mali? ARM Norway? And one last question: why did the choice fall on Falanx in 2006? Back then there were many options. I would assume that even UK-based Imagination Tech wasn't out of reach for a takeover.Reply

A lot of the original Falanx employees stayed with us. Even the CEO (who you might expect to go off and do another startup) stayed for around four years (before going off to do another startup - and we remain in touch, he’s still a friend of mine). Two of the four founders remain with us today, including the chief GPU architect (who is now also a Fellow in ARM). However, the original company was 25 staff. We now have over a hundred and twenty people in Trondheim (Norway), and we have had to move offices twice since the takeover as we outgrew the premises! So, it’s changed a lot in some ways, but stayed very much the same in others.

Communications take place at all levels. We use email, we use Skype, we use audio conference calls, we use video conferencing, and we also put people on planes (sometimes there is no substitute for being in the same room and I haven’t yet seen the electronic whiteboard that can work as well as a napkin when in the same restaurant!). Of course it’s not just Trondheim<->Cambridge. We now have development being done in Cambridge, Trondheim, Lund (Sweden), San Jose (California, US), Shanghai (China) and Hsinchu (Taiwan).

The future path of Mali gets decided at different levels. ARM encourages decisions to be made at as low level as possible. Mostly that’s engineers making design choices: they are the best-qualified and best-placed to do so, and they do a great job of it. At a higher level, there’s a group of architects, and I suppose I get a lot of say on the future technology strategy, but I have great experts around me to advise. There are also our colleagues on the commercial side, who give us very useful advice on that side of things. Last, but by no means least there are our Partners. We have long-term and close relationships with a lot of our Partners, and we make sure our engineers get good contact with them as well as the commercial people. They trust us and tell us what they need, where they think improvements can be made, they advise us on trends coming our way that we might have missed, you name it. They are (rightly) a very demanding group, but the impact they have on our thinking and our roadmap is very significant.

I was part of the team that performed the initial selection process, which actually started in 2005, when it was clear how important graphics and other elements of visual computing were going to become. We did indeed look at a lot of other companies back then, and we were confident then that we made the best choice, and I remain equally convinced to this day. Having been part of the teams that have done a number of the acquisitions for ARM’s Media Processing Group, I know that the selection process can be complex and wide-ranging. Of course there is the technical evaluation, but also, you are selecting critical people. You have to ask “Can I work with these people?”, “Will they enjoy working at ARM? (Unhappy people don’t work well)”, “Can we and they take what they have and turn it into what we need?”. You are not just buying technology into ARM, you are buying people (hopefully great people), who will become part of ARM’s future. If we invest in them and develop them, what other things can they do for us? Will they make our next generation of ideas that will be technically and commercially successful? Brilliant companies are always looking for ways to make the company better than they are already: “Do they have someone who can take my place?”, for instance.Reply

Hi Jem, your latest Juno development platform is rather a let-down. I suppose it's because of the focus on industrial customers rather than ever focusing on the real end users who pay for the products they want to buy, and that's a shame...

The fact that it does not even have USB3, never mind USB3.1 functionality, given the IP is available now, and the fact that it's only got a modest Mali-T624 core, seems wrong for a new PCB for development work. I also question the wisdom of not using your best generic CoreLink CCN-508 interconnect to date here. How can we assess your best IP if we can't buy it in a functional, cost-effective development board (ODROID/NVIDIA Denver Tegra K1 PCB etc. style)?

What are your expectations for this "Juno" Mali-T624 with slower CCN NoC, two Cortex-A57s, four Cortex-A53s, and no apparent options to use the lowest-power and fastest WideIO2 or HMC, or a way to even bring the CCN to the IO ports for best third-party expandability? People, as in the actual end consumers, want the fastest Cortex IO everywhere, not some limited USB2 that should be end-of-lifed in today's consumer markets.

Are ARM ready to provide what we the end consumers want to buy (USB3+, fastest IO ports, fastest 3D (MRAM) RAM options, etc.), or will the OEMs always win favor... providing these in dribs and drabs well after other platforms have them as options? Oh well... Reply

The Juno platform is not designed to be a performance platform but rather a platform to enable the transition to 64-bit on ARMv8-A targeted to software developers. The Juno boards are simply a small part of a larger amount of work ARM is doing with our developer community today to enable a rapid transition to 64-bit computing. They will only be available to qualified partners and developers.Reply

In the Midgard instruction set, what are the even-numbered bits starting at 18 of the ALU control word used for? We know that the odd-numbered ones are for controlling which ALU's in the pipeline are used, but it seems rather mysterious that only every other bit is used.Reply
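For readers unfamiliar with the bit layout being asked about, the split can be sketched as below. This takes the question's description at face value (odd-numbered bits from 18 upward select ALUs, even-numbered ones are unexplained) and assumes a 32-bit control word; neither detail is confirmed anywhere in this thread, and `split_control_bits` is a hypothetical helper, not part of any driver.

```python
def split_control_bits(word, start=18, width=32):
    """Return (odd, even) lists of set bit positions in [start, width)."""
    odd = [b for b in range(start, width)
           if b % 2 == 1 and (word >> b) & 1]
    even = [b for b in range(start, width)
            if b % 2 == 0 and (word >> b) & 1]
    return odd, even


# Made-up example word with bits 19, 20 and 23 set.
word = (1 << 19) | (1 << 20) | (1 << 23)
odd_bits, even_bits = split_control_bits(word)
print(odd_bits, even_bits)
```

A tool like this only shows which of the unexplained even-numbered positions are ever set in captured control words; it says nothing about what they mean.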

Hi everyone - I just wanted to say thank you for the opportunity to answer your great questions this week. If you ever have any questions re: Mali or anything else related to ARM, the fastest way to get them answered is to follow @ARMPROffice and submit your questions via Twitter. Reply

Since Mr Davies is apparently no longer available, here are my views as the lima driver developer, the person who started the open ARM GPU drivers movement, the pesky hacker who pointed out the flaws in RPi foundation "open source" press release, leading to a proper open source strategy there, and one of the key players in open sourcing AMD Radeon graphics hardware.

Yes, free drivers would mean greater value to shareholders.

* Marketing: at the time, ARM would've been the first to embrace open source drivers, and this would've given them a nice marketing boost. Broadcom has now run away with that. Then there is the fact that the competition has free drivers. First and foremost, Intel, which has a massive and growing team of open source developers. ARM today doesn't hold a candle to that.
* ODM and SoC vendor satisfaction: Open source drivers, when done right and with sufficient support (which is still cheap compared to the binary only route), would severely reduce the integration and legal overhead for device makers and SoC vendors. The ability to grab a set of open source projects immediately, versus having to deal with account managers and lawyers and NDAs, and the ability to report issues directly and openly, and immediately profiting from advances or fixes from the rest of the community would severely reduce the overhead of bringing up new platforms or new devices. An open source strategy directly benefits ARM customers and will lead to increased sales and returning customers (who are less likely to jump ship, like Allwinner just did).
* Consumer satisfaction: Cyanogenmod is the perfect example here. Aftermarket support for mobile devices is important and growing in importance; it won't be long before consumers buy devices based on prospective aftermarket support. Open source drivers are a key enabler for Cyanogenmod, and would severely improve the quality of CM while reducing overhead.
* While not as dominating a reason, alternative uses of ARM SoCs (like the Raspberry Pi is showing us, and like the OSHW boards from Olimex) are becoming more important. These development boards allow a wide array of projects, and will spearhead all new developments, perhaps opening up whole new markets. Binary only drivers currently severely limit the feasibility or practicality of such projects.
* ARM is going to try to muscle in on the server market. Their 64bit processors will function more like a PC, in that you will not require a board specific bootloader anymore. Lack of open source drivers will be holding this platform back, just like it did for AMD back when it bought ATI.
* Then there are topics like GPGPU/Compute and many other corners which my synapses cannot recall off hand...

None of the above has so far convinced Mr Davies, neither has the fact that an open source strategy based on my lima driver would be highly credible, up and running quickly, and very cheap (6 man-years/year is what I projected initially). But Mr Davies has repeatedly rejected the idea of an open source driver for the Mali family, even though my information is that the engineers in his department tend to not agree with him on this.

With Broadcom now doing a proper open source GPU driver, the question really becomes: Does ARM's lack of free drivers hurt shareholder value?Reply

E.g. on the desktop side we see cross-adoption of vendor-specific extensions/features in open source drivers when it makes sense (VDPAU for video, which is Nvidia's child, AMD_performance_monitor for performance counter reporting, etc.). That's good, as it prevents vendors from harming their own hardware by limiting software capabilities.

(And really determined partners can just hire somebody to implement what they really need, or think would give Mali a competitive edge, with no reliance on ARM's blessing, which is good for ARM in the long run.)Reply