Micron unveils new parallel computing architecture

Micron's Automata Processors will first be available as an add-in PCI Express board, with future versions to be installed as simple plug-in DIMMs just like RAM.

Micron has announced what it claims is a fundamentally new computing architecture designed with heavy parallelism in mind, built on its dynamic random access memory (DRAM) technology to speed the search and analysis of complex, unstructured data streams.

Dubbed Automata, the technology looks for all the world like a standard dual in-line memory module (DIMM), but where one would normally find DRAM chips the board instead plays host to a series of Automata co-processors. It is based around a mesh fabric design in which tens of thousands - scaling to millions, the company claims - of simple processing elements are enabled or disabled to create a task-specific processing engine. The approach has much in common with field-programmable gate arrays (FPGAs), but with a focus on mining big data in record time.

'Micron's Automata Processor offers a refreshingly new way of solving problems that is very different from all other accelerator technologies,' claimed Srinivas Aluru, professor of computational science at the Georgia Institute of Technology and one of the testers of pre-release Automata hardware, at the unveiling. 'By deploying this in interesting ways, we have been able to solve a much larger instance of the NP-hard biological motif-finding problem than was previously reported, using the resources within a single Automata Processor board.'

The system achieves high performance by tackling computation from the opposite direction to a standard central processing unit (CPU): where the CPU applies a single instruction to as many chunks of data as possible before moving onto the next instruction, an Automata processor runs thousands of instructions simultaneously with a view to solving a given problem in the least amount of time.
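As an illustration of that per-symbol parallelism, here's a minimal software sketch - purely hypothetical, and in no way Micron's actual API - of the automata-style pattern matching the chip is designed to accelerate. Every partial match in progress advances in lockstep on each incoming symbol, so the cost per symbol stays flat no matter how many patterns are being tracked:

```python
# Hypothetical sketch of automata-style scanning (not Micron's API): the engine
# consumes one input symbol per step, and every active state-transition element
# fires in parallel. In software, that's an NFA whose whole active-state set
# advances on each symbol of the stream.

def build_nfa(pattern):
    """Compile a literal pattern with '.' wildcards into per-step symbol sets.
    State i means 'matched the first i characters'; state len(pattern) accepts.
    None marks a step that accepts any symbol."""
    return [None if c == '.' else {c} for c in pattern]

def scan(stream, patterns):
    """Return (end_position, pattern) for every match, all patterns at once."""
    nfas = {p: build_nfa(p) for p in patterns}
    active = {p: set() for p in patterns}  # live partial-match states per pattern
    hits = []
    for pos, sym in enumerate(stream):
        for p, nfa in nfas.items():
            nxt = set()
            # State 0 is always re-armed, so matches can start at any position.
            for state in active[p] | {0}:
                step = nfa[state] if state < len(nfa) else None
                if state < len(nfa) and (step is None or sym in step):
                    nxt.add(state + 1)
            if len(nfa) in nxt:           # accepting state reached: report a hit
                hits.append((pos, p))
                nxt.discard(len(nfa))
            active[p] = nxt
    return hits

hits = scan("abracadabra", ["abra", "a.a"])
# "abra" completes at positions 3 and 10; "a.a" at positions 5 and 7.
```

The key property mirrors the hardware: the inner loop does a fixed amount of work per input symbol regardless of how many partial matches are alive, whereas a conventional CPU approach would typically re-scan the stream once per pattern.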

Micron, traditionally a memory specialist, is keen to point out that its technology is no vapourware - starting with the admission that similar technologies, from tile-based highly-parallel many-core co-processor architectures to the aforementioned FPGAs, already exist. The company is to publish a technical paper describing the Automata processor's design and development in the IEEE Transactions on Parallel and Distributed Systems journal - with a pre-release copy available here (PDF warning) - and claims to have taped out the first silicon, with prototypes in-hand. The first engineering samples are expected to be ready next year, although Micron isn't yet sharing a date for mass availability.

The Automata processors use a DDR3-like memory interface, and Micron plans to make them available either as individual components for use in embedded systems or as DIMM-packaged boards. A PCI Express card with several Automata DIMMs on-board is also in the works, allowing developers to get started ahead of the technology's integration into future motherboards. A software development kit is expected to be released in the first half of 2014.

Micron has already listed a swathe of markets where the Automata could find use, ranging from bioinformatics and other data-heavy scientific fields to video and image analysis and even network security scanning. To help drive adoption of the technology, the company has also announced a deal with the University of Virginia on the first Centre for Automata Computing.

It'll be a while before mainstream users are clamouring for Automata support, however: the co-processors are designed specifically for data analysis and pattern-matching, and are unlikely to find much use in a gaming rig or office desktop.

This is awesome, and would save enormous space in a workstation. I'm afraid about the price, though - but I could be surprised, as the R2500 ray-tracing accelerator isn't expensive ($1,000, and it comes with an SDK).

I see more and more co-processors being announced / released - this is very exciting, and maybe a little of the current boredom will fade away :D

Oh, the Caustic? I interviewed Imagination's David Harold about SoC GPUs for that Custom PC feature a while back, and we got onto the subject of the Caustic. Fun fact: the company is doing its damndest to get the technology down to the point where you can do real-time high-resolution ray-tracing in a mobile phone. And it's not pie-in-the-sky, either: we're talking within the next five years.

And while this is completely the wrong place for it, given it's totally off-topic, here's the interview snippet in question - the bulk of which was never published.

Quote:

Originally Posted by David Harold, Imagination Technologies, interviewed 23rd January 2013

We have two add-in boards under the Caustic brand, and they are focused right at the high-end, they are for creative professionals who are doing things like game content creation, architectural visualisation, product visualisation, and, you know, creating video - those kind of things. And those boards are on sale as of this month, and, you know, we think very revolutionary in the way they bring ray-tracing out of the, sort of, non-real-time space into a very immediate, interactive real-time workflow. So, that's the start of a progression for us, but it's not necessarily a progression back into the add-in boards space, although, obviously, they are add-in boards, and they're quite high-end add-in boards. What it really is is the start of us bringing ray-tracing out of the realm of movie making and the high-end creative arts and down into mobile and embedded. So, over the next few years it is our goal to get ray-tracing into console, tablet, mobile phone, all the markets in which, you know, you'd want to see it but in which it just hasn't been possible with previous ray-tracing technologies. So, we have an approach to ray-tracing which is unique and quite innovative, in the way that we had an approach to doing rasterised graphics which was unique and very innovative, and we think that, you know, using this, we can get ray-tracing into hand-held devices.

[Me: Okay. I mean, that's a big thing - because real-time ray-tracing has, in the past... It's always been something that's just around the corner...]

Yeah, it's the holy grail.

[Me: Yeah. I think, was it, id Software released a ray-tracing engine for the... I think it was the Quake 2 engine, a while back, which kind of, sort of, almost worked...ish. But... So, using this technology that you've developed at the moment for, as you say, the very very high-end, very expensive add-in boards - do you believe that in the next, say, five years, it will be possible to, kind of, take that technology and shrink it down into something that can run on a battery and run in a tablet?]

Yeah, so that's exactly the kind of timeframe we're talking about. You know, it's not going to be immediate, but over the next few years we have a roadmap that goes from this high-end space where we're shipping boards right now, down into consumer devices of varying kinds, initially some of those will be tethered devices - they might be consoles, or TVs, or what have you - but gradually down into the smartphone, and we think, you know, we're very confident that with the technologies we have, we can indeed get ray-tracing down into mobile.

[Me: Excellent. Well, that's something I'm very glad to hear, because I've been interest in ray-tracing technology for years - I mean, I used to use RayCAD on DOS and wait twelve hours for my tiny little 320x200 scene to appear on the screen, so it's something I'm very excited...]

You should check out some of the YouTube videos that we have up of the Caustic technology... Caustic ray-tracing technology right now, because you'll appreciate just what a revolution it is to actually get it down into something which is real-time and iterative, and you don't go away and get a cup of tea and go 'oh, god, that's not quite right, I need to tweak it and go away again.' You know, it really is, sort of, it is changing the professional workflow, and the way we're doing that in the professional space is based on a technology for doing ray-tracing which is much, much more efficient - and those efficiencies are what are going to allow us to bring it down into the embedded space.

A little insight into how transcribed interviews read *before* the editing process there!

On-topic: I'm super-excited about the current push back into co-processors. For some reason, GPGPU leaves me fairly cold - but some of the tile-based stuff coming from the likes of Tilera and Adapteva, plus Micron's regex-accelerator... Interesting times ahead!

Quote:

Originally Posted by GuilleAcoustic

This also reminds me of this FPGA expansion board

I would be very surprised if the suspiciously marking-free chip in the middle of Micron's prototype PCIe board wasn't an FPGA. The final product, however, will be literally just a bunch of Automata Processors slapped on a DDR3-shaped DIMM, with a dedicated Automata controller built into the motherboard and no FPGA in sight.

Quote:

Originally Posted by Gareth Halfacree

Thanks for this interview - it's very informative. I'm also very excited about the possible future of computing.

Back on FPGAs: they're almost everywhere when you need prototyping. Very convenient, but also very expensive when you need huge numbers of transistors and logic blocks. Adapteva uses an FPGA built into the ARM SoC to interface with the Parallella's 16-core co-processor. Each Parallella co-processor can then connect to another Parallella co-processor to expand the array.

I've been thinking about using them in my Amiga revival project, especially the ones you can reprogram on the fly. I thought the idea of a morphable co-processor was great (a zip-optimised copro, then a ray-tracing-optimised copro, etc.), but pricing is the real brake there. If you want something powerful, it costs several thousand dollars per FPGA - but that's still far less than having to create a die for each prototype :D

Originally Posted by GuilleAcoustic

Back on the FPGA, they are almost everywhere when you need to interface 2 chips. Adapteva use an FPGA built inside the ARM SOC to interface with the parallella 16 cores co-processor. Each Parallella co-processor can then connect itself to another parallella co-processor to expand the array.

Aye - I've interviewed Adapteva's founder Andreas Olofsson several times over the years, including once just prior to the Parallella's launch. It's not a new move, though: back before FPGAs were a thing, the dominant technology was Uncommitted Logic Arrays, or ULAs. The ULA was a pre-designed chip which could then be masked off in a variety of different ways to make it do different things. The result wasn't as small or as quick as a true ASIC, but it was significantly cheaper, and a lot smaller than the old method of using various individual components. A move to a Ferranti ULA was directly responsible for the shrinking of the number of chips found in Sinclair's microcomputers, in fact: the ZX80 had 21 individual integrated circuits, while the most common form of ZX81 - released just one year later - had but four.

Sadly, there was a consequence: the Ferranti ULA was designed to run only around half of its logic gates in any given design; to make the ZX81 do what it needed to do and yet still hit the street for under £70 (or under £50 if you bought the do-it-yourself kit version) Sinclair ended up running it at closer to 75-80% utilisation. As a result, the chips got hot and died early - the most common reason, alongside a crumbled ribbon cable for the keyboard, for a ZX81 to break.