Posted In:Super Computing Archives - AppFerret

IBM Research announced Tuesday (Oct. 24, 2017) that its scientists have developed the first “in-memory computing” or “computational memory” computer system architecture, which is expected to yield 200x improvements in computer speed and energy efficiency — enabling ultra-dense, low-power, massively parallel computing systems.

Their concept is to use one device (such as phase change memory or PCM*) for both storing and processing information. That design would replace the conventional “von Neumann” computer architecture, used in standard desktop computers, laptops, and cellphones, which splits computation and memory into two different devices. That requires moving data back and forth between memory and the computing unit, making them slower and less energy-efficient.

Especially useful in AI applications

The researchers believe this new prototype technology will enable ultra-dense, low-power, and massively parallel computing systems that are especially useful for AI applications. The researchers tested the new architecture using an unsupervised machine-learning algorithm running on one million phase change memory (PCM) devices, successfully finding temporal correlations in unknown data streams.

“This is an important step forward in our research of the physics of AI, which explores new hardware materials, devices and architectures,” says Evangelos Eleftheriou, PhD, an IBM Fellow and co-author of an open-access paper in the peer-reviewed journal Nature Communications. “As the CMOS scaling laws break down because of technological limits, a radical departure from the processor-memory dichotomy is needed to circumvent the limitations of today’s computers.”

“Memory has so far been viewed as a place where we merely store information, said Abu Sebastian, PhD. exploratory memory and cognitive technologies scientist, IBM Research and lead author of the paper. But in this work, we conclusively show how we can exploit the physics of these memory devices to also perform a rather high-level computational primitive. The result of the computation is also stored in the memory devices, and in this sense the concept is loosely inspired by how the brain computes.” Sebastian also leads a European Research Council funded project on this topic.

* To demonstrate the technology, the authors chose two time-based examples and compared their results with traditional machine-learning methods such as k-means clustering:

Simulated Data: one million binary (0 or 1) random processes organized on a 2D grid based on a 1000 x 1000 pixel, black and white, profile drawing of famed British mathematician Alan Turing. The IBM scientists then made the pixels blink on and off with the same rate, but the black pixels turned on and off in a weakly correlated manner. This means that when a black pixel blinks, there is a slightly higher probability that another black pixel will also blink. The random processes were assigned to a million PCM devices, and a simple learning algorithm was implemented. With each blink, the PCM array learned, and the PCM devices corresponding to the correlated processes went to a high conductance state. In this way, the conductance map of the PCM devices recreates the drawing of Alan Turing.

Real-World Data: actual rainfall data, collected over a period of six months from 270 weather stations across the USA in one hour intervals. If rained within the hour, it was labelled “1” and if it didn’t “0”. Classical k-means clustering and the in-memory computing approach agreed on the classification of 245 out of the 270 weather stations. In-memory computing classified 12 stations as uncorrelated that had been marked correlated by the k-means clustering approach. Similarly, the in-memory computing approach classified 13 stations as correlated that had been marked uncorrelated by k-means clustering.

Conventional computers based on the von Neumann architecture perform computation by repeatedly transferring data between their physically separated processing and memory units. As computation becomes increasingly data centric and the scalability limits in terms of performance and power are being reached, alternative computing paradigms with collocated computation and storage are actively being sought. A fascinating such approach is that of computational memory where the physics of nanoscale memory devices are used to perform certain computational tasks within the memory unit in a non-von Neumann manner. We present an experimental demonstration using one million phase change memory devices organized to perform a high-level computational primitive by exploiting the crystallization dynamics. Its result is imprinted in the conductance states of the memory devices. The results of using such a computational memory for processing real-world data sets show that this co-existence of computation and storage at the nanometer scale could enable ultra-dense, low-power, and massively-parallel computing systems.

In the beginning, life in the cloud was simple. Type in your credit card number and—voilà—you had root on a machine you didn’t have to unpack, plug in, or bolt into a rack.

That has changed drastically. The cloud has grown so complex and multifunctional that it’s hard to jam all the activity into one word, even a word as protean and unstructured as “cloud.” There are still root logins on machines to rent, but there are also services for slicing, dicing, and storing your data. Programmers don’t need to write and install as much as subscribe and configure.

Here, Amazon has led the way. That’s not to say there isn’t competition. Microsoft, Google, IBM, Rackspace, and Joyent are all churning out brilliant solutions and clever software packages for the cloud, but no company has done more to create feature-rich bundles of services for the cloud than Amazon. Now Amazon Web Services is zooming ahead with a collection of new products that blow apart the idea of the cloud as a blank slate. With the latest round of tools for AWS, the cloud is that much closer to becoming a concierge waiting for you to wave your hand and give it simple instructions.

Here are 10 new services that show how Amazon is redefining what computing in the cloud can be.

Glue

Anyone who has done much data science knows it’s often more challenging to collect data than it is to perform analysis. Gathering data and putting it into a standard data format is often more than 90 percent of the job.

Glue is a new collection of Python scripts that automatically crawls your data sources to collect data, apply any necessary transforms, and stick it in Amazon’s cloud. It reaches into your data sources, snagging data using all the standard acronyms, like JSON, CSV, and JDBC. Once it grabs the data, it can analyze the schema and make suggestions.

The Python layer is interesting because you can use it without writing or understanding Python—although it certainly helps if you want to customize what’s going on. Glue will run these jobs as needed to keep all the data flowing. It won’t think for you, but it will juggle many of the details, leaving you to think about the big picture.

FPGA

Field Programmable Gate Arrays have long been a secret weapon of hardware designers. Anyone who needs a special chip can build one out of software. There’s no need to build custom masks or fret over fitting all the transistors into the smallest amount of silicon. An FPGA takes your software description of how the transistors should work and rewires itself to act like a real chip.

Amazon’s new AWS EC2 F1 brings the power of FGPA to the cloud. If you have highly structured and repetitive computing to do, an EC2 F1 instance is for you. With EC2 F1, you can create a software description of a hypothetical chip and compile it down to a tiny number of gates that will compute the answer in the shortest amount of time. The only thing faster is etching the transistors in real silicon.

Who might need this? Bitcoin miners compute the same cryptographically secure hash function a bazillion times each day, which is why many bitcoin miners use FPGAs to speed up the search. Anyone with a similar compact, repetitive algorithm you can write into silicon, the FPGA instance lets you rent machines to do it now. The biggest winners are those who need to run calculations that don’t map easily onto standard instruction sets—for example, when you’re dealing with bit-level functions and other nonstandard, nonarithmetic calculations. If you’re simply adding a column of numbers, the standard instances are better for you. But for some, EC2 with FGPA might be a big win.

Blox

As Docker eats its way into the stack, Amazon is trying to make it easier for anyone to run Docker instances anywhere, anytime. Blox is designed to juggle the clusters of instances so that the optimum number are running—no more, no less.

Blox is event driven, so it’s a bit simpler to write the logic. You don’t need to constantly poll the machines to see what they’re running. They all report back, so the right number can run. Blox is also open source, which makes it easier to reuse Blox outside of the Amazon cloud, if you should need to do so.

X-Ray

Monitoring the efficiency and load of your instances used to be simply another job. If you wanted your cluster to work smoothly, you had to write the code to track everything. Many people brought in third parties with impressive suites of tools. Now Amazon’s X-Ray is offering to do much of the work for you. It’s competing with many third-party tools for watching your stack.

When a website gets a request for data, X-Ray traces as it as flows your network of machines and services. Then X-Ray will aggregate the data from multiple instances, regions, and zones so that you can stop in one place to flag a recalcitrant server or a wedged database. You can watch your vast empire with only one page.

Rekognition

Rekognition is a new AWS tool aimed at image work. If you want your app to do more than store images, Rekognition will chew through images searching for objects and faces using some of the best-known and tested machine vision and neural-network algorithms. There’s no need to spend years learning the science; you simply point the algorithm at an image stored in Amazon’s cloud, and voilà, you get a list of objects and a confidence score that ranks how likely the answer is correct. You pay per image.

The algorithms are heavily tuned for facial recognition. The algorithms will flag faces, then compare them to each other and references images to help you identify them. Your application can store the meta information about the faces for later processing. Once you put a name to the metadata, your app will find people wherever they appear. Identification is only the beginning. Is someone smiling? Are their eyes closed? The service will deliver the answer, so you don’t need to get your fingers dirty with pixels. If you want to use impressive machine vision, Amazon will charge you not by the click but by the glance at each image.

Athena

Working with Amazon’s S3 has always been simple. If you want a data structure, you request it and S3 looks for the part you want. Amazon’s Athena now makes it much simpler. It will run the queries on S3, so you don’t need to write the looping code yourself. Yes, we’ve become too lazy to write loops.

Athena uses SQL syntax, which should make database admins happy. Amazon will charge you for every byte that Athena churns through while looking for your answer. But don’t get too worried about the meter running out of control because the price is only $5 per terabyte. That’s about 50 billionths of a cent per byte. It makes the penny candy stores look expensive.

Lambda@Edge

The original idea of a content delivery network was to speed up the delivery of simple files like JPG images and CSS files by pushing out copies to a vast array of content servers parked near the edges of the Internet. Amazon is taking this a step further by letting us push Node.js code out to these edges where they will run and respond. Your code won’t sit on one central server waiting for the requests to poke along the backbone from people around the world. It will clone itself, so it can respond in microseconds without being impeded by all that network latency.

Amazon will bill your code only when it’s running. You won’t need to set up separate instances or rent out full machines to keep the service up. It is currently in a closed test, and you must apply to get your code in their stack.

Snowball Edge

If you want some kind of physical control of your data, the cloud isn’t for you. The power and reassurance that comes from touching the hard drive, DVD-ROM, or thumb drive holding your data isn’t available to you in the cloud. Where is my data exactly? How can I get it? How can I make a backup copy? The cloud makes anyone who cares about these things break out in cold sweats.

The Snowball Edge is a box filled with data that can be delivered anywhere you want. It even has a shipping label that’s really an E-Ink display exactly like Amazon puts on a Kindle. When you want a copy of massive amounts of data that you’ve stored in Amazon’s cloud, Amazon will copy it to the box and ship the box to wherever you are. (The documentation doesn’t say whether Prime members get free shipping.)

Snowball Edge serves a practical purpose. Many developers have collected large blocks of data through cloud applications and downloading these blocks across the open internet is far too slow. If Amazon wants to attract large data-processing jobs, it needs to make it easier to get large volumes of data out of the system.

If you’ve accumulated an exabyte of data that you need somewhere else for processing, Amazon has a bigger version called Snowmobile that’s built into an 18-wheel truck complete with GPS tracking.

Oh, it’s worth noting that the boxes aren’t dumb storage boxes. They can run arbitrary Node.js code too so you can search, filter, or analyze … just in case.

Pinpoint

Once you’ve amassed a list of customers, members, or subscribers, there will be times when you want to push a message out to them. Perhaps you’ve updated your app or want to convey a special offer. You could blast an email to everyone on your list, but that’s a step above spam. A better solution is to target your message, and Amazon’s new Pinpoint tool offers the infrastructure to make that simpler.

You’ll need to integrate some code with your app. Once you’ve done that, Pinpoint helps you send out the messages when your users seem ready to receive them. Once you’re done with a so-called targeted campaign, Pinpoint will collect and report data about the level of engagement with your campaign, so you can tune your targeting efforts in the future.

Polly

Who gets the last word? Your app can, if you use Polly, the latest generation of speech synthesis. In goes text and out comes sound—sound waves that form words that our ears can hear, all the better to make audio interfaces for the internet of things.

This isn’t ARM’s first association with supercomputers. Earlier this year, Fujitsu announced that it plans to build a successor to the Project K supercomputer, which is housed at the Riken Advanced Institute for Computational Science, using ARM chips. In fact, it was announced today that the new Post-K machine will be the first to license the newly announced ARM architecture.

ARM has built a reputation for building processors known for their energy efficiency. That’s why they’ve proven so popular for mobile devices—they extend battery life in smartphones and tablets. Among the companies that license ARM’s designs are Apple, Qualcomm, and Nvidia. But the company’s energy-efficient chips also create less heat and use less power, which are both desirable attributes in large-scale processing applications such as supercomputers.

Intel will be worried by the purchase. The once-dominant chipmaker missed the boat on chips for mobile devices, allowing ARM to dominate the sector. But until recently it’s always been a leading player in the supercomputer arena. Now the world’s fastest supercomputer is built using Chinese-made chips, and clearly ARM plans to give it a run for its money, too.

It remains to be seen how successful ARM-powered supercomputers will be, though. The first big test will come when Fujitsu’s Post-K machine is turned on, which is expected in 2020. Intel will be watching carefully the whole way.