by Jack Clark

Attention & interpretability: modern neural networks are hard to interpret because we haven’t built tools to make it easy to analyze their decision-making processes. Part of the reason why we haven’t built the tools is that it’s not entirely obvious how you get a big stack of perceptual math machinery to tell you about what it is thinking in a way that is remotely useful to the untrained eye. The best thing we’ve been able to come up with, in the case of certain vision and language tasks, is attention, where we visualize which parts of a neural network – sometimes down to an individual cell or ‘neuron’ within it – are activating, and what in the input they are activating in response to. This can help us diagnose why an AI tool is responding in the way it is.
…Latent Attention Networks, a paper from researchers with Brown University, proposes an interesting way to improve our ability to analyze nets: by creating a new component to make it easier to visualize the attention of a given network in a more granular manner.
…In the paper they introduce a new AI component, which they call a Latent Attention Network. This component is general, working across different neural network architectures (a first, the researchers claim), and only requires the person to fiddle with it at its input or output points. LANs let them fit a so-called attention mask over any architecture.
…“The attention mask seeks to identify input components of x that are critical to producing the output F(x). Equivalently, the attention mask determines the degree to which each component of x can be corrupted by noise while minimally affecting F(x),” they write.
…The researchers evaluate the approach on a range of tasks, from the simple (MNIST, CIFAR) to a game of Pong from the Arcade Learning Environment. The ensuing visualizations seem to be helpful for getting a better grasp of how and why neural network classifiers work. I particularly recommend studying the images from Pong.
…Why it could be useful: this technique hints at a way to be able to take a generic component and simply fit it to an arbitrary network, then get the network to cough up some useful information about its state – if extended it could be a handy tool for AI diagnosticians.
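…A toy illustration of the attention-mask idea quoted above. This is a minimal sketch, not the paper's method: `toy_model` is a hypothetical stand-in for F, the masks are written by hand rather than learned by a LAN, and the blend "mask * x + (1 - mask) * noise" is an assumed way of corrupting the input.

```python
import random

def corrupt(x, mask, noise):
    """Blend each component with noise: mask=1 keeps x, mask=0 replaces it."""
    return [m * xi + (1.0 - m) * ni for xi, m, ni in zip(x, mask, noise)]

def toy_model(x):
    """Hypothetical stand-in for F: it only looks at the first two components."""
    return 2.0 * x[0] - x[1]

def output_change(x, mask, trials=200, seed=0):
    """Average |F(x) - F(corrupted x)| over random Gaussian noise draws."""
    rng = random.Random(seed)
    base = toy_model(x)
    total = 0.0
    for _ in range(trials):
        noise = [rng.gauss(0.0, 1.0) for _ in x]
        total += abs(base - toy_model(corrupt(x, mask, noise)))
    return total / trials

x = [0.5, -1.0, 3.0, 0.2]
good_mask = [1.0, 1.0, 0.0, 0.0]  # protects the components F actually uses
bad_mask = [0.0, 0.0, 1.0, 1.0]   # protects only the irrelevant components

print(output_change(x, good_mask))  # 0.0: noise only lands on ignored inputs
print(output_change(x, bad_mask))   # clearly positive: critical inputs destroyed
```

A good mask is one that lets as much noise in as possible while leaving the first number near zero; the hand-written `good_mask` here is what a learned mask would ideally converge to for this toy function.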

Self-Normalizing Neural Networks cause a stir: A paper from researchers with the Bioinformatics Institute in Austria proposes a way to improve feed forward neural network performance with a new AI component, Self-Normalizing Neural Networks. “FNNs are typically shallow and, therefore cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations,” they write.
…The paper is thorough and is accompanied by a code release, aiding rapid replication and experimentation by others. The researchers carry out exhaustive experiments, benchmarking their approach (based around a SELU, a scaled exponential linear unit) against a movable feast of other AI approaches, ranging from Residual Nets, to Highway Networks, to networks with Batch Normalization or Layer Normalization, and more.
…They test the method exhaustively as well. “We compared SNNs on (a) 121 tasks from the UCI machine learning repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with standard FNNs and other machine learning methods such as random forests and support vector machines. SNNs significantly outperformed all competing FNN methods at 121 UCI tasks, outperformed all competing methods at the Tox21 dataset, and set a new record at an astronomy data set,” they write.
…Notable fact: One of the authors is Sepp Hochreiter, who invented (along with Juergen Schmidhuber) the tremendously influential Long Short-Term Memory network component, aka the LSTM. LSTMs are used extensively in AI these days for tasks ranging from object detection to speech recognition, and the paper has over 4500 citations (growing with the massive influx of new AI research into memory networks, differentiable neural computers, Neural Turing Machines, and so on).
…The Self-Normalizing Neural Networks paper is amply thorough, weighing in at an eyebrow-raising 102 pages, split between the research paper (9 pages) and the remaining pages, which are devoted to comprehensive theoretical analysis, experiments, and – of course – references, to back it up. More of this European precision, please!
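…For the curious, here is roughly what a SELU looks like in code. This is a minimal sketch and not the authors' released code: the two constants are the commonly quoted values for the unit and should be treated as an assumption until checked against the paper. The demo only shows that standard-normal inputs come out with mean near 0 and variance near 1.

```python
import math
import random

# Commonly quoted SELU constants (assumption: verify against the paper).
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit.

    Positive inputs are scaled linearly; negative inputs follow a scaled
    exponential curve that saturates at -SCALE * ALPHA.
    """
    if x > 0:
        return SCALE * x
    return SCALE * ALPHA * (math.exp(x) - 1.0)

def mean_and_variance(values):
    """Population mean and variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(50000)]
mean, var = mean_and_variance([selu(z) for z in samples])
print(mean, var)  # mean close to 0, variance close to 1
```

That "zero mean, unit variance in, zero mean, unit variance out" behaviour is the self-normalizing property in miniature; the paper's analysis covers what happens across many layers with weights in between.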
Open publishing (Arxiv) versus slow publishing (Conferences and Journals). The Hochreiter paper highlights some of the benefits of the frenetic attention that publishing on Arxiv can bestow, along with/instead of traditional (relatively slow-burning) conferences and journals. I think the trade-off between speed of dissemination and lack of peer review is ultimately worthwhile, though some disagree.
…Yoav Goldberg, a researcher who has done work at the intersection of NLP and Deep Learning, writes that Arxiv can also lead to people having an incentive to publish initial versions of papers that are thin, not very detailed, and that serve more as flag-planting symbols for an expected scientific breakthrough than anything else. These are all legitimate points.
…Facebook AI researcher Yann LeCun weighed in and (in a lengthy, hard-to-link-to note on Facebook) said that the open publishing process allows for rapid dissemination of ideas and experimentation free of the pressure to publish papers at a conference. As of the time of writing the nascent AI blogosphere continues to be roiled by this drama, so I’m sure this boulder will continue to roll.
…(For disclosure: I side more toward favoring the Arxiv approach and think that ultimately bad papers and bad behavior get weeded out by the community over time. It’s rare that people accept a con. Deep Learning has been in hyper-growth mode since the 2012 AlexNet paper, so it’s natural things are a bit fast-moving and chaotic right now. Things may iron themselves out over time.)

Compute as AI’s strategic fulcrum: the AI community is getting much better at training big neural network models. Latest case in point comes from Facebook, which has outlined a new technique for training large-scale image classifiers.
…Time to train ImageNet in 2012: A week or two on a single GPU, with need for loads of custom CUDA programming.
…Time to train ImageNet in 2017: One hour across 256 GPUs. Vastly improved and simplified software and hardware.
…Although, as someone commented on Twitter, most people don’t easily have access to 256 GPUs.
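…A quick back-of-the-envelope on those two figures. This sketch uses only the numbers quoted above (a week or two on one GPU, one hour on 256 GPUs); `gpu_hours` is a helper made up for illustration. A 2012 GPU and a 2017 GPU are very different chips, so the GPU-hour totals are not a like-for-like measure of compute.

```python
HOURS_PER_DAY = 24

def gpu_hours(wall_clock_hours, num_gpus):
    """Total GPU time consumed by a run, ignoring differences in GPU speed."""
    return wall_clock_hours * num_gpus

run_2012_low = gpu_hours(7 * HOURS_PER_DAY, 1)    # one week on one GPU
run_2012_high = gpu_hours(14 * HOURS_PER_DAY, 1)  # two weeks on one GPU
run_2017 = gpu_hours(1, 256)                      # one hour on 256 GPUs

print(run_2012_low, run_2012_high, run_2017)  # 168 336 256

# Wall-clock speedup: hours waited in 2012 divided by hours waited in 2017.
print(7 * HOURS_PER_DAY // 1, 14 * HOURS_PER_DAY // 1)  # 168 336
```

So the wait has shrunk by a factor of a couple of hundred, while the raw count of GPU-hours has stayed in the same ballpark: the win is in being able to spend the hardware all at once.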

Better classifiers through combination: DING DONG! DING DONG! When you read those four words there’s a decent chance you also visualized a big clock or imagined some sonic representation of a clock chiming. Human memory seems to work like this, with a sensory experience of one entity inviting in a bunch of different, complementary representations. Some believe it’s this fusion of senses that gives us such powerful discriminative abilities.
…Wouldn’t it be nice to get similar effects in deep learning? From 2015 onwards people started experimenting en masse with getting computers to better understand images by training the nets on paired sets of images and captions, creating perceptual AI systems with combined representations of entities. We’ve also seen people more recently experiment with training audio and visual data together. Now, scientists from MIT have combined visual, audio, and text data in the same network.
…The data: Almost a million images (COCO & Visual Genome), synchronized with either a textual description or an audio track (377 continuous days of audio data, pulled from over 750,000 Flickr videos).
…How it works: the researchers create three different networks to ingest text, audio, or picture data. The ensuing learned representations from all of these networks are output as fixed-length vectors with the same dimensionality, which are then fed into a network that is shared across all three input networks. “While the weights in the earlier layers are specific to their modality, the weights in the upper layers are shared across all modalities,” they write.
…Bonus: The combined system ends up having cells that activate in the presence of words, pictures, or sounds that correspond to subtle types of object, like engines or dogs.
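…The "separate front ends, shared upper layers" layout described above can be sketched in a few lines. This is an illustration under stated assumptions, not the MIT architecture: the layer sizes, the random untrained weights, and the names `encoders`, `shared_layer` and `embed` are all hypothetical, and each network is reduced to a single linear layer with a ReLU.

```python
import random

def make_matrix(rows, cols, rng):
    """Random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def linear_relu(matrix, vec):
    """One dense layer followed by a ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in matrix]

rng = random.Random(0)
SHARED_DIM = 8  # hypothetical size of the common fixed-length vector
OUTPUT_DIM = 4  # hypothetical size of the shared network's output
INPUT_DIMS = {"image": 32, "sound": 20, "text": 12}  # hypothetical sizes

# One encoder per modality: these weights are modality-specific.
encoders = {name: make_matrix(SHARED_DIM, dim, rng)
            for name, dim in INPUT_DIMS.items()}

# One upper layer: these weights are shared by every modality.
shared_layer = make_matrix(OUTPUT_DIM, SHARED_DIM, rng)

def embed(modality, features):
    """Map raw features to the common vector, then through the shared layer."""
    aligned = linear_relu(encoders[modality], features)
    return linear_relu(shared_layer, aligned)

for name, dim in INPUT_DIMS.items():
    features = [rng.random() for _ in range(dim)]
    print(name, len(embed(name, features)))
```

Because every modality ends up passing through the same `shared_layer`, a unit in that layer can in principle learn to respond to a dog whether it arrives as a photo, a bark, or the word "dog", which is the effect the researchers report.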

Bored with the state of your supply chain automation? Consider investing in an autonomous cargo boat – the new craze sweeping across commodities makers worldwide! Companies envisage a future where autonomous mines (semi-here) send commodities via autonomous trains (imminent) to autonomous ports (here) and on to the yet-to-be-built autonomous boats.

4K GAN FACES: A sight for blurry, distorted eyes. Mike Tyka has written about his experiments to use GANs to create large, high-resolution entirely synthetic faces.
…The results are quite remarkable, with the current images seeming as much a new kind of impressionism as realistic photographs (though only for sub-sections of every given image, and sometimes fraught with Dali-esque blotches and Bacon-esque flaws).
…“as usual I’m battling mode collapse and poor controllability of the results and a bunch of trickery is necessary to reduce the amount of artifacts,” he writes. G’luck, Mike!

You are not Google (and that’s okay): This article about knowing what large-scale over-engineered technology is worth your while and what is out of scope is as relevant for AI researchers and engineers as it is for infrastructure people.
…Bonus: the invention of the delightfully German-sounding acronym UNPHAT.

What China thinks about when China thinks about AI: A good interview with Oregon State University professor Tom Dietterich in China’s National Science Review. We’re entering an era where AI becomes a tool of geopolitics as countries flex their various strengths in the tech as part of wider national posturing. So it’s crucial that scientists stay connected with one another, talking about the issues that matter to them which transcend borders.
…Dietterich makes the point that modern AI is about as easy to debug as removing all the rats from a garbage dump. “Traditional software systems often contain bugs, but because software engineers can read the program code, they can design good tests to check that the software is working correctly. But the result of machine learning is a ‘black box’ system that accepts inputs and produces outputs but is difficult to inspect,” he says.
…AI in China: “Chinese scientists (working both inside and outside China) are making huge contributions to the development of machine learning and AI technologies. China is a leader in deep learning for speech recognition and natural language translation, and I am expecting many more contributions from Chinese researchers as a result of the major investments of government and industry in AI research in China. I think the biggest obstacle to having higher impact is communication,” he says. “A related communication problem is that the internet connection between China and the rest of the world is often difficult to use. This makes it hard to have teleconferences or Skype meetings, and that often means that researchers in China are not included in international research projects.”

Building little pocket universes in PyTorch: This is a good tutorial for how to use PyTorch, an AI framework developed by Facebook, to build simple cellular automata grid worlds and train little AI agents in them.
…It’s great to see practical tutorials like this (along with the CycleGAN implementation & guide I pointed out last week) as it makes AI a bit less intimidating. Too many AI tutorials say stuff like “Simply install CUDA, CuDNN, configure TensorFlow, spin-up a dev environment in Conda, then swap out a couple of the layers.” This is not helpful to a beginner, and people should remember to go through all the seemingly-intuitive setup steps that go with any deep learning system.
…Another great way to learn about AI is to compete in AI competitions. So it’s no surprise Google-owned Kaggle has passed one million members on its platform. Because Kaggle members use the platform to create algorithms and fiddle with datasets via Kaggle Kernels, it seems like as membership scales Kaggle’s usefulness will scale proportionally. Congrats, all!
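…Back to the cellular automata grid worlds from the PyTorch tutorial item above: if you have never built one, the core loop is small. This is a minimal sketch in plain Python rather than PyTorch, and it uses Conway's Game of Life as a stand-in rule; the tutorial's own rules, agents, and code are not reproduced here.

```python
def step(grid):
    """Advance a Game of Life grid by one tick; the edges wrap around."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbours, wrapping at the borders.
            neighbours = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # A cell is alive next tick if it has 3 neighbours,
            # or if it is alive now and has exactly 2.
            alive = neighbours == 3 or (grid[r][c] == 1 and neighbours == 2)
            new[r][c] = 1 if alive else 0
    return new

# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

for row in step(blinker):
    print(row)
```

A grid world for an agent is this same update loop plus a cell the agent controls and a reward signal; a framework like PyTorch comes in once you want to learn the agent's policy rather than hard-code it.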

Compete for DATA: CrowdFlower has launched AI For Everyone, a challenge that will see two groups every quarter through to 2018 compete to get access to free data on CrowdFlower’s eponymous platform.
…Winners get a free CrowdFlower AI subscription, a $25,000 credit towards paying for CrowdFlower contributors to annotate data, free CrowdFlower platform training and onboarding, and promotion of their results.

[ 2045: Outskirts of Death Valley, California. A man and a robot approach a low building, filled with large, dust-covered machines, and little orange robot arms on moving pedestals that whizz around, autonomously tending to the place. One of them has the suggestion of a thatched hairpiece, made up of a feather-coated tumbleweed that has snared into one of its joints.]

You can’t be serious.
Alvin, it’ll be less than a day.
It’s undignified. I literally cured cancer.
You and a billion of your clones, sure.
Still me. I’m not happy about this.
I’m going to take you out now.
No photographs. If I sense a single one going onto the Internet I’m going to be very annoyed.
Sure, you say, then you unscrew the top of Alvin’s head.

Alvin is, despite its inflated sense of importance, very small. Maybe about half a palm’s worth of actual computer, plus a forearm’s worth of cabling, and a few peripheral cables and generic sensor units that can be bound up and squished together. Which is why you’re able to lift his head away from his body, unhook a couple of things, then carefully pull him out. Your own little personal, witheringly sarcastic, AI assistant.

Death Valley destroys most machines that go into it, rotting them away with the endless day/night flexing of metal in phase transitions, or searing them with sand and grit and sometimes crusted salt. But most machines don’t mind – they just break. Not Alvin. For highly sensitive, developed AIs of its class the experience is actively unpleasant. Heat leads to flexing in casing which leads to damage which leads to improper sensing which gets interpreted as something a vast group of scientists has said corresponds to the human term for pain. Various international laws prohibit the willful infliction of this sort of feeling on so-called Near Conscious Entities – a term that Alvin disagrees with.

So, unwilling to violate the law, here you are at Hertz-Rent-a-Body, transporting Alvin out of his finely-filigreed silver-city Android body, into something that looks like a tank. You squint. No, it’s actually a tank, re-purposed slightly; its turret sliced in half, its snout capped with a big sensor dome, and the bumps on its front for storing smoke flares now contain some directional microphones. Aside from that it could have teleported out of a war in the previous century. You check the schematics and are assured fairly quickly that Death Valley won’t pose a threat to it.
…Ridiculous, says Alvin. So much waste.

You unplug the cable connecting Alvin to the suit’s speaker, and carry him over to the tank. The tank senses you, silently confirms the rental with your bodyphone, then the hatch on its roof sighs open and a robotic arm snakes out. Welcome! Please let us accommodate your N.C.E codename A.L.V.I.N the arm-tank says, its speakers crackling. The turret shifts to point to the electronics in your hands.
Alvin, having no mouth due to not being wired up to a speaker, flashes its output OLEDs angrily, shimmering between red and green rapidly – a sign, you know from experience, of the creation and transmission of a range of insults, both understandable by conventional humans and some highly specific to machines. Your N.C.E has a very large vocabulary. Impressive! chirps the tank.
The robot arm plucks Alvin delicately from your hands and retracts back into the tank. A minute passes and the tank whirs. A small green light turns on in the sensor dome on the tip of its turret. One of its speakers emits a brief electronic-static burp, then-
I am too large, says Alvin, through the tank. They want me to do tests in this thing.

Five minutes later and Alvin is trundling to and fro in the Hertz parking lot, navigating between five orange cones set down by another similarly-cheerful robotic arm on a movable mount. A couple more tasks pass – during one U-Turn Alvin makes the tank shuffle jerkily giving the appearance of a sulk – then the Hertz robot arm flashes green and says We have validated movement policies. Great driving! Please return to us within 24:00 hours for dis-internment!

Alvin trundles over to you and you climb up onto one of his treads, then hop onto the roof. You put your hand on the hatch to pull it open but it doesn’t move.
You’re not coming in.
Alvin, it’s 120 degrees.
I’m naked in here. It would make me uncomfortable.
Now you’re just being obtuse. You can turn off your sensors. You won’t notice me.
Are you ordering me?
No, I’m not ordering you, I’m asking you nicely.
I’m a tank, I don’t have to be nice.
That’s for sure.
I’ll drive in the shade. You won’t get too hot. Medically, you’re going to be fine.
Just drive, you say.
You start to trundle away into the desert. Have a good day! shouts the Hertz robotic arm from the parking lot. Alvin finds the one remaining smoke grenade in the tank and fires it into the air, back towards the body rental shop. We have added the fine for smoke damage to your bill. Safe driving! crackles the robot arm through the distant haze.

Hi Jack, Regarding the new ImageNet result, one hour on 256 GPUs, there is a far more serious caveat than not typically having access to that many GPUs. It is that the three forms of machine parallelism (MP) currently used in Deep Learning (DL), i.e., data parallelism, model parallelism, and shared/tied weights, are completely irrelevant or implausible from a biological standpoint. There is really no possibility that the brain uses any of them. Yet, without MP, DL learning methods are intractably slow. To my mind, these two facts imply that despite DL’s impressive successes, it is highly unlikely that DL methods have captured the essence of biological intelligence. I have an abstract into CCN 2017, available off my web site, in case you would like to read more.