Month: October 2016

A Stanford School of Medicine machine-learning method for automatically analyzing images of cancerous tissues and predicting patient survival was found more accurate than doctors in breast-cancer diagnosis, but doctors still don’t trust this method, say MIT researchers (credit: Science/AAAS)

MIT researchers have developed a method to determine the rationale for predictions by neural networks, which loosely mimic the human brain. Neural networks, such as Google DeepMind’s AlphaGo program, use a process known as “deep learning” to look for patterns in training data.

An ongoing problem with neural networks is that they are “black boxes.” After training, a network may be very good at classifying data, but even its creators will have no idea why. With visual data, it’s sometimes possible to automate experiments that determine which visual features a neural net is responding to, but text-processing systems tend to be more opaque.
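One common way to automate such an experiment for visual data is occlusion probing: mask small patches of the input and measure how much the classifier's score drops. The article names no specific technique, so the sketch below is an illustrative assumption, using a toy stand-in "classifier" rather than a real network:

```python
def occlusion_map(image, classify, patch=2):
    """Probe which regions a classifier responds to by masking patches.

    `classify` returns a score for the target class; the drop in score
    when a patch is zeroed out indicates how much the model relied on
    that region of the input.
    """
    base = classify(image)
    h, w = len(image), len(image[0])
    heat = [[0.0] * w for _ in range(h)]
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = [row[:] for row in image]  # copy, then zero one patch
            for di in range(i, min(i + patch, h)):
                for dj in range(j, min(j + patch, w)):
                    masked[di][dj] = 0.0
            drop = base - classify(masked)
            for di in range(i, min(i + patch, h)):
                for dj in range(j, min(j + patch, w)):
                    heat[di][dj] = drop
    return heat

# Toy "classifier" that only looks at the top-left 2x2 corner.
score = lambda img: img[0][0] + img[0][1] + img[1][0] + img[1][1]
image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, score)
print(heat[0][0], heat[3][3])  # 4.0 0.0: only the corner matters
```

The resulting heat map localizes the features the model responds to; text has no analogous spatial structure to occlude, which is one reason text-processing systems are harder to probe this way.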

In the deep learning process, training data is fed to a network’s input nodes, which modify it and feed it to other nodes, which modify it and feed it to still other nodes, and so on. The values stored in the network’s output nodes are then correlated with the classification category that the network is trying to learn — such as the objects in an image, or the topic of an essay.
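The layer-by-layer flow described above can be sketched in a few lines of Python. The network shape, the tanh nonlinearity, and the random weights here are illustrative assumptions, not details from the paper:

```python
import math
import random

random.seed(0)

def feed_forward(x, layers):
    """Pass an input vector through successive layers of weights.

    Each node computes a weighted sum of the values fed to it and
    applies a nonlinearity (tanh here); its output feeds the next
    layer's nodes, and so on to the output nodes.
    """
    for weights in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
             for row in weights]
    return x

# A tiny 3-input, 4-hidden, 2-output network with random weights.
layers = [
    [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
    [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)],
]
outputs = feed_forward([0.5, -0.2, 0.1], layers)
print(outputs)  # two output values, e.g. scores for two categories
```

During training, the weights are adjusted so that the output values line up with the known categories of the training examples; after training, nothing in the weights directly explains why a given input produced a given output.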

“In real-world applications, sometimes people really want to know why the model makes the predictions it does,” says Tao Lei, an MIT graduate student in electrical engineering and computer science and first author on the new paper. “One major reason that doctors don’t trust machine-learning methods is that there’s no evidence.” Another critical example is self-driving cars.

At the Association for Computational Linguistics’ Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new way to train neural networks so that they provide not only predictions and classifications but rationales for their decisions.

“There’s a broader aspect to this work, as well,” says Tommi Jaakkola, an MIT professor of electrical engineering and computer science and a coauthor on the open-access paper. “You may not want to just verify that the model is making the prediction in the right way; you might also want to exert some influence in terms of the types of predictions that it should make. How does a layperson communicate with a complex model that’s trained with algorithms that they know nothing about? They might be able to tell you about the rationale for a particular prediction. In that sense it opens up a different way of communicating with the model.”

Interpreting a neural net’s decisions

In the new paper, the CSAIL researchers specifically address neural nets trained on textual data. To enable interpretation of a neural net’s decisions, the researchers divide the net into two modules. The first module extracts segments of text from the training data, and the segments are scored according to their length and their coherence: The shorter the segment, and the more of it that is drawn from strings of consecutive words, the higher its score.

The segments selected by the first module are then passed to the second module, which performs the prediction or classification task. The modules are trained together, and the goal of training is to maximize both the score of the extracted segments and the accuracy of prediction or classification.
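In the paper's formulation, the preference for short, contiguous rationales enters the training objective as a penalty on the selection. A minimal Python rendering of such a penalty (the weights and exact form here are illustrative assumptions) might look like:

```python
def rationale_penalty(mask, sparsity_weight=1.0, coherence_weight=2.0):
    """Penalty the training objective adds for a candidate rationale.

    `mask` is a 0/1 selection over the words of a document: 1 means
    the word is part of the extracted rationale. The penalty grows
    with the number of selected words (favoring short rationales) and
    with the number of transitions between selected and unselected
    words (favoring runs of consecutive words).
    """
    length = sum(mask)
    transitions = sum(abs(a - b) for a, b in zip(mask, mask[1:]))
    return sparsity_weight * length + coherence_weight * transitions

# A contiguous 4-word rationale is cheaper than 4 scattered words.
contiguous = [0, 0, 1, 1, 1, 1, 0, 0]
scattered  = [1, 0, 1, 0, 1, 0, 1, 0]
print(rationale_penalty(contiguous))  # 4*1 + 2*2 = 8.0
print(rationale_penalty(scattered))   # 4*1 + 7*2 = 18.0
```

Joint training then trades this penalty off against prediction accuracy: the first module learns to select the cheapest mask that still gives the second module enough evidence to predict correctly.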

An example of a beer review with ranking in two categories. The rationale for Look prediction is shown in bold. (credit: MIT CSAIL)

One of the data sets on which the researchers tested their system is a group of reviews from a website where users evaluate different beers. The data set includes the raw text of the reviews and the corresponding ratings, using a five-star system, on each of three attributes: aroma, palate, and appearance.

What makes the data attractive to natural-language-processing researchers is that it’s also been annotated by hand, to indicate which sentences in the reviews correspond to which scores. For example, a review might consist of eight or nine sentences, and the annotator might have highlighted those that refer to the beer’s “tan-colored head about half an inch thick,” “signature Guinness smells,” and “lack of carbonation.” Each sentence is correlated with a different attribute rating.

As such, the data set provides an excellent test of the CSAIL researchers’ system. If the first module has extracted those three phrases, and the second module has correlated them with the correct ratings, then the system has identified the same basis for judgment that the human annotator did.

In experiments, the system’s agreement with the human annotations was 96 percent and 95 percent, respectively, for ratings of appearance and aroma, and 80 percent for the more nebulous concept of palate.

In the paper, the researchers also report testing their system on a database of free-form technical questions and answers, where the task is to determine whether a given question has been answered previously.

In unpublished work, they’ve applied it to thousands of pathology reports on breast biopsies, where it has learned to extract text explaining the bases for the pathologists’ diagnoses. They’re even using it to analyze mammograms, where the first module extracts sections of images rather than segments of text.

Abstract of Rationalizing Neural Predictions

Prediction without justification has limited applicability. As a remedy, we learn to extract pieces of input text as justifications – rationales – that are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, generator and encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales and these are passed through the encoder for prediction. Rationales are never given during training. Instead, the model is regularized by desiderata for rationales. We evaluate the approach on multi-aspect sentiment analysis against manually annotated test cases. Our approach outperforms attention-based baseline by a significant margin. We also successfully illustrate the method on the question retrieval task.

A robotic vehicle made by the German Research Center for Artificial Intelligence being tested. As engineers near the limits of semiconductors made with silicon, there is a sense of urgency about finding new computing methods.

Credit: Boris Horvat/Agence France-Presse—Getty Images

Ali Farhadi holds a puny $5 computer, called a Raspberry Pi, comfortably in his palm and exults that his team of researchers has managed to squeeze into it a powerful program that can recognize thousands of objects.

The New Worlds 2016 Space Conference

Dates: November 4–5, 2016
Location: Austin, Texas

Within 25 years the first colonists will arrive at the New Worlds of space.

How will they survive?

How will they thrive?

For two days this fall, world-class scientists, experts and engineers will join some of the newest minds and future leaders to explain, discuss and debate the challenges and solutions facing those who will go out there.

Old ideas will be challenged.

New ideas will be presented for the first time.

Government explorers will tell us their plans.

Entrepreneurs and financial experts will talk about new businesses and ways to pay for these new communities.

Over 500 high school kids will design their own cities to be built on the Moon and Mars.

Sometime during the 21st century you will stagger out of a club at 3am and hail a taxi. The vehicle, no longer allowed to loiter in busy areas, will pop out of a stack nearby, find its way to you and honk. You and your drunk companions will stammer out your destinations until they flash up correctly on a screen. And you will glide home, staring enviously at the few people still allowed to drive: emergency service people and maintenance engineers.

What will take you home will not be a car, but rather a system. It might be a passive system, which only orders the traffic and the speeds according to the sum of individual requests, from cars owned by individual people. But it is more likely that it will be an active system – because we, the electorate, will have made it so.

When all cars are driverless, will we need pedestrian crossings?

It will give precedence to traffic leaving the centre of town at 3am; it will redistribute vehicles to the suburbs ready for morning; and it will incentivise non-ownership – the driverless cars will only be there to meet demand that the trams, underground railways and automated buses cannot.

But these battles between regulators and the rent-seeking monopolists who have hijacked the sharing economy are, in the long term, irrelevant. The attempt to drive down cab drivers’ wages and reduce their employment rights to zero is, in its own way, a last gasp of 20th-century economic thinking.

Because soon there won’t need to be drivers at all. Given that there are 400,000 HGV drivers in the UK, that at least a quarter of Britain’s 2.5 million van drivers are couriers, and that there are 297,000 licensed taxi drivers, automation would make a big dent in male employment.

The most important question facing us is not whether Uber drivers should have employment rights (they should), but what to do in a world where automation begins to eradicate work. If we accept – as Oxford researchers Carl Frey and Michael Osborne stated in 2013 – that 47% of jobs are susceptible to automation, the most obvious problem is: how are people going to live?

The most heavily touted solution is the universal basic income. With the UBI, people are paid a basic income out of taxation, which they top up with work, which is assumed to be sporadic. In their spare time, they could – as Marx suggested in his design for communism – “hunt in the morning, fish in the afternoon, rear cattle in the evening, criticise after dinner”.

The UBI has keen supporters now in the tech industry, whose billionaires have realised that, through rapid automation and its ability to render regulation useless, info-tech could create mass poverty over the next 20 years. At its most libertarian, the UBI becomes a replacement for state provision: you get a fixed sum from the state and you spend it on Uber-ised public services, hailing the cheapest social care worker on an app, or the cheapest eye operation.

In a way, Uber has done us a favour by making concrete the kind of rightwing libertarian dystopia that would come about if we allowed Silicon Valley to design the future.

Instead, we should begin by recognising that, as machines plus artificial intelligence begin to replace human beings, the entire social, political and moral dilemma for humanity becomes a question of systems.

Driverless cars need a city-wide public transport system to work properly. The OECD has estimated that, when combined with an efficient, automated transport system, driverless cars could reduce the number of vehicles needed in a city by 90%. Conversely, when modelled as only taxis plus private vehicles, the advent of driverless cars produces an unmanageable overload of journeys.

To take full advantage of the space freed up needs active management, says the OECD. But we have no intellectual models for “active management” of automobile travel, which – since its inception – has been associated with personal freedom.

A sensible debate would address two big issues: how we prepare, plan and regulate for the eradication of most driving work; and what an integrated smart transport network should look like in version 1.0. Beyond that it is difficult to plan, because how society reacts to the sudden orderliness, cheapness and swiftness of commuter journeys has to be balanced against the fact that few people will have the kind of jobs they have now.

If we start from what the smart transport network should look like, we have basic technical models now. The main technical dilemma will be: how much small vehicle travel is optimal, compared with the massive investment in underground rail, bus and tram capacity. One would expect the right wing of society to favour as much shared and autonomous car travel as possible to the extent of eradicating mass transport; and the left vice-versa.

But it can’t just be an issue of technical systems design. For example, one of the advantages of Uber is that all drivers can be traced and identified. In a smart transport system, all journeys can be traced and identified. You might want such data to be viewable, say, by police investigating murder – but would you want them to be viewable by HMRC, or your boss?

As to the transition from here to there, while I support the basic income, I don’t think it will be enough. To maintain a sporadically employed workforce through massive change, giving everyone £7,000 so they do not starve might not be the best use of taxpayers’ money. Alongside that, we are going to need the massive expansion of state provision of homes, university education, re-skilling, energy, transport and healthcare – either ultra-cheap or free.

It is good to protect the precariat – to protect the employment rights tech predators are trying to take from them. Even better to eradicate the precarity in sporadic work through state provision of the basic necessities.

The right used to object to mass state provision on the grounds that “everything looks the same”. Once we’re in a world of shared, driverless, essentially similar vehicles, or identikit shared apartments furnished with the same cheap stuff, that objection will tend to pale.