How machine learning ate Microsoft

At the Strata big data conference yesterday, Microsoft let the world know its Azure Machine Learning offering was generally available to developers. This may come as a surprise. Microsoft? Isn't machine learning the province of Google or Facebook or innumerable hot startups?

In truth, Microsoft has quietly built up its machine learning expertise over decades, transforming academic discoveries into product functionality along the way. Not many businesses can match Microsoft's deep bench of talent.

Machine learning -- getting a system to teach itself from lots of data rather than simply following preset rules -- actually powers the Microsoft software you use every day. Machine learning has infiltrated Microsoft products from Bing to Office to Windows 8 to Xbox games. Its flashiest vehicle may be the futuristic Skype Translator, which handles two-way voice conversations in different languages.

Now, with machine learning available on the Azure cloud, developers can build learning capabilities into their own applications: recommendations, sentiment analysis, fraud detection, fault prediction, and more.

The idea of the new Azure offering is to democratize machine learning, so you no longer need to hire someone with a doctorate to use a machine learning algorithm. That could "pull big data out of the trough of disillusionment," suggests Joseph Sirosh, Microsoft's corporate vice president for information management and machine learning, who heads up the new Azure service, "taking it from looking in the rearview mirror with business intelligence to really being able to predict and generate forecasts you can act on."

Sirosh dreams big, suggesting that the potential goes far beyond forecasting and predictions, to the point where "every mobile app can now be intelligent and every IoT sensor can now send data to the cloud and call on APIs that provide it with intelligence." If that seems overly optimistic, it's worth looking at how important machine learning already is for Microsoft's own products.

Machine learning everywhere

Machine learning enables Clutter in Office 365 to determine with uncanny accuracy which email you'll want to read and which messages you're likely to ignore and delete. It's how you can open customer data from Salesforce or code from GitHub in the new Microsoft Power BI portal and immediately ask natural-language questions like "customer sales last quarter," to get not only numbers, but a chart in the style that highlights what's important in the data.

It's how Office 365 and Azure spot hackers trying to break into accounts, how Cortana can recognize what you're saying, how Kinect can detect the position of your fingers or the joints of your skeleton from an infrared image. It's also why the keyboard on Windows Phone is so accurate: Data derived from thousands of people correcting mistakes on their phones enables the software to guess which letter you're going to type next and make that key (invisibly) bigger.
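To make the keyboard idea concrete, here is a minimal sketch in Python of how a next-letter predictor could enlarge touch targets. It is an illustration only, not Microsoft's implementation: the tiny word list, the function names, and the `max_boost` parameter are all assumptions made for the example, and a real keyboard would learn from far more data than letter pairs.

```python
from collections import Counter, defaultdict

def train_bigrams(words):
    """Count how often each letter follows another across the training words."""
    counts = defaultdict(Counter)
    for word in words:
        for prev, nxt in zip(word, word[1:]):
            counts[prev][nxt] += 1
    return counts

def next_letter_probs(counts, prev):
    """Estimate the probability of each letter following `prev`."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}
    return {letter: n / total for letter, n in counts[prev].items()}

def key_scale(probs, letter, max_boost=0.5):
    """Scale a key's invisible touch target in proportion to its probability.

    `max_boost` is a hypothetical tuning knob: a key that is certain to come
    next grows by 50 percent, an unlikely key keeps its normal size.
    """
    return 1.0 + max_boost * probs.get(letter, 0.0)

# Made-up training data standing in for what people actually typed.
corpus = ["the", "then", "that", "this", "tree"]
model = train_bigrams(corpus)
probs = next_letter_probs(model, "t")

for letter in "hrz":
    print(letter, round(key_scale(probs, letter), 2))
```

After a "t", this toy model expects "h" most often, so the "h" key gets the largest touch target while a letter it has never seen in that position stays at its normal size.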

The same machine learning technique makes it easier to touch the right menu on a Windows tablet with your finger and helps OneNote figure out your handwriting. Launch an app in Windows 8, and three-quarters of the time it opens almost instantly, thanks to machine learning that tells the system which apps to preload into memory because you're going to need them.

Machine learning takes enormous amounts of data -- whether it's a server log, a stream of information from sensors or a huge collection of images, videos, or audio recordings -- and turns it into a system that's better at handling complex situations than any hand-coded algorithm. The idea has been around for 50 years, but as more and more data becomes available, machine learning has become increasingly useful, going from academic research to powering breakthroughs like usable voice recognition.

"I honestly can't think of any recent product development that Microsoft has been involved in that hasn't involved machine learning," says Microsoft's director of research, Peter Lee, who left DARPA to run Microsoft's research arm. "Everything we do now is influenced, one way or another, by machine learning."

Take the recent Microsoft Band, the flagship device for Microsoft's new Health platform. "We wanted to get the blood flow sensor to provide accurate readings even under extreme athletic duress like rowing," Lee explains (the vice president who approved the project is an avid rower). "It's a very low-cost sensor; just to interpret the reading from the sensor, we've found machine learning is the only practical approach to doing that."

Decades in the lab

How did Microsoft get this good at machine learning? Thank the often underestimated Microsoft Research (MSR) division. "Some of the earliest roots [of this success] go back more than 20 years, with the arrival of people like Eric Horvitz who really brought the whole vision of machine learning to the company," says Lee. "They very quickly came up with the idea of applying this to Microsoft products."

Horvitz, now managing director of MSR's Redmond Lab, has won awards from organizations ranging from the Association for the Advancement of Artificial Intelligence to the American Academy of Arts and Sciences, and he recently funded a hundred-year study of artificial intelligence. Having someone that influential at MSR helped attract other pioneers as machine learning became relevant to one field of research after another.

"When we established the lab in Cambridge 15 years ago it added to the momentum, with people who worked on probabilistic modelling, like Chris Bishop, being attracted to MSR," Lee continues.

Bishop literally wrote the book on neural networks and pattern recognition; his textbook made statistical methods common in machine learning. He's now the chief research scientist at MSR Cambridge, where he runs the Machine Learning and Perception group behind skeletal tracking in Kinect, the AI in Forza Motorsport, the TrueSkill ranking system on Xbox, as well as some of the search features in Bing and SharePoint.

The team is also working on Infer.NET, a probabilistic programming toolkit that uses machine-language descriptions of the world to handle uncertainty, instead of needing every question to have the usual yes/no answer of computers. That's what Clutter uses to triage your inbox. Researcher John Winn and his colleagues worked with the Exchange team for four years on different ideas until they found something that could "really add value and not be in some way creepy or attract the negativity you can sometimes get when you start applying machine learning to personal email."
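The difference between a yes/no answer and a probabilistic one can be sketched with Bayes' rule. The Python below is an illustration under stated assumptions, not the toolkit's API and not how Clutter actually works: the function name, the two pieces of evidence, and every probability are invented for the example, and the evidence is treated as independent for simplicity.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the updated belief in a hypothesis after seeing one piece of evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator

# Hypothesis: "this message is one the user will ignore."
# Start undecided rather than forcing a yes or a no.
belief = 0.5

# Assumed evidence 1: the message came from a mailing list, which we pretend
# is true of 80% of ignored mail and 20% of mail that gets read.
belief = bayes_update(belief, 0.8, 0.2)
after_first = belief

# Assumed evidence 2: the user has replied to this sender before, which we
# pretend is true of 30% of ignored mail and 60% of mail that gets read.
belief = bayes_update(belief, 0.3, 0.6)
after_second = belief

print(round(after_first, 3), round(after_second, 3))
```

The belief rises when the evidence points toward "ignore" and falls back when later evidence points the other way, so the system can hold a graded opinion and act only when it is confident enough.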

"Then as computer vision started to become more influenced by machine learning, [we attracted] a large number of very significant luminaries in that field who had one foot in machine learning and one in vision, and people like Andrew Blake became very relevant," Lee explains. (Blake, who now runs the Cambridge lab, pioneered key probabilistic computer vision algorithms at Edinburgh and Oxford University.)

A few years later, when AT&T closed down Bell Labs, many of the researchers joined Microsoft. "People who were really thinking about neural networks and more statistical methods started to arrive on the scene," says Lee. "That was timed with the emergence of the relevance of big data; that whole wave has been tremendously influential, not only inside Microsoft but in the whole industry."

Then in 2009, shortly before Lee himself joined Microsoft, a project that he jokes he might easily have rejected as "an unwise attempt to use layered neural networks for speech processing" helped take machine learning out of the lab and into mainstream computing.

"I would have said it was completely ridiculous, and I would have been backed by all the top researchers," Lee admits. Instead, that work became the foundation for the multilayered "deep" neural networks that have transformed voice and image recognition across the industry.

Diving deeper

Voice recognition used to mean training your computer to learn your voice, or sticking to a few simple commands; now it means you can buy a new phone and start talking to it -- and Windows 10 will bring that to your PC.

Image recognition has gone from spotting when there's a face in a photograph to coping with everything from text to traffic signals. The ImageNet benchmark tests identifying photos of a thousand objects, like recognizing not only pictures of 150 different dogs but also their breeds. "You have to distinguish Pembroke Welsh corgis and Cardigan Welsh corgis, one of which has a longer tail," explains John Platt of MSR.

This month, a team of Microsoft researchers in the Beijing lab announced that their deep learning system was the first to beat untrained humans on the benchmark (narrowly beating Google to the achievement).

That's all thanks to deep learning. It's one of the fastest-moving areas in AI today; the pioneers of deep learning work at Google, at Facebook, at Baidu -- and at Microsoft.

In 2009, when Geoff Hinton of the University of Toronto proposed creating a neural network that would recognize speech by gradually building up its understanding of more and more words (a vastly simplified version of one of the techniques the human brain uses to recognize patterns in images and sounds), most researchers weren't interested. In a testament to MSR's willingness to experiment, an intern and a graduate student of Hinton got approval to work with his researchers and try out this deep network with real data.

Their results weren't only a little bit better; they were 25 percent more accurate. Once they were published, Lee points out, "not only Microsoft but most of the industry transitioned to using them."

Bringing machine learning to the masses

As Microsoft offers its own machine learning tools to developers, the company may enjoy greater recognition for its pioneering work. "We have a treasure trove of knowledge and algorithms and code across a vast array of machine [learning] problems that would be incredibly powerful and satisfying to get into wider use," says Lee.

Azure Machine Learning is how Microsoft is trying to do that. Joseph Sirosh calls it "the fastest way to build predictive models and deploy them. All you need is a Web browser to start machine learning. It allows simple, one-click creation of APIs in the cloud and that makes the deployment easy. It's easy to hook up a Web page, it's easy to hook up a mobile app. That's why I think it's transforming how development is done."

The new Azure ML service started out as an MSR Excel demo, sending data to experimental machine learning-driven data analytics running on Azure. A couple of years before he became CEO, Satya Nadella came across the demo and immediately saw the potential of turning it into a product for business customers. He persuaded the researcher, Roger Barga, to join a team inside his cloud division. "Satya got excited, and he got me excited," Barga remembers.

The idea was to combine the machine learning tools from the research team with the expertise that product teams across Microsoft had gained by implementing machine learning algorithms. Making machine learning work well isn't only about having a good algorithm or even making it perform at scale. You also need to make it consistent. The same algorithm in different machine learning packages often delivers different answers; using heuristics to find the model that best matches your data takes a lot of experience.

That experience offers a unique advantage, says Barga: "These algorithms have been hardened and proven over the years. We're able to draw on that expertise to implement it again in Azure ML. We know what the best practices are, what the heuristics are, what should we do to ensure this will be robust, scalable, and performant implementation we can deliver to our customers."

Developers can also design a machine learning system using the Azure ML Studio tool. That's popular even with experienced machine learning developers like the team at Mendeley, Elsevier's academic research network, which built a new recommendation system in a third of the time it took them with other tools. JJ Food Service in the United Kingdom used it to make a predictive shopping cart that puts products in for you; customers like the convenience and revenue is up 5 percent.

A machine that trains itself

In order to allow easier use of multiple machine learning algorithms together, Microsoft needed to build a suitable platform. That meant creating a system for moving new algorithms from research into production; as new techniques are developed, they can be plugged into Azure ML, keeping it up to date as machine learning continues to develop.

A common problem with older machine learning systems (and one of the issues that deep learning will address) is "ML rot." In other words, you spend a long time training your system, and when you roll it out, it works for a while -- but it falls out of date and you have to train it again. One way to avoid that is by retraining your model as you use it.

During the preview, customers were so keen on that idea that Microsoft added programmatic training and retraining. "They want to upload data to an API and have machine learning models do the learning, so we added that," explains Sirosh. "Once you have an API in place, you can keep uploading data and the model will update itself and stay fresh and be constantly learning."
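The idea of a model that keeps learning as data arrives can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Azure ML API: `OnlineLinearModel`, its learning rate, and the synthetic data are all hypothetical, chosen only to show a model adjusting one example at a time instead of being retrained from scratch.

```python
class OnlineLinearModel:
    """A one-feature linear model updated one example at a time (hypothetical)."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, y):
        """Nudge the parameters toward the newly observed (x, y) pair."""
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error

model = OnlineLinearModel()
xs = [0.0, 0.5, 1.0, 1.5, 2.0]

# Phase 1: the world follows y = 2x, and the model learns it.
for _ in range(500):
    for x in xs:
        model.update(x, 2.0 * x)
before_drift = model.predict(1.0)

# Phase 2: behavior drifts to y = 3x. Fresh data keeps flowing in,
# so the same model adapts instead of going stale.
for _ in range(500):
    for x in xs:
        model.update(x, 3.0 * x)
after_drift = model.predict(1.0)

print(round(before_drift, 3), round(after_drift, 3))
```

A model trained once on the first phase would keep predicting the old relationship; the one that continues to receive updates follows the change, which is the point of avoiding "ML rot."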

That's what eBay used to train its translation system on the terms used in women's fashion. If you're selling handbags, dresses, shoes, or other fashion items on eBay, you might see much better sales overseas because automatic translations of listings are more accurate -- and available in all 45 languages Azure ML supports.

This week, Microsoft added a new machine learning algorithm used by Bing Ads that can handle very large amounts of data. "We can learn at a terabyte-sized data set," boasts Sirosh. "I don't know if any cloud service allows you to learn in terabyte sizes today other than Azure ML." That's useful for big data, where you might have to look at a huge data set to find the signals that tell you something.

Microsoft has a range of services that work together for big data scenarios. You can load data into HDInsight, Microsoft's Hadoop cloud service, or pull in data from websites and sensors with Event Hubs, then process that stream of data with Azure Stream Analytics or with Apache Storm, which Azure now supports.

"From that you can call the machine learning APIs to detect anomalies or fraud," explains Sirosh. "You can take enormous amounts of data using, say, HDInsight and use that distilled with Azure ML to learn models that can be deployed in an application. But big learning is a lot more than that. Say fraud is high in certain postal codes and not in others. There are millions of postal codes in the world. These techniques allow you to take these patterns into account; you're able to use very fine-grained information and be very precise about it."
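The postal code example hints at why fine-grained features need lots of data. As a rough sketch, and not a description of the algorithm Microsoft uses, the Python below estimates a fraud rate per postal code and pulls sparsely observed codes toward the overall rate. The function name, the `prior_strength` parameter, and the transaction counts are all made up for illustration.

```python
from collections import defaultdict

def smoothed_fraud_rates(transactions, prior_strength=10.0):
    """Estimate a fraud rate per postal code, shrunk toward the global rate.

    `transactions` is a list of (postal_code, is_fraud) pairs. Codes with few
    transactions stay close to the global rate; `prior_strength` (a
    hypothetical knob) sets how many observations it takes to move away from it.
    """
    if not transactions:
        return {}
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for postal_code, is_fraud in transactions:
        totals[postal_code] += 1
        frauds[postal_code] += int(is_fraud)
    global_rate = sum(frauds.values()) / sum(totals.values())
    return {
        code: (frauds[code] + prior_strength * global_rate)
        / (totals[code] + prior_strength)
        for code in totals
    }

# Invented data: two well-observed codes and one seen only once.
transactions = (
    [("98052", True)] * 2 + [("98052", False)] * 98
    + [("60601", True)] * 20 + [("60601", False)] * 80
    + [("10001", True)] * 1
)
rates = smoothed_fraud_rates(transactions)

for code in sorted(rates):
    print(code, round(rates[code], 3))
```

The code seen only once had a raw fraud rate of 100 percent, but the smoothed estimate stays far lower because a single transaction is weak evidence. With millions of postal codes, most will be sparse like this, which is one reason very large training sets matter for this kind of precision.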

Sirosh clearly believes his platform will accelerate machine learning adoption. "Today businesses hire data scientists and they painfully custom build their own machine learning apps. With a platform like Azure ML it becomes so easy to create custom apps ... Only when you have a special set of needs will you need to set up a team of data scientists to build an API for you."

Walk into a Chili's restaurant and you might find a tablet on each table for ordering food, watching videos, paying the bill, and giving feedback. The system, built by Ziosk, uses HDInsight to track how customers use the tablets in 1,400 restaurants -- and Azure ML to customize what offers and content they see. It can even change the interface on the tablet, based on how they use it.

Sirosh thinks everyone should be building that kind of system. "This is the birth of the intelligent cloud in many ways. Any application you build, you should now consider using the data generated from the app, or any other data you have, to create a better customer experience, to create efficiencies you wouldn't have otherwise tapped into."

Microsoft's big machine learning future

CEO Satya Nadella called out machine learning -- and the big data that powers it -- as a key development in his memo to Microsoft last July. "Billions of sensors, screens, and devices -- in conference rooms, living rooms, cities, cars, phones, PCs -- are forming a vast network and streams of data that simply disappear into the background of our lives. This computing power will digitize nearly everything around us and will derive insights from all of the data being generated by interactions among people and between people and machines. We are moving from a world where computing power was scarce to a place where it now is almost limitless, and where the true scarce commodity is increasingly human attention."

That sounds rather more achievable when you talk to Peter Lee about the advances he believes Microsoft can make in the next decade.

Last year he showed off early work on a machine learning system that could use your phone camera to not only recognize a dog, but identify the breed, or tell you whether a plant was poisonous. That's Project Adam, which is trying to apply cloud principles of scale to machine learning. Normally, machine learning happens on a single system that you can't scale beyond a cluster because it has to be synchronous; with Project Adam, the learning can be asynchronous, so you could spread it out over a whole data center.

Project Adam is only one of what Lee calls several machine learning "moonshots" -- "efforts that are truly aspirational but have really concrete, easy-to-assess goals so you know whether you've done it or not." He's very protective of them ("the pressure can be very distracting"), so he won't name the other projects or say what the goals are -- but they're big.

"Project Adam truly pertains to going beyond speech and vision to really a deep understanding of human discourse. Ultimately, it's the next stage of a true AI where we really understand at scale how to get a machine to understand what human beings are talking about. The goals there are so interesting. From a scientific perspective there are tremendous implications for our understanding; from an engineering perspective the scale is really dazzling and from a commercial perspective the prospects for applications are incredibly enticing. We have very significant efforts in the foundations of speech and translation along the same lines."

Lee is both excited and pragmatic about the potential of these big projects -- and the side benefits "that have started to dribble out already" -- from OneDrive (which now uses machine learning to tag your photographs) to the demonstrations of Skype Translator (where performance improvements from new techniques have left even researchers "stunned"). Plus, there's a ready-made platform in Azure Machine Learning for bringing those new techniques to product groups inside Microsoft and developers elsewhere.

"With these large aspirational efforts, there's always a part of me that harbors some skepticism about whether we'll ever get there," Lee admits. "Some of these things are so fantastical, but you never know! You get surprised. And as a research manager, I'm comfortable that whether we get there or not, there are going to be a tremendous number of spinoffs and new knowledge."

Whether or not Microsoft makes more fundamental breakthroughs in AI, what it learns about using machine learning will carry on showing up in all the products you use -- including ones you build yourself.
