The Doors of Probability

Mike Lynch, Autonomy founder, made his name by understanding Thomas Bayes and inverse probability. Wendy Grossman explains what a 1700s mathematician has to do with modern search engines and how Lynch was influenced by him.

Mike Lynch has long been the most interesting UK technology entrepreneur. In 2000, he became Britain's first software billionaire. In 2011 he sold his company, Autonomy, to Hewlett-Packard for $10 billion. A few months ago, Hewlett-Packard let him escape back into the wild of Cambridge. We've been waiting ever since for hints of what he'll do next; on Monday, he showed up at NESTA to talk about his adventures with Wired UK editor David Rowan.

Lynch made his name and his company by understanding that the rule formulated in 1750 by the English vicar and mathematician Thomas Bayes could be applied to getting machines to understand unstructured data. These days, Bayes is an accepted part of the field of statistics, but for a couple of centuries anyone who embraced his ideas would have been unwise to admit it. That began to change in the 1980s, when people began to realize the value of his ideas.

"The work [Bayes] did offered a bridge between two worlds," Lynch said on Monday: the post-Renaissance world of science, and the subjective reality of our daily lives. "It leads to some very strange ideas about the world and what meaning is."

As Sharon Bertsch McGrayne explains in The Theory That Would Not Die, Bayes was offering a solution to the inverse probability problem. You have a pile of encrypted code, or a crashed airplane, or a search query: all of these are effects; your problem is to find the most likely cause. (Yes, I know: to us the search query is the cause and the page of search results if the effect; but consider it from the computer's point of view.) Bayes' idea was to start with a 50/50 random guess and refine it as more data changes the probabilities in one direction or another. When you type "turkey" into a search engine it can't distinguish between the country and the bird; when you add "recipe" you increase the probability that the right answer is instructions on how to cook one.

Note, however, that search engines work on structured data: tags, text content, keywords, and metadata all going into building an index they can run over to find the hits. What Lynch is talking about is the stuff that humans can understand - raw emails, instant messages, video, audio - that until now has stymied the smartest computers.

Most of us don't really like to think in probabilities. We assume every night that the sun will rise in the morning; we call a mug a mug and not "a round display of light and shadow with a hole in it" in case it's really a doughnut. We also don't go into much detail in making most decisions, no matter how much we justify them afterwards with reasoned explanations. Even decisions that are in fact probabilistic - such as those of the electronic line-calling device Hawk-Eye used in tennis and cricket - we prefer to display as though they were infallible. We could, as Cardiff professor Harry Collins argued, take the opportunity to educate people about probability: the on-screen virtual reality animation could include an estimate of the margin for error, or the probability that the system is right (much the way IBM did in displaying Watson's winning Jeopardy answers). But apparently it's more entertaining - and sparks fewer arguments from the players - to pretend there is no fuzz in the answer.

Lynch believes we are just at the beginning of the next phase of computing, in which extracting meaning from all this unstructured data will bring about profound change.

"We're into understanding analogue," he said. "Fitting computers to use instead of us to them." In addition, like a lot of the papers and books on algorithms I've been reading recently, he believes we're moving away from the scientific tradition of understanding a process to get an outcome and into taking huge amounts of data about outcomes and from it extracting valid answers. In medicine, for example, that would mean changing from the doctor who examines a patient, asks questions, and tries to understand the cause of what's wrong with them in the interests of suggesting a cure. Instead, why not a black box that says, "Do these things" if the outcome means a cured patient? "Many people think it's heresy, but if the treatment makes the patient better..."

At the beginning, Lynch said, the Autonomy founders thought the company could be worth £2 to £3 million. "That was our idea of massive back then."

Now, with his old Autonomy team, he is looking to invest in new technology companies. The goal, he said, is to find new companies built on fundamental technology whose founders are hungry and strongly believe that they are right - but are still able to listen and learn. The business must scale, requiring little or no human effort to service increased sales. With that recipe he hopes to find the germs of truly large companies - not the put in £10 million sell out at £80 million strategy he sees as most common, but multi-billion pound companies. The key is finding that fundamental technology, something where it's possible to pick a winner.

ORG Events

Contact us

Email us

Write for us

ORGzine welcomes contributions. If you are interested in writing a comment on a digital rights issue,
please get in touch

About ORG

The Open Rights Group campaign for digital rights, and defend democracy, transparency and new creative possibilities.
ORGzine is the Open Rights Group digital magazine. The zine is a space for news, opinion, features, and debate over the social,
political and legal issues associated with digital rights.