Wolfram|Alpha: The First Year

Years ago I wondered if it would ever be possible to systematically make human knowledge computable. And today, one year after the official launch of Wolfram|Alpha, I think I can say for sure: it is possible.

It takes a stack of technology and ideas that I’ve been assembling for nearly 30 years. And in many ways it’s a profoundly difficult project. But this year has shown that it is possible.

Wolfram|Alpha is of course a very long-term undertaking. But much has been built, the direction is set, and things are moving with accelerating speed.

Over the past year, we’ve roughly doubled the amount that Wolfram|Alpha knows. We’ve doubled the number of domains it handles, and the number of algorithms it can use. And we’ve actually much more than doubled the amount of raw data in it.

Things seem to be scaling better and better. The more we put into Wolfram|Alpha, the easier it becomes to add still more. We’ve honed both our automated and human processes, progressively building on what Wolfram|Alpha already does.

When we launched Wolfram|Alpha a year ago, about 2/3 of all queries generated a response. Now over 90% do.

So, what are some of the things we’ve learned over the past year?

First, encouragingly, people really seem to “get” the idea of Wolfram|Alpha. Perhaps those early precursors in science fiction helped. But people seem to understand the idea that they can ask Wolfram|Alpha a specific question, and it will compute a specific answer.

And indeed, right now, something over 50% of all queries to Wolfram|Alpha give zero hits in web searches: they are fresh new questions that aren’t explicitly written down anywhere on the web.

Another wonderful thing we’ve learned over the past year is that there are lots of people in the world who really want to see Wolfram|Alpha succeed. It’s been great to see how much support there is for what we’re trying to do, and to get so much helpful feedback.

Particularly valuable have been all the experts in so many areas who have volunteered their time and expertise, as well as their data and methods, to help us achieve our goal of deep, accurate coverage of as many domains as possible.

I suppose one lesson we’ve really absorbed this year is how important it is to be working with the best, definitive, primary sources. By now, practically none of the raw data in Wolfram|Alpha is just “foraged from the web”.

Mostly it’s fed directly into Wolfram|Alpha from primary sources, based on relationships that we’ve developed, especially over the past year, with whoever is responsible for the data.

Something else we’ve learned, though, is that importing the raw data is perhaps only 5% of the work. After that we have to actually understand the data: how it’s represented—its units, conventions, etc.—and what it means. We have to align it with data we already have. And then we have to see how to compute from it, how to pick out what’s important, and how to present it.

We also have to work out how people will refer to the data: what they’ll call the entities; how they’ll describe properties they want. There’s almost never a systematic source. The web—and things like Wikipedia—are where we start. Doing automatic and manual “linguistic discovery”, trying to build up the right lexicons and grammars.

But of course now we have another crucial source: the huge stream of actual queries that have been fed to Wolfram|Alpha.

I’ve looked at data in many different fields. And I have to say that one of the surprises to me this year is how incredibly precise the “laws” of the Wolfram|Alpha query stream are. Perfect exponentials. Perfect power laws. Better than almost any physics experiment I’ve ever seen.

These laws tell us something we already knew: Wolfram|Alpha will never be “finished”. There’s always more tail. But they also tell us that with all the knowledge we’ve put into Wolfram|Alpha, we’re not in bad shape.
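As a rough illustration of what such a "law" looks like in practice, here is a minimal Python sketch that fits a power law by drawing a straight line through log(count) against log(rank). The data is synthetic, and the function name `fit_power_law`, the prefactor, and the exponent are assumptions made up for the example; none of it is the actual Wolfram|Alpha query data or analysis.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**k on log-log axes; returns (k, c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Ordinary least squares for the slope of log(y) against log(x).
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return k, c

# Synthetic query counts by popularity rank (illustrative values only).
ranks = list(range(1, 101))
counts = [1000.0 * r ** -1.2 for r in ranks]

k, c = fit_power_law(ranks, counts)
print(f"exponent ~ {k:.3f}, prefactor ~ {c:.1f}")
```

A distribution of this shape keeps falling off smoothly with rank but never reaches zero, which is the sense in which there is always more tail.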

We study the Wolfram|Alpha query stream, distilling it to get a giant “to do” list. There are still some “obvious” things on it—like deeper coverage of popular culture, sports, and local information. And we’re working on these.

When we first launched Wolfram|Alpha there were some things I thought were just too obscure ever to cover. I kept on giving the example of “france goats” as a query we’d never be able to respond to.

But then, suddenly, quite a few months ago, I tried this query—and it worked! We’ve got data on livestock in France. With a time series of goat numbers back to 1971!

And I’ve now had this kind of experience many times. As we get deeper and deeper into the data that exists in the world, I keep on being surprised at how much is actually knowable, or computable.

One lesson we’ve learned, though, is that nothing is ever truly finished. Even before Wolfram|Alpha launched a year ago, we had by far the largest, most scholarly treatment of units of measure the world has ever seen. Nearly 8000 units, with all their patterns of usage carefully analyzed, and organized in computable form.

But over the course of this year, every week—from the corners of Wolfram|Alpha—we find more units to add. Like the “boepd” (barrel of oil equivalent per day), the “slinchf” (slinch-force), the “spat” (unit of solid angle), the “digney” (unit of resistance), or the “new hay load” (unit of mass).

One principle of mine is always to have a portfolio of development projects going on. From little enhancements to core multi-year engineering efforts to pie-in-the-sky research investigations.

New data flows into Wolfram|Alpha every second. But this year we were able to release a new fully-tested version of the whole Wolfram|Alpha codebase every single week.

We’ve introduced a few important new general frameworks this year. And as it happens, we have some major new ones currently not far away in the pipeline. Involving data. And computation. And linguistics. And presentation.

For me, Wolfram|Alpha is an exciting intellectual adventure. Not just for all the areas of knowledge it covers. But also because it represents a whole new paradigm for computing and for thinking about knowledge.

One of my consistent observations in the past has been that it takes me a decade to really absorb a new paradigm, and to see how to take the next big steps with it. And perhaps that will be the case with Wolfram|Alpha too.

But I’m happy to say that—perhaps because of the terrific team we have—I think there are some pretty big steps already visible.

We’ve recently made some breakthroughs, for example, in understanding how to bring together Wolfram|Alpha and Mathematica—to create a fascinating hybrid of ordinary human language and precise computer language, that I suspect represents the future of systematic interactions with computers.

We’re understanding how to make Wolfram|Alpha not just operate on its internal data and knowledge, but absorb new input from documents and sensors and feeds.

I even think that with the Wolfram|Alpha paradigm, I may have figured out something quite fundamental about a very abstract topic: the systematic automation of mathematics.

A lot has happened in the practical deployment of Wolfram|Alpha this year. The API. The beginnings of integration with Microsoft’s Bing search engine. The iPhone app, now the iPad app. The first ebook with integrated Wolfram|Alpha. And also the delivery of the first Wolfram|Alpha appliances for deploying custom versions of Wolfram|Alpha in enterprise environments.

But this is only the very beginning.

In many ways we’ve been holding back in expanding the use of Wolfram|Alpha—waiting until we felt we’d reached the right point. But now we’re there. And this year we’re going to be energetically making Wolfram|Alpha as broadly available as possible.

To mark our anniversary right now, we’re releasing a little burst of new features.

We’re also making a systematic addition to how we interpret queries. Usually, Wolfram|Alpha works by trying to understand each query precisely and completely. And that is what one wants, if it’s possible.

But the linguistic and content space covered by Wolfram|Alpha has now filled in to the point where there’s also something else to try. Even if Wolfram|Alpha can’t interpret a particular specific query, it can still try to find the “nearest” query that it can interpret.

And as of today, that’s how Wolfram|Alpha is set up to work. Over the next little while, there’ll be considerably more sophistication added to the notion of “nearest queries”. But already this technique is adding quite a bit to the typical experience of using Wolfram|Alpha.
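To give a feel for the general idea of falling back to a "nearest" interpretable query, here is a minimal Python sketch. It is only an illustration under stated assumptions: it uses plain string similarity from the standard library's `difflib`, and the `KNOWN_QUERIES` list, the `nearest_query` function, and the 0.6 cutoff are all hypothetical. It does not describe how Wolfram|Alpha actually implements this.

```python
import difflib

# Hypothetical set of queries the system already knows how to interpret.
KNOWN_QUERIES = [
    "france goats",
    "population of france",
    "eigenvalues of a matrix",
    "convert barrels to liters",
]

def nearest_query(query, known=KNOWN_QUERIES, cutoff=0.6):
    """Return the known query most similar to `query`, or None if none is close."""
    # Normalize case and whitespace before comparing.
    normalized = " ".join(query.lower().split())
    if normalized in known:
        return normalized
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    # and drops any candidate scoring below `cutoff`.
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(nearest_query("france goat"))
print(nearest_query("xyzzy"))
```

The cutoff is the important design choice here: set too low, the fallback returns unrelated queries; set too high, it rarely helps. A real system would presumably compare interpretations rather than raw strings.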

Needless to say, Wolfram|Alpha still can’t do everything. A few days ago the team had just made live a test version of the new “nearest queries” capability. And I was looking at our real-time monitor of queries that we still couldn’t respond to.

And flashing by came “chickens on mars”. Well, I think that one will be a while. Then, a moment later, came “where is my hat”. I guess that might not be so long. Whether through RFID or vision or something else, I think we’re on a path to make Wolfram|Alpha able to respond to that!

I’ve spent most of my adult life doing very large projects. Wolfram|Alpha is surely the largest and most complex so far. Over the course of this year we’ve continued to build up a terrific team. That’s turning what at one time seemed like an impossible goal into a practical, highly scalable engineering effort.

With the help of many people, we’re building a remarkable intellectual structure—that’s steadily moving from being “interesting” to “convenient”, to absolutely necessary. Leading to a time when we’ll all wonder how on earth it was that before May 18, 2009 we could ever exist without Wolfram|Alpha….

27 Comments

Fantastic news! Wolfram|Alpha is probably the most exciting project on the web right now, and I look forward to new developments of this incredible knowledge engine. My most recent use of Wolfram|Alpha has been to check the eigenvectors and eigenvalues of students’ assignment answers on the fly from my iPhone.

I am still hoping that you deliver on an early notion you expressed, which was to develop W|A as a new type of front end to Mathematica: one that gave a Mathematica user a significant contribution of code with which to refine and add to his or her initial enquiry in W|A.

I believe that W|A could be the key to unlocking understanding and profitable use of Mathematica.

Mathematica, as an ever-expanding and extensible system, will continue to have an inherent problem of documentation and comprehensibility for both new and expert users. A “learn by starting example” pedagogy seems the most practical strategy for would-be users.

You clearly believe that too, judging by the efforts you have put into the Demonstrations Project and the rich example set in your Mathematica documentation for each function.

This is fantastic. I absolutely love the design and can’t wait for everyone to be using Wolfram Alpha in the near future! Waiting for more information to be available before really pushing it to the world was a great decision. Thanks for the post!

I absolutely love Wolfram|Alpha for use as a reference at university. The units of measure can be an issue at times, though. Wolfram|Alpha fails to convert some units of energy, such as kcals to wavenumbers.

Wavenumbers are the standard unit of energy for many quantum mechanical calculations and graphs.

It would be interesting if your site could interpret APL – the fit might work well, as APL is a highly terse mathematics-based programming language. APL “understands” matrices and other mathematical structures. I once wrote an APL program to enumerate all solutions to the “8 queens” problem (how do you put 8 queens on a chessboard so that each has its own row, column, and diagonals).

Is there currently a way for a site visitor to ask to have a procedure/program computed, not just an equation?

I’ve just been introduced to W|A and retirement will never be the same – it will never again be long enough. I wrote my first computer program in 1958 – back in the era of paper tape I/O and machine-level coding. It’s hard for me to believe we’ve come so far. Wonderful.

I received this link in an email. I have really had fun and looked up useful info. I really enjoy the format. To be able to ask a question in plain language is really neat. I look forward to seeing the growth. I have forwarded it to several people, one of whom is my father-in-law, who happens to be an engineer and math whiz, and he was impressed. Thanks, it’s a feather in my cap.

A friend sent me this website. WOW!!! I immediately forwarded to many friends. I wish I had this when I was teaching – no, better yet, when I was in school. My time on the computer will never be the same! Thank you!!!

Surveys repeatedly show genealogy as one of the most popular uses of the Internet, but your support of it for factual data is almost non-existent. A query that includes any given name ignores everything else in the query. Very poor.

I first saw Wolfram|Alpha last year, bookmarked it and then totally forgot about it. A good friend of mine (a chemist) recently sent me an e-mail with a link to you. I immediately remembered and came back to check it out.

What you are doing here is remarkable. Having worked in the computer industry since the mid-’80s, it has been a huge gripe of mine how slow the development of search has been. To me, Wolfram|Alpha represents the future of search, or at least of intelligent search! Please, give me data, not a veritable plethora of useless links!

You say: ‘With the help of many people, we’re building a remarkable intellectual structure – that’s steadily moving from being “interesting” to “convenient”, to absolutely necessary.’

Absolutely necessary is absolutely right. Your vision is brilliant!

Thank you for your great vision, Stephen, and even more so for putting action, energy and passion into it.
