Biz & IT —

Stephen Wolfram and the techno-dianetics of Google-ology

Physics prodigy turned software entrepreneur Stephen Wolfram can fairly claim to have revolutionized the math software niche with the 1988 launch of Mathematica, and that must have given him a taste for being the guy who revolutionized something, because with his 2002 book, A New Kind of Science, he set out to revolutionize nearly every scientific discipline at one whack, from biology to cognitive science to physics to cryptography. But the NKS revolution turned out much like the Segway revolution that slightly preceded it: lots of media hype and big talk—mostly from Wolfram himself—about how the world would never be the same, followed by the launch of something that, to this day, lives on mainly as an interesting geek curiosity.

But Wolfram hasn't let the "that's interesting, but I really have to get back to work" response to NKS from specialists in numerous fields dampen his revolutionary zeal, because he has apparently been at work for the past few years attempting to revolutionize the very production of human knowledge. Behold Wolfram Alpha, launching in May as a "true computational knowledge engine" that, well, allegedly "computes knowledge." As near as I can tell, Wolfram's new software takes in a sea of unstructured data from the web, structures it, runs algorithms against it, and produces "facts" and "answers" in response to queries that users enter via a Google-like search box.

If the "it computes knowledge" language above sounds suspiciously vague and outlandish, that's not the fault of my summarizing skills. All of the text that has been written about Wolfram Alpha so far—a corpus that currently comprises all of two posts, one by the guy behind Twine (no stranger to techno-hype) and the other by Wolfram himself—is similarly vague and outlandish.

I'm not going to let myself get started excerpting Wolfram's blog post announcing the project, because I don't want to go into snark overdrive. But I will say of the announcement that I haven't read anything that thoroughly immodest and improbable since suffering through the opening pages of Dianetics. And no, the comparison with L. Ron Hubbard's crackpot chef d'oeuvre is not an exaggeration; Wolfram really does make a set of claims for his work that are up there with those of Hubbard on the "this science that I have privately devised has finally solved some of humanity's millennia-old problems" scale.

"It computes the answers"

I won't quote Wolfram, but I will quote Nova Spivack, who apparently spent two hours being blown away by a hands-on demo of Wolfram Alpha.

Wolfram Alpha actually computes the answers to a wide range of questions -- like questions that have factual answers such as "What country is Timbuktu in?" or "How many protons are in a hydrogen atom?" or "What is the average rainfall in Seattle this month?," "What is the 300th digit of Pi?," "where is the ISS?" or "When was GOOG worth more than $300?"

Think about that for a minute. It computes the answers. Wolfram Alpha doesn't simply contain huge amounts of manually entered pairs of questions and answers, nor does it search for answers in a database of facts. Instead, it understands and then computes answers to certain kinds of questions.
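To make the distinction concrete: "computing an answer" means deriving it on demand rather than retrieving a stored value. The "300th digit of Pi" query from the quote above is the cleanest example, since nobody types that answer into a database. The sketch below is purely illustrative (it has nothing to do with Wolfram's actual implementation); it answers that query by computation, using Machin's formula with fixed-point integer arithmetic.

```python
# Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239),
# evaluated with fixed-point integers so every digit is exact.

def arctan_inv(x, scale):
    """arctan(1/x) as a fixed-point integer scaled by `scale`."""
    power = scale // x              # numerator of the current Taylor term
    total = power
    x2, n, sign = x * x, 3, -1
    while power:
        power //= x2                # next odd power of 1/x
        total += sign * (power // n)
        n += 2
        sign = -sign
    return total

def pi_digits(n):
    """Return pi as a digit string: '3' followed by n decimal digits."""
    scale = 10 ** (n + 10)          # 10 guard digits absorb truncation error
    pi = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    return str(pi)[:n + 1]

# The query is answered by running the algorithm, not by lookup:
print(pi_digits(300)[-1])           # the 300th decimal digit of pi
```

The point of the sketch: for this class of question there is no fact to store, only an algorithm to run, which is presumably what "computes the answers" is meant to convey.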

What kinds of questions does it "compute the answers" to? Only factual ones, apparently, where the answer is a "simple fact." Back to Spivack:

Wolfram believes that by focusing on factual knowledge -- things like you might find in the Wikipedia or textbooks or reports -- the bias problem can be avoided. At least he is focusing the system on questions that do have only one answer -- not questions for which there might be many different opinions. Everyone generally agrees for example that the closing price of GOOG on a certain date is a particular dollar amount. It is not debatable. These are the kinds of questions the system addresses.

Got that? It only deals with questions where the answers are "not debatable." If this is actually true, and of course it isn't, then you can substitute for "not debatable" equivalent descriptions like "not interesting" and "easily accessible by other means, like Google."

But I leave further hostile interrogation of Wolfram's and Spivack's explicit and implicit epistemological claims—most of which, I suspect, they don't even realize are debatable—as an exercise to anyone with enough of a liberal education that he or she can define the words "hermeneutics" and "epistemology" without resorting to Google. And I leave evaluation of Wolfram Alpha's prospects as a "Google killer" to Web 2.0 pundits, and, ultimately, to the marketplace. (Regarding the latter, Wolfram Alpha's commercial prospects are considerably brighter than its singularity-inducing metaphysical ones, since Wolfram does know how to produce great software.)

In the end, any good humanist, scientist, or journalist knows how hard it is just to assemble a reliable and relevant set of facts, much less to take the next step and synthesize those facts into understanding, and then communicate that understanding to an interested reader. I expect that the May launch of Wolfram's service will remind everyone who overestimates the power of computers that this process is not something that can be automated—not even for apparently simple questions, and not even by Stephen Wolfram.

30 Reader Comments

If you "read between the lines" a little bit, I think this sounds like a potentially interesting software project. When you're searching for an answer to a fairly straightforward question, and the expected answer is pretty simple in itself (e.g., one-word answers, like the name of a state's capital, or numerical answers like the population of a given place according to the latest census), the typical web search engines aren't that great a solution.

(Google did add the ability to give answers to math equations as a search result, which is about the only existing example I can think of that's anything like what Wolfram is suggesting. But that just focuses on being a basic calculator.)

Normally, you search with some keywords you think are relevant, and you have to read through the resulting pages to see if you can find the answer among all the text.

Sure, this isn't going to give "absolute answers" to questions about anything subjective like medical treatments or psychological conditions... but it's a way to save people some effort on searches, where you have a very specific result in mind.

Very easy and oh so satisfying to be critical of something that isn't really well understood.

It's also easy to make outlandish claims about something without providing any details. That's what he's being critical of. He's not at all being critical of the thing itself.

Hype is almost never on target. We *do* know how this story ends... and we've been given a significant hint as to how it will all turn out by the hype itself:

quote:

Wolfram Alpha actually computes the answers to a wide range of questions -- like questions that have factual answers

... it will be significantly useful for a small subset of queries (making the assumption that it does this well), and seemingly ignore a vast swath of search.

There's nothing wrong with having a niche and filling it well, but from the excerpted statements made by the people involved, it seems they want you to think that search is a solved problem from here on out, and I can see no details that even remotely back that up.

I read NKS, or tried to. It's repetitive and actually boring. But the guy does produce. I'm going to ask his engine about cumulative heating and cooling degree days in various locations because that info is actually very difficult to find.

Quoting Wolfram: "But I’m happy to say that with a mixture of many clever algorithms and heuristics, lots of linguistic discovery and linguistic curation, and what probably amount to some serious theoretical breakthroughs, we’re actually managing to make it work."

So it's a kludged-together mess of ad-hoc rules and descriptive linguistics. Pfft. This isn't computability, nor is it even interesting machine learning. Which is sad, because there have been some (IMO) really nifty advances in machine learning recently, which might actually one day give us elegant, unsupervised machine summary and re-expression of a corpus instead of the current situation where the supervision is so overwhelming that the actual learning part is essentially nonexistent.

Leaving aside such Godwin Facts, I assume it must have some kind of weighting and feedback mechanism to support its heuristics, which raises the question of how they hope to avoid being gamed or driven down the avenue of solipsism. And if somehow this thing works, which I am deeply sceptical about, how long then before we get Fact Engine Optimisers?

The two most recent versions of Mathematica (6 and 7) have included access to a reasonably large proprietary database. It includes, for example, historical and current weather, various mathematical structures like graphs and geometrical shapes, details about the known physical elements, and the human genome. Based on this announcement it seems certain the new product leverages some extension to this database. As Wolfram seems to like "big ideas" it wouldn't really shock me if this new product was the original purpose for the database, and Mathematica access was of secondary importance. I'd also predict we'll be seeing the descriptor "curated" plenty in the future.

One thing in Wolfram's post particularly caught my eye: "I wasn’t at all sure it was going to work. But I’m happy to say that with a mixture of many clever algorithms and heuristics, lots of linguistic discovery and linguistic curation, and what probably amount to some serious theoretical breakthroughs, we’re actually managing to make it work." Will we ever see these advances (if that's what they are) in journals? While I'm not an "information wants to be free" zealot, Wolfram is notoriously tightfisted with many aspects of his software's capabilities, and I think there is a legitimate concern if even some of the above claims are true. Granted, the blog makes it seem NKS is somehow critical to the whole venture (I'm guessing cellular automata?) which could make for awkward reading if he isn't willing to separate the raw mechanics from the philosophy. (Note: I have not read NKS, but have read various summaries, critiques, etc.)

An additional concern is citations. Citing Wolfram Alpha alone should be a quick path to a failing research grade, so it had better include references to the primary data. Even though Wikipedia is very uneven in this regard it isn't quite the black-box Wolfram Alpha could become. I'm not holding out a ton of hope on this count, however. Curated data offer many benefits, but fine-grained references are essential for a black-box service that offers access to esoteric information. As far as I can tell these are not currently in the Mathematica-accessible database. Instead, there are short lists of principal sources for the different major categories of information, such as here.

Other legitimate discussion, such as what constitutes a "fact" or how we contextualize information, I'll leave to other posters. Anyway, if Wolfram wants to spend time trying to revolutionize science, more power to him, but let's stay grounded.

Background: I frequently use Mathematica as a glorified form of research scratch paper and notebook, and for a few larger projects. It serves my needs very well, and I happily recommend it for similar purposes.

Since the thing hasn't been launched yet, there's little point in having an opinion about it. We shall see when we can see. I'm sure that like all of Wolfram's work it will be useful for something!

For what it's worth, NKS does contain at its core a fundamental philosophical proposition which is neat, simple, necessarily irrefutable and yet reduces almost all physical sciences to measurements of the shadows in Plato's cave. It's worth a read for that alone.

It also has to be said that NKS would be half the size if you took out all the personal pronouns. :-/

There is no doubt that Wolfram is a master of hype and hyperbole. There is also no doubt that he is extremely gifted, extremely hard working and that he gives a lot of thought and attention to the "big problems".

Will Alpha succeed on the scale Wolfram predicts? I cannot say. I would be surprised, though, if it doesn't exhibit some significant technological advances. He deserves credit for trying to look beyond the incremental and trying to go for the big leap forward.

There is plenty of room for improvement in Human/machine interaction and whether Alpha represents a dead end, an incremental change or a huge leap forward remains to be seen. Still, Stephen Wolfram deserves credit for devoting his talents and resources to problem solving.

I'm glad I read the RSS synopsis and generally ignore the article title, otherwise I might have dismissed what seems to be a pretty interesting new method for solving a certain class of problems. I urge everyone to read the linked-to Twine blog post, it contains a lot more interesting and relevant commentary than this Ars post.

In particular, the end of the Twine post mentions how the output is fairly complex. These aren't the "one word answers" from some simple rule engine that many of the commenters seem to assume is the case.

Clearly none of us can guess whether it's real or hype, nor can we guess at the usefulness of whatever he has built, but I'm certainly interested in taking it for a spin. I'd also be interested in seeing if and how he productizes it -- I can imagine large businesses developing internal models of their own processes and turning the engine loose on their data as a kind of decision support system for the non-technical business user...

Kind of reminds me of Slashdot's famous dismissal of the iPod after Apple's original announcement.

Also, uncharacteristically for Jon, the article seems to largely miss the point of what Wolfram Alpha is supposed to be.

quote:

From the article: As near as I can tell, Wolfram's new software takes in a sea of unstructured data from the web, structures it, runs algorithms against it, and produces "facts" and "answers" in response to queries that users enter via a Google-like search box.

According to both linked posts, Alpha's data set is not based on a "sea of unstructured data" that it automatically absorbs and structures. Rather, it is based on a discrete -- but large -- trove of already structured data sets chosen by human operators.

In fact, there seems to be hinting in those posts that the system requires human assistance not just in choosing data sets, but in integrating new types of data. That is, the system would not "know" what to do with airline schedule data (for example) without some "training". (please forgive the anthropomorphisms, it's just easier to describe it that way)
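A toy sketch of the curation model this commenter describes: human-chosen data sets, plus a hand-written handler per "trained" question shape. Every name here is hypothetical and this is in no way Wolfram's actual architecture; it just illustrates why such a system answers trained question types and draws a blank on everything else.

```python
import re

class KnowledgeEngine:
    """Curated facts plus per-question-type handlers (illustrative only)."""

    def __init__(self):
        self.handlers = []          # (compiled pattern, handler) pairs

    def register(self, pattern):
        """Attach a handler for one 'trained' question shape."""
        def wrap(fn):
            self.handlers.append((re.compile(pattern, re.I), fn))
            return fn
        return wrap

    def query(self, question):
        for pat, fn in self.handlers:
            m = pat.search(question)
            if m:
                return fn(*m.groups())
        return None                 # untrained question types draw a blank

engine = KnowledgeEngine()

# A curated, human-entered data set -- not crawled from the web.
PROTONS = {"hydrogen": 1, "helium": 2, "carbon": 6}

@engine.register(r"how many protons .*? a (\w+) atom")
def protons(element):
    return PROTONS.get(element.lower())

print(engine.query("How many protons are in a hydrogen atom?"))  # 1
```

Integrating a new data type (say, airline schedules) would mean adding both a curated data set and new handlers, which matches the "training" the posts hint at.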

Sounds like the Giant Brain in the Infosphere from Futurama. There are some good facts in there, like "Beavers mate for life", "11 > 4", as well as, "For quality carpets visit Kaplan's carpet warehouse!"

As near as I can tell, Wolfram's new software takes in a sea of unstructured data from the web, structures it, runs algorithms against it, and produces "facts" and "answers" in response to queries that users enter via a Google-like search box.

Nope. Wolfram has hired about 100 people to manually encode mathematical models from various fields in Mathematica. All of the "facts" and "models" in the Alpha database are entered explicitly. This is not a web-crawler. Wolfram Alpha never deals with "unstructured data" except in its natural-language query parser in the UI.

It's basically just a bigger library of constants and algorithms for Mathematica. I don't see the philosophical problems.

I waded through a good chunk of A New Kind of Science and kept waiting for some light bulb to go on in my head of why it was supposed to be interesting, let alone groundbreaking. Yes, simple algorithms can generate complex behavior, and..?
