Posted
by
kdawson on Monday April 27, 2009 @01:59AM
from the computational-knowledge-engine dept.

An anonymous reader points out a ReadWriteWeb piece on an hour-long demo of Wolfram|Alpha (which we discussed at its announcement). Stephen Wolfram does not like to call it a "search engine," preferring instead the term "computational knowledge engine." It will open to the public in May. "The hype around Wolfram|Alpha, the next 'Google killer' from the makers of Mathematica, has been building over the last few weeks. Today, we were lucky enough to attend a one-hour web demo with Stephen Wolfram, and from what we've seen, it definitely looks like it can live up to the hype — though, because it is so different from traditional search engines, it will definitely not be a 'Google killer.' According to Stephen Wolfram, the goal of Alpha is to give everyone access to expert knowledge and the data that a specialist would be able to compute from this information."

...with their web-crawling keyword-sculling technology, but it's only normal that someone else was researching what to ~do~ with all the data. IE (data analysis for human comprehension) and Google would make one fierce - and useful - blend.

This blog post reads more like a marketing piece written by a shill, and if there is any hype, it seems like self-delusion or wishful thinking at this point.

Any search engine query and its corresponding results can be manually optimized and tweaked to quasi-perfection. In fact, that's the exact recipe many of the now-defunct search engines were following a while ago. They would optimize the hell out of a couple of queries or use-case scenarios, and then they would fall in love with the layout and content of their contrived results. And then, when users didn't use the search engine the way the developers wanted them to, the developers tried changing the behavior of their users instead of trying to change their search engine. For the most recent example of this, from one company that actually still had money to waste a year ago, think back to the ask.com commercial where they tried to teach us about the *cool* Ajax feature of previewing web sites. Not that this feature was bad per se, but if it was any good, or groundbreaking in any usable way, users would be telling each other about it -- they wouldn't need to be educated about it at such great expense.

And the same goes for the tone this blog post was written in. It was written from the perspective of a shill, or of the company itself, but not from the perspective of an actual user. Personally, I don't want to hear the supposed hype or marketing-speak from the developer's own mouth; I just want to know how useful it's going to be for me. And I don't want contrived examples, I want one or two random examples from the (supposedly independent) blogger himself (if possible). And I don't want a screenshot of the search box, I want the actual search box itself. Am I the only one who tried clicking on it? And if you're going to give me a screenshot of something, give me a screenshot of the search results page (at the very least) and not just a verbal description of it.

Which brings me to my last point: Show. Don't tell. And if there is one thing that Google does well, it's that they don't try to prematurely hype their nascent lab products. They release them first, then they see if the users fall in love with their creation (or not), which is rather a hit-or-miss proposition and a long iterative process. So don't tell me about a fancy search engine if it's not even out for a public trial yet. I want to try it. I don't want to be told about it.

IE (data analysis for human comprehension) and Google would make one fierce - and useful - blend.

Finding relevant information other than the Wikipedia page for any specialist topic is a pain in the ass. If these guys can find a way to index only the good stuff, i.e. not based on general popularity but content accuracy, they could have a future.

Do I have to remind everyone how annoying it is to search for technical documentation for something vaguely Linux-related, only to find the first 30 hits are various forums with more or less clueless newbies discussing installation difficulties and the syntax of apt-get?

Do I have to remind everyone how annoying it is to search for technical documentation for something vaguely Linux-related, only to find the first 30 hits are various forums with more or less clueless newbies discussing installation difficulties and the syntax of apt-get?

Gods yes. And not to mention that 80% of them are from 2006 or earlier.

That works too... I had many, many queries for which I was redirected to EE, only to find that the answers were not available, so I skipped the page. Actually, I found answers to most queries on other sites; otherwise I would have paid them to get the answers. Only now, after you mentioned that it is there below, did I notice that the answers are in fact available. So I guess it is a real working idea.

Finding relevant information other than the Wikipedia page for any specialist topic is a pain in the ass.

Only if you don't know how to use a search engine properly. On the contrary, I often find the Wikipedia page on any particular topic to be the least informative, at least if you want any level of detail and not just a very rough overview.

Yes, neither Google nor any competitor regularly gives you the most interesting page at the top. But if you know how to narrow down your search, and accept that quality results will take you more than 10 seconds to find, then it's all pretty straightforward.

Damn you for not logging in! I could not, for the life of me, figure out what IE was supposed to stand for, but I knew that neither of the uses I am accustomed to were correct. Thank you for that.

On a related note, can we have a Slashdot moratorium on pointless and confusing abbreviations? Last I knew, "Information Engineering" wasn't such an extensively used term that it warranted abbreviating, especially not given that "IE" is already in heavy use. That is, of course, unless you want me to start talking a

I've been using Mathematica professionally since nearly its inception, and I have never found it to be incorrect or in any sense not useful. It's not the correct tool for every purpose, but then again it's not a very good idea to use a razor to pick your nose either.

...according to Stephen Wolfram, Alpha is built on top of 5 million lines of Mathematica code which currently run on top of about 10,000 CPUs (though Wolfram is actively expanding its server farm in preparation for the public launch).

5 *million* lines of Mathematica? How many code monkeys does he have working for him?

I knew I should have looked this up before clicking submit: this makes Wolfram Alpha 1.25 million times more complicated than the entire universe, which Wolfram expects to be expressible in 4 lines [umich.edu] of Mathematica [metafilter.com].

I knew I should have looked this up before clicking submit: this makes Wolfram Alpha 1.25 million times more complicated than the entire universe

All this time we've been worried about the LHC hoovering up the solar system with a black hole, when in fact a search engine running Mathematica will be frying the galaxy by increasing local entropy by a factor of millions :)

And 10,000 CPUs, expanding for the public launch... who is going to pay for all that? Granted, Google has a few more of those CPUs running now, but when Google went public I'm quite sure it was fewer. They simply expanded along with their market.

Maybe this is the late 90s again (prepare for totally unrealistic user numbers), or maybe this search engine really does need that much horsepower, meaning in effect that it can never become profitable.

Google at the time had AltaVista, among others, to contend with, then the number-one search engine. Google itself was set up by some college students in their dorm room who had a better idea about searching/indexing web pages, and managed to implement that idea. Then it went live from a single computer for their friends. Who told their friends, and soon the whole campus used it, etc.

Google never advertised their service, it was pure word of mouth. They just got better results than the competition. And they got started of course in a geek environment, so the first word got out and spread quickly.

Good chance that the "next Google" starts up just like that. Hell, I bet The Pirate Bay started up that way. Craigslist did so at least - just a guy called Craig who started a local classifieds page for friends and friends of friends.

Yes, the stakes are huge, but just throwing money at the problem generally won't get you far. I'd say there's a good chance it even gets you doomed, as big money often takes the focus away from the innovation that is needed.

If you mean the next big internet-based company, you might be right; if you mean the next dominant player in the search market, then you're almost certainly wrong.

Doing anything which requires an exhaustive or near exhaustive database of internet content requires far more resources than it would have in the mid-90s. Doing something that requires you to actually rate / select from this database of billions of records also requires resources well be

BackRub is written in Java and Python and runs on several Sun Ultras and Intel Pentiums running Linux. The primary database is kept on an Sun Ultra II with 28GB of disk.

Those were, at the time, very serious computing resources, but nothing special for a university to have available. Nowadays it is the same: just add a zero or two to the specs. It is even something a normal start-up with venture capital funding can afford; start up a little smaller and it becomes living-room material. 1000/1000M Internet is readily available even for consumers, so even bandwidth is not a problem. For starting up there is no need to index "the Intern

Doing anything which requires an exhaustive or near exhaustive database of internet content requires far more resources than it would have in the mid-90s.

True, but largely irrelevant; winning the search engine war might require that, but breaking into search doesn't. Better results in even a narrow domain would establish a toe-hold, and then you grow it from there.

Google won't be killed until someone perfects an AI that you can have a search 'conversation' with, who can understand goddamn context and intelligently narrow down, find relevant articles that don't contain your keywords, etc. Kinda like the librarian from Neal Stephenson's "Snow Crash" novel, but more powerful.

The main reason no one will beat Google until then is that Google is extremely wealthy and can outspend you as it continually perfects information sorting itself, not to mention buy any technology that comes close to threatening it. If you really developed a Google-killer and presented it to the world, do you also have the stones to turn down, say, $100 million? I don't think so; it would take you probably 20-30 years to make that on your own, if you're lucky, with the search field full of competition and Google's mature business plan in place. Even the days of AltaVista were essentially the Cowboy West, unsophisticated and without any proven business plans. Google walked in and owned right away, then discovered how to make money off search when no one else was.

Even then, the founders of Google tried to sell their brilliant search idea not for $100 million, but for $1 million, and there were no takers. They were forced to go it alone. If someone had offered them $500,000 they probably would've taken it and run.

Although, if you really do develop an AI, there'll be a billion more profit opportunities than search; search is peripheral. An AI can do menial labor far better, faster, stronger than a human. What happens when McDonald's is staffed solely by robots? That would be pretty damn cool, actually. They work for the price of electricity; maybe we can get the price of a cheeseburger back down to $0.25 :D

The main reason no one will beat Google until then is that Google is extremely wealthy and can outspend you as it continually perfects information sorting itself, not to mention buy any technology that comes close to threatening it.

Yes, because it's always the wealthy company on top that innovates the groundbreaking ideas, like the airplane, the home computer, the telephone...

I'd be happy if they just made it possible to filter out all of the shopping reviews and blogs.

When I'm looking for information, I don't want to know 500 people's uneducated opinions on something (actually millions, but who searches more than 10 or so pages deep in Google?). I'm 99% of the time looking for the original source data.

Some years ago I was working late one night. One of the team went out for food. When they returned, I was given a burger. I took one bite of it and spat it out. It was awful. I asked them where they got it from. They replied "Oh, the McDonald's just down the road". This was in the suburbs west of Boston, just off I-495.

I immediately vowed never to visit one of their establishments and never to eat what they call 'food'.

"Wolfie" Steve Wolfram HAS developed a rather successful software for mathematical modeling. You may have heard of it: "Mathematica". He also wrote a book called "A new kind of Science" which lays out some interesting ideas based on what are called "Cellular Automata" - basically a simple algorithm turned into a loop.

Certain very simple algorithms appear to be rather respectable pseudo-random number generators, and he treats the fact that they are (repeatable) pseudo-random number generators as a plus rather than a minus.

I'd like to see some challenging of his ideas, specifically, just how "random" is the output of these simple algorithms? Are they really as incompressible as they seem? It strikes me that there are only so many states possible in a narrow, N-bit wide field that he uses like a register, and thus this would severely limit the "randomness" in the result to being far less than claimed.
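For anyone who hasn't seen one of these automata, here is a minimal Python sketch (mine, not Wolfram's code) of Rule 30, whose center column is the sequence often cited as Mathematica's pseudo-random source. Note the point the parent is making: on a finite N-cell ring there are at most 2^N possible states, so the bit stream must eventually cycle.

```python
def rule30_bits(width=64, steps=32):
    """Yield the center-column bits of Rule 30 on a width-cell ring."""
    cells = [0] * width
    cells[width // 2] = 1  # single live cell in the middle
    for _ in range(steps):
        yield cells[width // 2]
        # Rule 30: new cell = left XOR (center OR right), with wraparound
        cells = [cells[i - 1] ^ (cells[i] | cells[(i + 1) % width])
                 for i in range(width)]

bits = list(rule30_bits())  # starts 1, 1, 0, ...
```

The stream looks statistically random, but the finite state space bounds how "incompressible" it can really be.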

In his book, he went too far - he even suggested that cellular automata explain all the phenomena of the universe! - and for that, his other, useful ideas will tend to be dismissed, even if he IS right.

Googling "melting point of iron" gives the answer 1811 K right at the top, along with a link to the source.
Alpha is going to have to be a lot better, which would require human intervention, which leads to a Yahoo-type directory, and that has a lot fewer entries.

No, it does not attempt to compete with Google directly. There are plenty of scenarios where a knowledge base like this won't be useful, such as finding advice on a new camera purchase, or whatever. Google will still rule in that regard. Finding out actual facts from solid data, and building new facts on an existing scientific foundation, all asked in natural language, on the other hand... Google has never even tried improving in that area, and that's where this service is su

If they can figure out how to get this thing to understand financial data, it would be quite useful. That whole area needs more theoretical work.

Machine understanding of financial data is tough. Partly because the data is willfully obfuscated. I once developed a system for turning SEC filings into XBRL (which is an XML representation for financial statements.) At one point, I had several hundred euphemisms for "Net Loss". The connection between financial reporting and reality is at times tenuous.
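The euphemism problem described above boils down to a big normalization table. A toy sketch (the labels and the mapping are invented for illustration; real XBRL tagging maps onto the US-GAAP taxonomy and is far messier):

```python
import re

# Hypothetical mapping of reported line-item labels to one
# canonical XBRL-style concept; a real table ran to hundreds.
EUPHEMISMS = {
    "net loss": "NetIncomeLoss",
    "net deficit": "NetIncomeLoss",
    "loss attributable to shareholders": "NetIncomeLoss",
    "excess of expenses over revenues": "NetIncomeLoss",
}

def canonical_concept(label):
    """Fold case/whitespace, then look up the canonical concept."""
    key = re.sub(r"\s+", " ", label.strip().lower())
    return EUPHEMISMS.get(key)  # None if the euphemism is new to us
```

Every filing that invents a new phrasing forces a new table entry, which is why the mapping grew to several hundred variants.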

Accounting is fundamentally mis-designed. The problem is that some numbers are actual, some have tolerances, some are estimates whose actual value will be known at a future date, and some are estimates whose actual value will never be known. Numbers of all four categories are added, and the result is given as a number without a tolerance. That's just wrong. Accounting works that way for historical reasons; it was designed when arithmetic was expensive. Why it stays that way is more interesting, but beyond the scope of this posting. Because of these problems, machine understanding of traditional accounting data is very difficult.
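The category-mixing complaint can be made concrete with a value-plus-tolerance type. This is a hypothetical sketch using worst-case interval propagation (real error analysis might prefer statistical methods), with made-up figures:

```python
from dataclasses import dataclass

@dataclass
class Amount:
    value: float      # reported figure
    tolerance: float  # 0 for actuals, > 0 for estimates

    def __add__(self, other):
        # Worst case: tolerances add whenever the figures do
        return Amount(self.value + other.value,
                      self.tolerance + other.tolerance)

actual = Amount(1000.0, 0.0)     # e.g. cash actually received
estimate = Amount(250.0, 50.0)   # e.g. an accrued liability
total = actual + estimate        # 1250.0 +/- 50.0, not a bare "1250"
```

The point is that the sum carries its uncertainty along instead of presenting a falsely exact number.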

(Back when I did Downside [downside.com] I was more into this, but when I started getting invited to accounting conferences, I realized I didn't want to get into accounting standardization as a field.)

Meh, financial analysis is tough because it's a chaotic system that monitors itself, and by monitoring itself, it changes the system. Sure, you might be able to model the data, but models are just simplifications of the actual system. And this sucks, because the minutiae of each possible data point could have wide-sweeping implications for the whole system. Come up with a tool to measure the whole system (because it's not simply the sum of its pieces), and you'll be king of the world.

Finally someone who understands the role of chaos/complexity in finance - it seems to me that even economists in general have never viewed it that way (though some are working on it).

I don't think you can come up with the tool you are suggesting, though, because the tool would alter the game and would thus have to take *itself* into account in the model, and it can only do this through the simplifications you want to avoid.

For decades chaotic systems (explicitly labeled as such) have been successfully modeled using statistical methods.

I don't think you can come up with the tool you are suggesting, though, because the tool would alter the game and would thus have to take *itself* into account in the model, and it can only do this through the simplifications you want to avoid.

While simplifications are often unavoidable, it is worth noting that there are many examples of such models working to some degree. For example, the valuation models for a number of modern derivatives, Moore's Law, the Law of Supply and Demand, and the so-called inflation-unemployment "trade off".

FWIW, the valuation of REAL modern derivatives (options, swaps and futures) is not performed using simple differential equations (like Black-Scholes or the binomial model), because such models rest on a lot of very strong assumptions.

Usually what you use are Monte Carlo simulation models or other algorithm-driven (i.e., computational) processes, which can calculate the correct price of the derivatives under a weaker set of assumptions.
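As a rough illustration of the Monte Carlo approach, here is a sketch that prices a European call under plain geometric Brownian motion, i.e. exactly the strong Black-Scholes-style assumptions that real desks relax, so treat it as a baseline toy with illustrative parameters, not production pricing code:

```python
import math
import random

def mc_european_call(s0, strike, rate, vol, maturity,
                     n_paths=100_000, seed=1):
    """Monte Carlo price of a European call under risk-neutral GBM."""
    random.seed(seed)
    total = 0.0
    for _ in range(n_paths):
        z = random.gauss(0.0, 1.0)
        # Terminal stock price for one simulated path
        st = s0 * math.exp((rate - 0.5 * vol ** 2) * maturity
                           + vol * math.sqrt(maturity) * z)
        total += max(st - strike, 0.0)  # call payoff at maturity
    return math.exp(-rate * maturity) * total / n_paths

price = mc_european_call(s0=100, strike=100, rate=0.05,
                         vol=0.2, maturity=1.0)
```

For these parameters the closed-form Black-Scholes value is about 10.45, so the simulation should land close to that; the appeal of Monte Carlo is that swapping in stochastic volatility or jumps only changes the path simulation, not the framework.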

Accounting works that way for historical reasons; it was designed when arithmetic was expensive.

As an engineer who had to study accounting briefly in order to get my degree, it made me want to scream "What, you haven't discovered what negative numbers are by now?!?" and many other expletives. It stays that way because it allows for creative accounting.

I think it is worse than that. They use those formulas because it saves them from having to think. I had accounting during a one-semester stint in the Business School at a Large Unnamed University (before I realized that way madness lay). In the middle of an exam, I couldn't remember the Big Formula for solving one particular problem, so I derived my own and solved the problem with the correct answer. My formula was a special case of the Big Formula and was just fine for the particular test problem. I got no credit; that's when I figured the Business School Product deserved no credit.

Another query with a very sophisticated result was "uncle's uncle's brother's son."
[...]
Alpha actually returns an interactive genealogic tree with additional information, including data about the 'blood relationship fraction,' for example (3.125% in this case).
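The 3.125% figure is just a power of one half: each parent-child link on the path between two relatives halves the expected shared fraction. A sketch of that arithmetic (single-path case with no inbreeding; the path length of 5 is inferred here from the quoted figure, not stated in the article):

```python
def relationship_fraction(path_length):
    """Expected shared-ancestry fraction along one path of
    parent-child links connecting two relatives."""
    return 0.5 ** path_length

fraction = relationship_fraction(5)  # 0.03125, i.e. 3.125%
```

A fuller coefficient-of-relationship calculation sums this quantity over every distinct path through common ancestors, which is presumably what Alpha's genealogy tree is doing under the hood.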

Just announced: Corey Hart has joined on as the lead promoter of the new "computational knowledge engine." In related news they have now renamed the engine to signify the merging of their separate talents.

"I believe that Wolfram Hart has the ability to become the Alpha and the Omega of internet informatics" said Hart in a midnight press conference.

Not everyone is celebrating this new turn of events, however. A man identifying himself only as Angel has come out in opposition to a company that openly supports those who wear their sunglasses only at night.

Alpha, however, will probably be a worthy challenger for Wikipedia and many textbooks and reference works. Instead of looking up basic encyclopedic information there, users can just go to Alpha instead, where they will get a direct answer to their question, as well as a nicely presented set of graphs and other info.

So this means we just get the straight answer in the future. No more thinking for yourself, no more understanding where the answer comes from, no more critical thinking about the validity of the answer. E.g. TFA mentions that the answer to how many Internet users there are in Europe includes the factoid that there are only 93 in Vatican City. Is this true? Well it must be because Alpha gives it, right? Or maybe it is not true? But why would it be not true and what would be a more realistic number? How many people do really live/work in Vatican City, for example? How does this relate to the number of Internet users?

An encyclopedia search will give one heaps of background information that is highly relevant to the question, and gives a lot of understanding about the answer. It makes the answer more than just a number.

For example if one would look up the question "what is the national flag of the USA", the answer is of course "the stars and stripes", and may include an image. But now I happen to know there is a story behind it: why this number of stars, and that number of stripes, and those colours. I bet this will be in Wikipedia's answer but not in Alpha's answer.

Search engines like this sound really interesting to me, and can be very useful, though they will never replace textbooks and encyclopedias. There is just so much more to the answer to a query than a straight number. And there are so many questions that cannot be answered that way, such as "why is polycarbonate so much more temperature resistant than polyethylene?" for example. The full answer to this question includes details about the chemical make-up of the two polymers, and how polymer chains work. That is what textbooks are for.

I read his book "A New Kind of Science", and while it is interesting indeed, his writing style is so full of himself that it gets annoying after a while. It looks unprofessional. He should have used a more neutral writing style and not mentioned himself all the time. It really hurts the book, and my image of him, whenever I hear about other projects related to his name, like this one.

That'll be useful for the 5 billion people who don't happen to speak English. He can't claim to be creating a Google killer in one sentence and then complain that his company doesn't have the money to go multilingual in the next. Come on, at least try French and Spanish; it's not hard to find fluent speakers of those languages in North America!

Depends on what their system needs. If it's just localization of output, it's reasonably easy. But if it's linguistic resources for language analysis, what's available for English is orders of magnitude more than for any other language (WordNet, VerbNet, FrameNet, XTAG...).

The true value of an expert is not the information he accesses, but the use he makes of it. Experts can often perform seeming feats of mental teleportation, jumping from observation A to conclusion B without going through the in-between steps, thanks to years of honing their intuition - something no search engine can do (yet). Also, anyone who's ever tried to explain something technical to a non-technical person and been met with blank stares knows that it's not just about information - it's about und

Not only is this not a Google killer (it's not even a search engine, so how can it be?!), I very much doubt it'll be of any use/interest to anyone outside the intellectual elite. Googling for "swine flu" is of widespread interest, but I suspect that Alpha-ing for computed relationships and statistics is not.

For anyone who has yet to read about Alpha: it is basically a large expert system written in Mathematica that computes the answers to queries covering a very large real-world knowledge domain. I haven't even read that it goes out to the web at all; it's based on a huge human-collated and organized ("curated", in the Alpha parlance) data set of statistics and relationships. Apparently the results are presented in a very slick way, including charts and graphics.

No doubt Alpha is a huge achievement in its chosen domain of knowledge organization and computation, but I find it hard to imagine that a significant portion of the population will find it useful.

> You'll be waiting for a long time. It's impossible to index a database for matching via regex, therefore searches on such an engine would be inordinately expensive to process.

It may not be the same as a database index, but Perl has the 'study' function, and I'm sure that there are more than a few ways to speed up regex searches once the data to be searched is known. I wouldn't be too quick to assume that indexing something is the only way to do efficient searching any more than I would say that compar
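One counterexample to "impossible to index for regex": index literal trigrams, use a required literal from the pattern to shrink the candidate set, and run the real regex only on the survivors. A toy sketch of that idea (here the required literal is supplied by hand; a real engine would derive it by analyzing the pattern):

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each 3-character substring to the set of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for i in range(len(text) - 2):
            index[text[i:i + 3]].add(doc_id)
    return index

def search(docs, index, literal, pattern):
    """Prefilter by the literal's trigrams, then regex the candidates."""
    trigrams = [literal[i:i + 3] for i in range(len(literal) - 2)]
    candidates = set(range(len(docs)))
    for t in trigrams:
        candidates &= index.get(t, set())
    return sorted(d for d in candidates if re.search(pattern, docs[d]))

docs = ["apt-get install foo", "yum update", "apt-get remove bar"]
idx = build_index(docs)
hits = search(docs, idx, "apt-get", r"apt-get \w+ \w+")
```

The index never matches the regex itself; it only rules out documents that cannot possibly match, which is where the speedup comes from.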

Probably because there's no discernible difference between them: Alpha is described to be a web page where you type queries into a text box (queries much like you'd type into Google, it appears), click a button, and it gives you answers that are somehow better than Google's.

Or it could be that all the tech reporters just like hating on Google and hope that some uber-genius will come along and smack them down, David vs. Goliath style. (Disclaimer: In no way am I saying Wolfram is any kind of uber-genius.)