Posted by kdawson on Tuesday May 19, 2009 @03:56PM
from the badges-coming-soon dept.

eldavojohn notes that Groklaw is highlighting the unexpected Wolfram|Alpha ToS (unexpected, that is, for those of us accustomed to Google's "just don't use it to break the law, please" terms). Nothing wrong with Wolfram setting any terms they like, of course. Just be aware. "We've seen people comparing Wolfram's Alpha to Google's Search from a technical standpoint but Groklaw outlined the legal differences in a post yesterday. Wolfram|Alpha's terms of use are completely different in that it is not a search engine; it's a computational service. The legalese says that they claim copyright on each results page and require attribution. So for you academics out there, be careful. Groklaw notes this is interesting considering some of its results quote 2001: A Space Odyssey or Douglas Adams. Claiming copyright on that material may be a bold move. There's more: if you build a service that uses their service or deep-links to it, you may be helping your users break their terms of use, and you may be held liable."

I'll never forget the CIO who told me (I was a consultant presenting a Help Desk application that we had been hired to implement and were about to deploy at his company) - "It doesn't look enough like Google. I want it to look like google - just one line that I type what I want into."

Now, to me, google (or google's address bar) is a huge improvement on the Command Line. I bet the same guy wouldn't have wanted to return to the days when you had to guess what the command-line needed you to type, much like an Infocom adventure game.

That's why Google is a huge improvement - it tries to figure out what YOU want. That's the reverse of a command-line, where you have to figure out what IT wants.

It seems half finished. If I look up the catalogue number of an exoplanet, for example, it'll read me off its orbital parameters. If I then try to ask what 'longitude of periapsis' means, it'll shrug its shoulders and return absolutely nothing.

The design of the system is that it intelligently scrapes quantifiable information that can be put into a defined knowledge base structure and inter-related. Length, weight, orbital period, age, population, molecular weight, wavelength, numeric series, calories... values that are measured in units or physical properties of the world around us. By fitting this information into a defined structure, the system can extrapolate from it to answer questions... hence the words 'computational engine'.

Why build another text search "library index"? It's been done out the ying-yang. This system is orders of magnitude more ambitious and complex, and while still in its infancy, it's a pretty spectacular achievement already IMO. Just allow yourself to think outside of the 'search engine' box. While it contains some facts about the world, it's not a search engine.

But then you search for "SAT score" and get back distribution information about SAT scores. Search for "GRE score" or "ACT score" and it returns nothing. While neither of those tests is as big as the SAT, they're certainly big enough that you'd expect to get something back. Half finished is about right to describe my experience. Extremely useful for some topics (my hometown, for instance) but not so much for others (my wife's maiden name, for example).

I just tried a search for my hometown, Hickman, CA. It came up with a link to Hickman, Kentucky, and suggested I use Hickman Nebraska instead. Who wants Nebraska? Then I saw a link that just said "Hickman." I tried it, and it came up with a demographic breakdown, that didn't quite seem to match any place I've lived. Then I realized it was giving me the demographic breakdown for those with the last name of Hickman. Interesting, but not what I was looking for.

In fact, that's how I would characterize the entire system: interesting, but not what I was looking for.

Finally, I tried Hickman, CA again, and realized it had recognized California, but instead was comparing the location of Hickman Kentucky with California. So I now know how the lowest point in California compares to the lowest point of Hickman Kentucky. Except it didn't actually list the lowest point for Hickman Kentucky.

Then, a search for "Angelina Jolie nude" resulted in Wolfram|Alpha isn't sure what to do with your input. Hmmmmm.

I just tried comparing "Wales" and "Scotland" in WA. Instead of the countries, I got information about two cities. Hmm. Then again, comparing "Welsh" and "Scottish" returned some genuinely interesting information about the two languages.

They aren't claiming ownership of the bits of data they provide, they're claiming copyright over the whole page. Sort of like how an encyclopedia will copyright the book even if it includes quotes from people. Basically over the presentation of the data.

Additionally much of what they would be claiming copyright over isn't subject to copyright protections. Things such as birth dates and astronomical data aren't subjected to copyright protection.

Then I guess you should have read the actual terms before you posted, hmm?

Attribution and Licensing

As Wolfram|Alpha is an authoritative source of information, maintaining the integrity of its data and the computations we do with that data is vital to the success of our project. We generate information ourselves, and we also gather, compare, contrast, and confirm data from multiple external sources. Where we have used external sources of data we list the source or sources we relied on, but in most cases the assemblages of data you get from Wolfram|Alpha do not come directly from any one external source. In many cases the data you are shown never existed before in exactly that way until you asked for it, so its provenance traces back both to underlying data sources and to the algorithms and knowledge built into the Wolfram|Alpha computational system. As such, the results you get from Wolfram|Alpha are correctly attributed to Wolfram|Alpha itself.

If you make results from Wolfram|Alpha available to anyone else, or incorporate those results into your own documents or presentations, you must include attribution indicating that the results AND/OR [emphasis mine] the presentation of the results came from Wolfram|Alpha. Some Wolfram|Alpha results include copyright statements or attributions linking the results to us or to third-party data providers, and you may not remove or obscure those attributions or copyright statements. Whenever possible, such attribution should take the form of a link to Wolfram|Alpha, either to the front page of the website or, better yet, to the specific query that generated the results you used. (This is also the most useful form of attribution for your readers, and they will appreciate your using links whenever possible.)

A list of suggested citation styles and icons is available here.

Failure to properly attribute results from Wolfram|Alpha is not only a violation of these terms, but may also constitute academic plagiarism OR [emphasis mine] a violation of copyright law. Attribution is something we expect you to give us in exchange for us having provided you with a high-quality free service.

The specific images, such as plots, typeset formulas, and tables, as well as the general page layouts, are all copyrighted by Wolfram|Alpha at the time Wolfram|Alpha generates them. A great deal of scholarship and innovation is included in the results generated and displayed by Wolfram|Alpha, including the presentations, collections, and juxtapositions of data, and the choices involved in formulating and composing mathematical results; these are also protected by copyright.

You may use any results, including copyrighted results, from Wolfram|Alpha for personal use and in academic or non-commercial publications, provided you comply with these terms.

If you want to use copyrighted results returned by Wolfram|Alpha in a commercial or for-profit publication we will usually be happy to grant you a low- or no-cost license to do so. To request a commercial-use license, go to this form and provide the input for which you want to use the corresponding output along with information concerning the nature of your proposed use. Your request will be reviewed and answered as quickly as practical.

You didn't know what you were looking for, we did. We found it for you, you WILL find that what we gave you is what you were looking for. If you have a problem with this, we will kill you. (Or failing that, come close enough for a copyright suit... how about a copyright vest? trousers?... what about a copyright shirt and tie?)

Don't see what the big deal is, here. Since Google doesn't host any of the actual information, you don't need to cite them as a source. You do need to cite the page you get to from Google, though. Think of W|A like a procedurally generated encyclopedia/textbook/almanac. Just like any of those other sources, you should cite it as a reference.

The sooner people stop associating Google and Alpha in their heads, the better.

Just like any of those other sources, you should cite it as a reference

I should, if I'm writing an academic paper. I never thought of it as something that should be enforced, with them claiming I've violated the ToS, or threatening copyright infringement, especially when all I'm doing is posting a search result to Slashdot.

Google doesn't host any of the actual information, you don't need to cite them as a source.

Google does, in fact, host all of the information used in their searches (it doesn't go out and spider the web in response to your request; it spiders it earlier, creates a database, supplements that database with information about your and other users' past searches and behavior, and uses that database when you enter a search query).
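To make the crawl-first, query-later point concrete, here is a toy sketch in Python. It is only an illustration of the general idea of an inverted index, not a description of how Google actually works; the `pages` dictionary, the URLs in it, and the function names are all made up for the example.

```python
from collections import defaultdict


def build_index(pages):
    """Crawl-time step: map each word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Query-time step: intersect the stored sets; no page is fetched here."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical "already crawled" pages, standing in for the stored database.
pages = {
    "example.org/a": "Wolfram Alpha terms of use",
    "example.org/b": "Google terms of service",
}
index = build_index(pages)
print(search(index, "terms of"))
```

The point of the split is that `search` only ever touches the stored index, which is why the engine can be said to host the data it answers from.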

I don't see the problem here. It really would be plagiarism to copy paste one of those plots into your paper and claim you generated it yourself.

I think we would need a lawyer for any further analysis, but I never really did think I could just gather a bunch of PDFs from Alpha (e.g. pages of common probability distributions) and claim the compiled book as my own.

In many cases the data you are shown never existed before in exactly that way until you asked for it, so its provenance traces back both to underlying data sources and to the algorithms and knowledge built into the Wolfram|Alpha computational system. As such, the results you get from Wolfram|Alpha are correctly attributed to Wolfram|Alpha itself.

If it didn't exist before I asked for it, and my asking for it was the only human action that caused it to come into existence, if there is an "author" for copyright purposes, it's me. The only way Wolfram could, therefore, claim copyright on it is if it was a work for hire, but since I'm not a Wolfram employee acting within the scope of my employment, and since there is no agreement signed by both parties designating it a work for hire, that doesn't work either.

Consequently, I'd say their own terms of service defeat their claim to copyright.

The legalese says that they claim copyright on each results page and require attribution.

and that day appears a long way off, especially given the way they hyped it.

Besides, all their data comes from somewhere, and I don't see those attributions. And by all their data I mean symbolic integration, fractals, and Wolfram's formulation of a Turing machine which no one else uses.

I don't know what Alpha will be like in the future, but I was extremely disappointed in the present, and imagine Google^2 will make Alpha obsolete very soon anyway.

All they ask is that you attribute them when publishing results derived from their service. Example:

Methods: "The comparative population studies were derived from the Wolfram Alpha service (Wolfram, 2009)"

Regular thing for academics. I cite NCBI blast service, I cite PFAM, I cite dozens of other services out there. Most of these tools require or ask for an attribution; and in most cases, this is anyways necessary in a scientific procedure.

How is it reasonable to ask for attribution for having a computer perform a calculation on someone else's data? Wolfram Alpha has done nothing except code a Turing machine. I do not cite HP when I do a calculation on my calculator, and I see no reason why more complex but equally rote calculations should be cited. I ask the computer a question and it gives an answer; is the question or the code used to find the answer the insightful/citable part of the idea?

You think it's not reasonable? Then write your own Wolfram Alpha, if you really think it is that simple, and use that instead of WA for your work. Man, you have no idea what you are talking about here. Modern biology would be nowhere if people who build such "turing machines" were not credited for their work, and consequently get grants for their research.

For example, tons of software in bioinformatics is written with completely open source and well-known algorithms, using data gathered by experimentalists, and yet they get the recognition -- because someone had to come up with the idea, gather (and maintain!) the data, run tests, implement, etc. etc. Believe me, even with simple ideas and algorithms and for simpler data sets this is a shitload of work. Heck, even re-implementations of existing tools get recognized.

Secondly, a scientific procedure requires that you publish your methods -- if you have used software X to generate figure Y and table Z, then you have to write how you did it. And no one in her or his right mind will reimplement existing tools just for the sake of the current work without a very good reason.

That said, sometimes a tool like that allows you to "get on the trail" -- which you then pursue using something else. For example, WA would give you a hint that there might be a connection between cancer and, say, cigarettes, and you show this connection using clinical trials. In such a case, however, when you do not publish the data from WA directly, nor any figures derived from it, you are not required to cite it.

Note that I am in no way convinced that WA is of any use. The parts of it that overlap with my area of expertise (biology / biocomputing) are naive and rudimentary, and mostly useless to say the least.

Which is exactly why academics will ignore the Wolfram|Alpha terms-of-service and simply use their own best judgment to decide when to cite it.

No academic would cite Wolfram|Alpha (or any other software package) when they use it to perform some simple calculation, like sin(x) or whatever. But if the piece of software is performing a non-trivial calculation, then it should be cited, both to provide proper credit/attribution, and to make the methods section of a paper complete (it is possible that there is so

Yeah, but such a citation is also very useless for the readers of an article, since a search engine/computational service does not produce immutable results. You never know when you read the article and check the stuff in Wolfram Alpha yourself, if the results you get are the same the authors used.

Basically a service like Wolfram Alpha is not usable as an academic source.

Wolfram Alpha doesn't just provide you with knowledge. It provides you with a new kind of knowledge. Any knowledge you gain from it must be attributed to Stephen Wolfram... because he invented it. It is actually safer to attribute all citations to Stephen Wolfram, in fact, because he is smarter than you.

Wolfram deserves a big wet raspberry from everybody who thinks he is nothing more than an insufferable ego-maniac windbag who, like the rest of us, rides along on the coattails of the really great minds that came before.

This whole "new kind of [whatever]" meme might be really funny if it weren't so sad -- not because Wolfram doesn't really think he is smarter than almost everybody else (he does), but because - reportedly [google.com] - he can't be prevailed upon to care about what most other people think, let alone how his choices might affect them:

Of course I can see them wanting to be attributed for calculations? But what's the problem with that? I *want* to see attribution when a blog, newspaper, or scientific report spits out a series of numbers anyway, especially if it involves something else than raw mathematics, like statistics. That's something I see as important as they can just as well demand it in my opinion. I consider it a service to me.

If there's something that annoys me, it's unsourced calculations. If it's attributed to WA, then I can at least use the same query on WA and in turn see what WA used as sources for that specific query (under the "source information" link at the bottom of each page).

If there's something that annoys me, it's unsourced calculations. If it's attributed to WA, then I can at least use the same query on WA and in turn see what WA used as sources for that specific query (under the "source information" link at the bottom of each page).

You are making the easily understandable mistake of assuming that the "Source Information" link does, in fact, list the sources of information used in the query. While you'd think that would be the case, if you actually read the disclaimer at the

The law already protects databases of public facts. Why would a spontaneously generated list not be copyrightable? Personally, I hope that the courts will see through that argument and call it a violation of the spirit of the law, but I won't hold my breath that they won't say that a list of copyrighted quotes isn't protected if the creator of the list claims that THAT list is protected.

The Wolfram terms of service say that Alpha is capable of generating content from several data sources, and sometimes Alpha considers the content sufficiently original that it will attribute the content to itself. Otherwise, it will attribute the content to the source from which it was derived. What is interesting is that we have a machine generating what is essentially one-time-use content, and the machine then gains a copyright to the content that others, even humans, have to respect. It is no more crazy than assigning a copyright to a corporation, so we should not be surprised. In any case, Wolfram does have a point that content should always be attributed to a source, and that people have become quite lazy on this issue, as various accusations of high-level plagiarism have shown. Since Google only indexes, it does not really know provenance and cannot claim copyright to anything in particular.

There are a couple of really scary things in the terms of use. For instance, minors are not allowed to use the service without the permission of adults, and adults become fully responsible for the actions of the child. I am unsure of why they felt they had to put that in there. Then there is the first sentence: "The Wolfram|Alpha service may be used only by a human being using a conventional web browser to manually enter queries one at a time". I would hate to have to define what a conventional browser is. For many people it would be only IE.

Scarier still is the ambiguous policy on deep linking. To wit: "It is not permitted to use Wolfram|Alpha indirectly through another website that has created a large number of deep links to Wolfram|Alpha, or that automatically constructs links based on input that you give on that site, rather than on Wolfram|Alpha. You may not in effect use Wolfram|Alpha through an alternate user interface presented by another website." Clearly they do not want bots and third parties writing code to hijack the site. Disappointing given the wonderful work they did with Mathworld.

In many cases the data you are shown never existed before in exactly that way until you asked for it, so its provenance traces back both to underlying data sources and to the algorithms and knowledge built into the Wolfram|Alpha computational system. As such, the results you get from Wolfram|Alpha are correctly attributed to Wolfram|Alpha itself.

Does that mean that Wolfram|Alpha can be sued for slander if its algorithm generates a false statement about some individual or corporation by "misunderstanding" the data it is digesting? In other words, if the result is something uniquely generated by Wolfram|Alpha, deserving of attribution in the same way that an author of a book deserves attribution, do they also deserve to be held liable if the content they are generating is incorrect or slanderous?

Result: there is unfortunately insufficient data to estimate the velocity of an African swallow (even if you specified which of the 47 species of swallow found in Africa you meant)
(asked of a general swallow (but not answered) in Monty Python's Holy Grail.)

It looks like its results are case sensitive, but the redirects don't know that.

Did a search for 'hockey' and got some general information (as expected).
Tried a new search for 'ice hockey', which attempted to redirect to 'Hockey', which apparently doesn't exist (the capital 'H' throws it off).

The legalese says that they claim copyright on each results page and require attribution.

How is a document generated by a computer program in response to an external user's query an original work of authorship created by Wolfram? Sure, the computer program itself is, but that's a different issue. If it's not, it isn't subject to copyright by Wolfram, and nothing in W|A's terms of service can make it so.

The source information is ridiculously general; it tends to be either blank, or list every source used anywhere in a very general way. If all the results cite two different versions of the Encyclopedia Britannica and also Wikipedia, how can we tell which particular Wikipedia page the information came from? (That's needed to know the author list and thus know the information required by Wikipedia's license, whether it's GFDL or CC-by-sa.)

All calculations generate the sources under the "Source information" link on each page.

They don't identify the sources of particular facts used (for instance, if you ask for the population of a country, you'll get a Wolfram|Alpha "Primary Source" -- and a whole list of other sources that are generically root sources of population data.)

Meanwhile, if I ask Google for the population of a country, I get a numeric answer with a specific website that is the source of the information. (I point to that specific e

I'm not sure how revolutionary Wolfram Alpha [wolframalpha.com] really is. But, if you've tried it, you'll have discovered that it's not a google alternative - It's not even trying to be. It's a completely different tool. It's kind of fun to tinker with, but I haven't decided yet how useful it will be.

And, just so that I can blatantly violate their TOS (which I've yet to read except for in TFS and I've not agreed to), here are the results for 2+2:

I think that being a google alternative is like being an ipod killer, and we've all seen how successful companies have been in that endeavor. Good on them for not trying to play follow the leader or at least claim not to, I haven't actually USED the service or anything to tell if they are or are not imitating/trying to replace google.

Another cool thing, do a search for any website (here is slashdot for the click impaired [wolframalpha.com]). It comes up with an element hierarchy for the page. I'm not sure how useful it is, but it's pretty.

I'm supposed to be impressed because the people who sell Mathematica have figured out how to solve a differential equation? Call me when Wolfram Alpha can solve Schanuel's conjecture. Then, I'll be impressed.

I just asked Wolfram Alpha if every finitely presented periodic group was finite and it told me to go fuck myself.

Well it claims to make information computable. I accept it's not meant to find results like Google but the issue with it is it doesn't even seem to gather basic data in a computable form.

I mean, you try things like "On what date did the Falklands war commence?", "How many species of Melocactus are there?", "On what date was Adolf Hitler born" and it outright fails.

Okay, so I figured maybe I'm asking questions that are out of the intended realm of knowledge it supports and the assumption is that you'd never want to compute with this information. So I tried something Mathematical - I mean, that is Wolfram's speciality right?

"How many non-isomorphic labelled trees are there with 4 vertices"

Fail.
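For what it's worth, the question does have a definite answer under either reading, and it's small enough to brute-force. The sketch below is just a sanity check, not anything W|A does: it enumerates every set of n-1 edges on n labelled vertices, keeps the connected ones (those are the trees), and then groups them into isomorphism classes by trying every relabelling. The function names are made up for the example, and the approach only scales to very small n.

```python
from itertools import combinations, permutations


def is_tree(n, edges):
    """A graph with n vertices and n-1 edges is a tree iff it is connected."""
    if len(edges) != n - 1:
        return False
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def canonical(n, edges):
    """Smallest relabelled edge list over all vertex permutations."""
    return min(
        tuple(sorted(tuple(sorted((p[a], p[b]))) for a, b in edges))
        for p in permutations(range(n))
    )


def count_trees(n):
    """Return (number of labelled trees, number of isomorphism classes)."""
    all_edges = list(combinations(range(n), 2))
    trees = [e for e in combinations(all_edges, n - 1) if is_tree(n, e)]
    classes = {canonical(n, e) for e in trees}
    return len(trees), len(classes)


labelled, unlabelled = count_trees(4)
print(labelled, unlabelled)  # prints "16 2"
```

The 16 agrees with Cayley's formula, n^(n-2) = 4^2, and the 2 classes are the path on four vertices and the star with one centre and three leaves.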

I've tried a few other relevant, factual questions and it just falls flat over, not even able to try and answer them.

I'm sure it does do a great job of making information computable, the problem is it's unable to gather the information in the first place.

Ironically, Google, which doesn't claim to make information computable, manages to provide answers for all these questions within its first page, often as the first hit. Sure, it may not be presented in a standardised format, but data that needs to be parsed is certainly more computable than data that simply can't be provided at all.

I can see what Wolfram was trying to do, but why did he have to couple it with immense hype that it's as important as Google? Why has he been going on and on about it to the media when it struggles to even do what it's supposed to absolutely excel at? I think they could've at least saved face if they'd stopped being so cocky about it and released it with a little less hype and fanfare and let it improve and become more useful and hence more widely adopted over time. One has to ask, when there was so much hype about it and with a ToS like this, whether it was all just about Wolfram gathering data for himself or something rather than providing a tool useful to everyone else. Either that or he simply believes his own hype and believes the tool is better than it really is. Perhaps in developing and using it himself he was blinded by making it work well for the applications he wanted, without ever truly seeing how well it performs in other problem domains?

For wolfram alpha to be successful they will need to develop their natural language parsing abilities, it's not easy to do, each question may require individual interpretation. At this point using google is better for understanding more abstract concepts.

I've used wolfram alpha to help with my linear algebra homework for the past few days. Good info for checking my work. Matrix example [wolframalpha.com]

The best part is using it on a phone, it's made my G1 a more powerful calculator than my good ol TI-92.

Yes, it does NOT search. But they sold it this way - or at least they played aggressively with the idea. While creating PR buzz around it, they introduced it as "not a Google killer" when nobody had any idea what the thing was (so they could introduce the concept just the way they wanted to, and they explicitly chose to introduce the Google benchmark, even if to negate it). And they obviously KNEW where this approach would lead in people's minds.

Let's say I have a TI-86 in front of me....and I add 2+2. They're saying that if TI wanted to, that they can forbid me from using that 4 without attributing it to them. Sounds a little ridiculous to me.

A contract requires some specific elements to be enforceable: an offer, an acceptance, and a consideration. You could say putting up the site is an offer to use it, and actually using it is an acceptance of this offer. But there's no consideration being traded. Hence, their TOS is not a contract.

It's more accurate to say the TOS is a license to use their site. But even in that case, what remedy could they pursue if someone used their information without their permission or in a way that contravenes the TOS?

How are people who show up to use a free service "customers?" Google's customers, for example, are their advertisers, not the people who use the free stuff.

They can both be considered customers. I'm Google's customer because I give them money; not directly, but through their advertising. Of course, that depends on the definition that you use for customer, but I'm giving Google something they want (pageviews and advertisement clicks) in exchange for them giving me something that I want (good search results). If we're not their customer, then we're very close. If I go to another site for my searches, then Google loses money.

Anybody who has used Wolfram's products, such as Mathematica, for more than a few versions, knows that they don't have, how shall I say this? a very enlightened view of the relationship between the party that sells a product and the party that buys that product.

In fact, their user agreements have always been among the very worst in the software industry, that is, if you happen to believe that the consumer has any rights at all beyond the right to give money to the vendor.

Could you give some examples? Not that I'm doubting you, I'm just curious.

I've been left without access to mathematica licenses on multiple occasions due to misunderstandings between Wolfram and my institution. Because Mathematica was my primary platform at the time, that meant days that I was unable to do or access my work.

The first time that happened, I decided to learn an open platform; the second time, I migrated. In my projects, I now absolutely avoid writing core functionality in Mathematica.

Another complaint: you can't discover how defaults work in some cases. As far as I can tell, setting things to "Automatic" means "proprietary and undescribed." I've asked Wolfram for details in one case, only to get a "we can't tell you" response.

Oh, and being told off for filing bug reports is pretty unimpressive. I separately reported different manifestations of the same bug, separated by some time. I'd actually forgotten about the first report, but if they'd fixed the bug, the situation wouldn't have arisen. When I've submitted a bug report to open source projects, they have usually been along the lines of "this line is wrong, and this seems to be an acceptable fix."

I think the arguments for open, modifiable, redistributable source code (that is guaranteed to retain those properties) are extremely strong. I.e. the GPL, probably v3. Once you know it well, Mathematica is a stunning programming language and library set, but I now don't care: as a whole, the platform has been unreliable for me.

You make a good point. If you are doing ground-breaking research that depends on computer calculations, how can you be sure that the results given to you by Mathematica are accurate and not a bug? Reworking the calculations by hand defeats the purpose of using the software in the first place, if that is even possible (some physics simulations take weeks of _computer_time_ to compute).

At least with open source computer algebra, one can verify the method used to compute the results.

Actually, one could argue that making money is the entire point of this ToS. They provide the service for free, while putting restrictions on reusing the data so that you have to buy a license/subscription/whatever in order to use it in a professional setting. Otherwise, it'd be a completely free service.

Though IANAL, their copyright would be on the way they express the search results on the page (the way the content is displayed, ranked, any graphic design elements, etc.). They do not actually have standing to claim copyright on the content that is displayed (e.g., song lyrics, citation from a book), but only the way they display it.