Posted by kdawson on Tuesday January 19, 2010 @02:52PM from the reductio-ad-absurdum dept.

theodp writes "Two years ago, David DeWitt and Michael Stonebraker deemed MapReduce a major step backwards (here are the original paper and a defense of it) that 'represents a specific implementation of well known techniques developed nearly 25 years ago.' A year later, the pair teamed up with other academics and eBay to slam MapReduce again. But the very public complaints didn't stop Google from demanding a patent for MapReduce; nor did it stop the USPTO from granting Google's request (after four rejections). On Tuesday, the USPTO issued U.S. Patent No. 7,650,331 to Google for inventing Efficient Large-Scale Data Processing."

The bad thing isn't the fade-in itself. It's that Google used to be run by people who knew what sucked and what didn't. Now it seems that people who don't know are in positions to call some of the shots. It's a bad omen.

They're probably about 10 years away from their own version of Microsoft's "Bob".

The fade-in is nice. Not so much because it's a fade-in (which is just visually more pleasant than an instant-display), but because you can visit www.google.com and get a very clean page (google logo, search field, and currently a Haiti relief notice), and just type away (as focus is set to the search field) and be done with it. This is very much like how google.com -was- in the very early days.

If you want to access any of the other services that google have started to offer since then, you can move your mouse anywhere within the screen and hey presto those options become available to you. If you don't need them - why clutter up the screen with them?

You can always customize your own google page and set that as your bookmark/start page/whatever and display exactly what you want to have displayed from the get-go.

I agree about the Google redirects. I know they have been there for a while now, but I first actually "noticed" them (as in they caused me a problem) just the other day when I was trying to get some links to "further reading" to go into some technical document I was writing. I sure didn't want Google redirect links in my document, so I actually ended up going to Bing and doing the same search. That worked better, as Bing apparently doesn't do the redirect thing and the links are actually links to the site you

I don't really think a small menu of text links hurts. But it sucks to make google your home page for a new tab and then have to wait 5 seconds after opening a tab before gmail or news links are clickable.

Sure, some people loved the show-menu-delay in windows too but I'd rather be able to click instantly on the next menu item rather than wait a couple seconds for it to fade in.

Call me paranoid, but the sole reason I see for this change is to hide the "You are logged in as user X" message as much as possible.

Now that Youtube and Gmail (and other services?) accounts are inter-linked and that they ALL provide a "Keep me Signed-In" option which is convenient on a per-site basis, most people are now "signed-in" when doing search queries without realizing it.

Of course, one can argue that you are always "signed-in" in some way and that Google can already provide law enforcers, other bus

I've noticed that when you do a google search and mouseover the links, it shows the direct link in the status bar, but that is a lie. If you look at the actual URL in the link properties, you'll see that it redirects through google. Sneaky.

I did look at the URL properties. It was the plain URL. A search for "houston chronicle" returns this:
<a href="http://www.chron.com/" class=l onmousedown="return clk(this.href,'','','res','1','','0CAcQFjAA')"><em>Houston Chronicle</em></a>
Right-clicking and copying the link location copies "http://www.chron.com".

I get the same as you if I turn off JavaScript with NoScript. However, as long as I have it on, the link looks direct by default, but when you click it or view its properties you see that it actually goes through a Google redirect.
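For the earlier complaint about redirect links ending up in a document, here is a minimal sketch of unwrapping such a link back to its destination. It assumes, without confirmation from the thread, that the redirect goes through a /url path on google.com and carries the real target in a url (or q) query parameter; the example redirect URL is made up around the Houston Chronicle result quoted above.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_redirect(link: str) -> str:
    """Return the destination of a Google-style redirect link, else the link itself."""
    parts = urlparse(link)
    host = parts.netloc.lower()
    # Assumption: redirects go through the /url path on google.com or a subdomain.
    if (host == "google.com" or host.endswith(".google.com")) and parts.path == "/url":
        params = parse_qs(parts.query)  # also percent-decodes the values
        # Assumption: the real target travels in a "url" (or "q") query parameter.
        for name in ("url", "q"):
            if name in params:
                return params[name][0]
    return link


# Hypothetical redirect wrapping the Houston Chronicle result.
wrapped = "http://www.google.com/url?sa=t&url=http%3A%2F%2Fwww.chron.com%2F&ved=0CAcQFjAA"
print(unwrap_redirect(wrapped))                  # http://www.chron.com/
print(unwrap_redirect("http://www.chron.com/"))  # direct links pass through unchanged
```

Anything that does not match the assumed pattern is returned untouched, so the worst case is a link that stays wrapped rather than a broken one.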

I find this almost as intimidating as the fact that Google Maps never opens on your home location (which, e.g., Bing does), even though Google targets local ads at you and so clearly could. You really begin to think about how stupid people must be if Google is managing to fool

So, the "pure research and engineering culture" never makes anything that sucks? In my experience, bad ideas aren't exclusive to any particular group. Good ideas aren't either. "Us vs. Them" produces a lot of heat, but no light.

It's not like they just did it on a whim. They did usability research comparing different designs before settling on the current one. I don't see why you need any of the other stuff anyway, so the fade just helps you cut out the clutter and zero in on the search bar when you load the page.

This sounds more stupid than evil, which is interesting, because Google doesn't do obviously stupid things very often.

The patent won't do them any good, because it won't stand up in court. They could use it to attack someone small -- an open source developer who would have to back down because they couldn't handle the legal fees -- but they don't have much of a history of that sort of thing, and there's no reason to think they would in this case, either.

It won't do them any good at all against someone big -- MS and Bing, for example -- because MS would have good lawyers who could demonstrate prior art to a court.

The point is probably to create and keep a nice big portfolio of patents to be used the next time Google gets sued for patent infringement. It's common practice for big tech firms (and others, of course) to hold a reserve of patents at the ready in the event that they need to defend against a patent suit. The aggressor company sues for infringement, the defender digs up a few patents that the aggressor is violating, and they settle out of court for a mutual licensing agreement.

Of course it's ridiculous, and sounds stupid, but it's a symptom of the broken patent system, not a peculiarity of Google.

The backdoor to that system, as we've seen, is to sell off a patent to an investment firm, which sets up a patent troll company (or buys a small company in the field and turns it into a patent troll) to abuse it. The MAD strategy then no longer works, because the opponent exists only to spend its cash reserves on the lawsuit and to turn any profits over to the investors.

So why not follow the money and retaliate against the investors? An attack is an attack regardless of whether it is done by proxy. That is in line with MAD thinking too where an attack by or on an ally is escalated against the parent aggressor.

What karma? Google bowed to pressure from the Chinese government to censor their results from the beginning. Some may argue that that was the price they had to pay to open up China but it was still a massive karma burn. Google didn't just grow a conscience about dealing with China, they are acting in their own selfish interest as they always have been.

Why do you think the recent Google-China issue is either all about Google having a conscience or all about Google acting in their own self interest? It's both, and it's complicated.

For one thing, having a conscience is in Google's best self interest. Public image is crucial for a company like that.

For another, companies Google's size (or any size, if they are competent) don't make decisions based on one factor. They take into account many, many factors, including conflicting ones, and arrive at a decision. In this case, clearly both conscience and self-interest were factors.

They've always done that, haven't they? You sign up, then they text you a validation code.

No, they haven't. I have two GMail accounts for myself, and have created a couple more for parents and acquaintances. In no case do I recall any texting being involved. The last time I did it, though, was over a year ago.

A somewhat optimistic guess is that they'll be restricted to using this defensively. Are they really going to sue Hadoop, the open-source implementation of MapReduce? Hadoop not only implements a version of MapReduce, it even uses its name [apache.org], so is not at all coy about being a direct infringement of this patent. And yet, I would be surprised if Google sued them, or the many people using it [apache.org]. They certainly haven't said anything yet, as far as I can find--- when things like Amazon Elastic MapReduce [amazon.com] were launched, I can't find record of Google saying, "hey, you're stealing our tech!"

If they don't enforce their patents, they effectively become public domain. They will probably not sue Hadoop, but will try to arrange for some official acknowledgment from Hadoop of Google's patent rights and grant them some sort of license explicitly for open source projects. This will strengthen Google's claim. They did not fight their way through 4 rejections and hundreds of thousands of dollars of attorney fees to not enforce this patent.

Could they unleash it immediately on MS? Since MS has been talking about setting up server farms and managing workloads in the last year or so, and since it sounds pretty much like they're trying to do what Google do, maybe MS are the intended target.

You are thinking of trademarks. Those are not related to patents. There are various limitations on their ability to demand damages for past infringement they failed to act on, but their patent won't become invalid through lack of use.

Google has at least 173 issued patents [google.com] as well as over two hundred pending applications [google.com]. That doesn't include the various patents (such as the PageRank patent) that it is the exclusive licensee for but does not actually own (Stanford owns it). Google's software patent strategy dates back to at least 1997, when it filed this application [google.com], which actually predates the PageRank application.

Near the very top, just above the author name ("Dean et al.") it says "United States Patent". A published application would instead say "Patent Application Publication".

Another way to tell is that it has an issued patent number. Currently, patents are being issued with numbers in the mid-seven-millions. They (more or less) started at one and continue to increase from there. Published applications have numbers in a different format: YYYY/NNNNNNN, the YYYY being the year of publication and the NNNNNNN be
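The two number formats described above can be told apart mechanically. This is a rough sketch that only encodes the two shapes the comment mentions: a plain serial like 7,650,331 for issued patents and YYYY/NNNNNNN for published applications. It ignores design and reissue prefixes (such as D or RE), and the application number in the example is a made-up placeholder, not a real filing.

```python
import re

# Issued US patent numbers: a plain serial, e.g. "7,650,331" (commas optional).
# Simplified: accepts 7- to 9-digit numbers only; no D/RE/PP prefixes.
ISSUED = re.compile(r"^\d{1,3}(?:,?\d{3}){2}$")
# Published applications: YYYY/NNNNNNN (publication year, then a 7-digit serial).
PUBLISHED_APPLICATION = re.compile(r"^\d{4}/\d{7}$")


def classify_us_number(number: str) -> str:
    """Guess whether a US document number is an issued patent or a published application."""
    number = number.strip()
    if PUBLISHED_APPLICATION.match(number):
        return "published application"
    if ISSUED.match(number):
        return "issued patent"
    return "unknown"


print(classify_us_number("7,650,331"))     # issued patent
print(classify_us_number("2005/0123456"))  # published application (placeholder serial)
```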

Before you go accusing Google of doing Evil (TM), think.
If they don't do this, some troll will. The troll will lose,
but Google will waste a lot more money defending against it.

This is why IBM takes out so many patents too. Most of them
are "defensive" patents.

We (that being everybody except the USPTO) could agree not to take
out any more software patents, and the industry would breathe a collective
sigh of relief. Trouble is, it only takes a few bad apples to spoil
that approach. It's the same reason Communism didn't work.

It is not true that if Google doesn't patent it, a troll will. A technique that is well known, such as MapReduce, is the property of the general public and is unpatentable. Any technology that has been sold or in use for over a year is unpatentable.

We're called examiners, rather than clerks, and the issue with the vast majority of patents reported on Slashdot isn't that the examiners are clueless concerning the prior art, but that Slashdotters are ignorant of how patent law and patent examining actually works.

For example, in this case, the claims are extremely long - so long, in fact, that the patent is probably worthless for its offensive capacity. The more limitations a claim has, the narrower the invention it covers.

If the Google patent is truly for something that is already known then it should not have been issued. I did not read the whole patent. Patents always have to be for something new that is not yet known by others: http://www.uspto.gov/patents/basics.jsp#novelty [uspto.gov]. If what Google patented was already known (and I'm not saying that it is because I didn't read the whole thing) then it can be challenged and overturned in court. It can be challenged in federal district court, appealed to the Federal Circuit, and appealed to the Supreme Court.

Yes, it can be challenged, but at what cost? Running a patent case up the appeals ladder like that will cost many tens of millions of dollars, if not hundreds of millions. So, the only people who can preemptively challenge patents are the people who have the most interest in seeing the status quo continued (i.e. IBM, MSFT, GOOG).

Anyway, the ancestors in this post are correct: if Google doesn't patent this, and say, LinkedIn, or Facebook, or any of the other hundreds of businesses that use map/reduce had trie

And yet (as I said) - GOOGLE MANAGED TO GET A PATENT ON IT. And if Google could, then a patent troll could too. (Again, as I said before.)

If what Google patented was already known (and I'm not saying that it is because I didn't read the whole thing) then it can be challenged and overturned in court. It can be challenged in federal district court, appealed to the Federal Circuit, and appealed to the Supreme Court.

And what do you suppose that would cost? I can guarantee you it would be *significantly* higher than the $1300 Google spent at the USPTO.

I'm not defending patents, but this patent was filed on June 18, 2004. The MapReduce paper was released in December 2004. The fact that it is similar to functional programming primitives is largely irrelevant - it is the application of the technique in a novel way to solve a specific problem (i.e. large-scale data processing) which makes it patentable. For a start, the system described in the patent includes details on parallelisation/processing task distribution, rack awareness, and lots more.
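For reference, here are the two functional programming primitives being compared to the patent, in their plain single-machine form. The word list is purely illustrative. Everything the comment says the patent adds on top of this (a master process, distributing tasks across workers, partitioning by key, rack awareness) is absent here, which is the distinction being drawn.

```python
from functools import reduce

words = ["map", "reduce", "fold", "patent"]

# map: apply a function to every element independently
lengths = list(map(len, words))

# reduce, i.e. a left fold: combine the mapped values into a single result
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths, total)  # [3, 6, 4, 6] 19
```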

Any technology that has been sold or in use for over a year is unpatentable.

A patent based on such technology may not stand up in court, but to start with, in practice "patentable" means something the USPTO will issue a patent on. And the examiner deciding whether to grant such a patent may not be familiar with relevant prior art, not to mention that they may not even have any particular incentive to examine a given patent application closely.

It is not true that if Google doesn't patent it, a troll will. A technique that is well known, such as MapReduce, is the property of the general public and is unpatentable.

If it was unpatentable in practice Google would obviously not have been granted a patent on it; since they were granted a patent on it, the inescapable conclusion is that, in practice, MapReduce is in the category of things which can be patented (whether it should be or not), and therefore, it is not at all inconceivable that if Google h

Any technology that has been sold or in use for over a year is unpatentable.

Except if you have applied for the patent prior to it being offered for sale. In theory you then have only a year after the application to get the patent, but there are ways that patent attorneys can stretch this out by making amendments, etc. to the patent application.
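The timing rule debated here is simple date arithmetic. Below is a deliberately simplified model of the pre-2013 US one-year bar (pre-AIA 35 U.S.C. 102(b)): a public use, sale, or publication more than one year before the filing date bars the patent. It ignores weekend/holiday extensions and every other requirement. The December 2004 day of month is assumed, and the 2002 date is purely hypothetical.

```python
from datetime import date


def one_year_bar(filing_date: date, first_public_disclosure: date) -> bool:
    """Simplified pre-2013 US rule: True if the invention was in public use,
    on sale, or published more than one year before the filing date."""
    try:
        anniversary = first_public_disclosure.replace(year=first_public_disclosure.year + 1)
    except ValueError:
        # a February 29 disclosure has no anniversary in a non-leap year
        anniversary = first_public_disclosure.replace(year=first_public_disclosure.year + 1, day=28)
    # filing on or before the one-year anniversary is treated as in time
    # (no weekend/holiday extension, no other patentability rules)
    return filing_date > anniversary


# Filing date quoted in the thread vs. the December 2004 paper (day of month assumed).
print(one_year_bar(date(2004, 6, 18), date(2004, 12, 1)))  # False
# Hypothetical: a public use in January 2002 would predate filing by over a year.
print(one_year_bar(date(2004, 6, 18), date(2002, 1, 15)))  # True
```

In the first case the paper came out after the filing date, so it cannot trigger the bar at all. That matches the point made about filing before anything is offered or published.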

I have a couple of applications pending from a previous employer. It has been about three years, and it seems like each year, around August, I get a call/email from the patent attorney asking me to sign another version of the application.

At the risk of being modded Offtopic (which I am), you are either trolling for grammar Nazis, or you misapprehended a phrase... I believe you meant to say "For all *intents* and *purposes*" in your signature line, not "For all intensive purposes..."

We're probably never going to get rid of software patents, odious as they are; at this point there are too many enormous players, of which Google is not at all the worst offender, with way too much invested in them. But it occurs to me that one change to patent law that might be politically feasible, and which would really help cut down on clearly frivolous patents like this one:

If any claim in the patent is held to be invalid, the entire patent is invalid.

Claim 1 of the patent is simply an arcane, legalistic description of the operation of pretty much every parallel processing algorithm ever. Some of the subsequent claims actually do describe novel, non-obvious, and useful ways of handling large data sets across multiple processors. If the patent were restricted to these claims, well, it would still be a software patent and therefore Evil, but it might at least have some claim to promoting "the progress of science and the useful arts."

In general, it seems like this would make both patent trolling, and big companies like Google lawyering small independent developers to death, a little more difficult.

"We're probably never going to get rid of software patents, odious as they are"

Careful with generalizations. MapReduce is not such a bad thing to patent, provided of course that you actually invented it. The problem here seems not to be that the patent is frivolous but rather that Google didn't even come close to inventing the thing they've patented.

A machine, or a chip, is just a physical manifestation of math. Can you come up with good justifications for your assertions? Why should a machine be patentable and a software algorithm not be? Why should a chemical process (an algorithm for manipulating chemicals, such as refining aluminum) be patentable and a method for manipulating information not be?

Software patents are inherently wrong. It doesn't matter if you invent an algorithm or not, because algorithms are just mathematical expressions, and you can't (or shouldn't be able to) patent math. And algorithms are usually implemented, not in physical (patentable) devices, but in software programs, for which the appropriate protection is copyright, not patent.

So you're asserting that you should be able to copyright math?

The whole "software is math" argument is old and debunked. Anything which requires c

I'd like to believe you're right, but the serious lobbying hasn't started yet. And the fact that patents like the one discussed in this story are still being granted shows that there's still life in the old beast.

I tend to think the extinction of software patents has a chance. While most of the big software companies use them, they are aware of the double-edged sword nature of them. They might lobby for them somewhat, but I don't think you'll see them doing a full-court press. Some of them may even decide they'd rather see software patents go away so they won't be targets of the trolls.

Yeah, but they can't deny patents due to a pending lawsuit that could potentially outlaw them...

I'm telling you though, read through the arguments. It will only take you 15-20 minutes. The justices, all of them, continuously ripped these people new orifices. It was a beatdown quite unlike anything I've seen (read?) in the Supreme Court for a long time.

Yes, but you are assuming software isn't considered a method. It SHOULD be a method, because it doesn't inherently change anything within a computer, and if I'm not mistaken there are fights going on right now to define it as such...it is a method by which a computer is told to operate, but doesn't actually modify the computer in any way.

Analogy: making a choose your own adventure book doesn't change the fact that it is a book...it merely changes the way you obtain and utilize the data (words) on its pages

First, software absolutely changes things within a computer. Different switches get flipped, electrons grow in places where they never grew before, etc. Computers are deterministic machines - if software didn't "actually modify the computer in any way" then the software wouldn't be executing and the output would not change in any manner from before the software was applied.

Second, the current test for patentable subject matter in method patents is that it's either transformative (turning iron into steel, f

All such a rule would do is 1) force small inventors out of the patent field due to increased costs, and 2) SLAM the USPTO with applications which are virtually identical but have different claims: instead of 20 claims in one patent, you'd have 20 patents, each with one claim.

I wrote a parallel application to process scientific data on multiple servers at a previous place I worked, using just SQL statements with a mod function on a primary key. The resume builders there then hired a consultant to help them rewrite the whole thing (excluding the core atomic algorithm part) using Hadoop and MapReduce, because the previous one didn't use Hadoop and MapReduce. They made a total mess and it's so hard to configure and deploy that IT still uses the version I wrote a year before.
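The approach described here (each server takes the rows whose primary key, modulo the number of servers, equals its own index) can be sketched with an in-memory SQLite database. The table name samples and its value column are hypothetical. SQLite writes modulo as %, and other databases may spell it MOD(id, n). In the real setup each server would run only its own query; the loop here simply runs all of them in turn.

```python
import sqlite3


def partition_by_mod(conn, n_workers):
    """Return one list of rows per worker, split on primary key modulo n_workers."""
    slices = []
    for worker in range(n_workers):
        # worker k gets exactly the rows with id % n_workers == k, so the
        # slices are disjoint and together cover the whole table
        rows = conn.execute(
            "SELECT id, value FROM samples WHERE id % ? = ? ORDER BY id",
            (n_workers, worker),
        ).fetchall()
        slices.append(rows)
    return slices


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO samples (id, value) VALUES (?, ?)",
    [(i, i * 0.5) for i in range(1, 11)],
)
slices = partition_by_mod(conn, 3)
print([[row[0] for row in rows] for rows in slices])  # [[3, 6, 9], [1, 4, 7, 10], [2, 5, 8]]
```

Because each slice is just a WHERE clause, no coordinator or framework is needed. That is the simplicity the comment contrasts with the Hadoop rewrite.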

The greybeards have a point there. In my branch of signal processing we have gone through cycles several times as computer hardware has evolved. In my experience we've been through minicomputers, array processors, workstations, clusters, stream processors, multi-cores, etc. Each configuration has a different balance of CPU speed, memory size, memory bandwidth, and so on. So we've gone through the difference algorithms, the integral algorithms, the spectral, the local-transform, cyclic matrices, etc., back and forth several times. Sometimes a new generation of grad students feels it has invented something new, when a sloppy faculty advisor fails to correct them.

- did the submitter actually read the claims, before asserting that it was obvious and/or anticipated?
Here's claim 1 (it's a monster):
1. A system for large-scale processing of data, comprising: a plurality of processes executing on a plurality of interconnected processors; the plurality of processes including a master process, for coordinating a data processing job for processing a set of input data, and worker processes; the master process, in response to a request to perform the data processing job, assigning input data blocks of the set of input data to respective ones of the worker processes; each of a first plurality of the worker processes including an application-independent map module for retrieving a respective input data block assigned to the worker process by the master process and applying an application-specific map operation to the respective input data block to produce intermediate data values, wherein at least a subset of the intermediate data values each comprises a key/value pair, and wherein at least two of the first plurality of the worker processes operate simultaneously so as to perform the application-specific map operation in parallel on distinct, respective input data blocks; a partition operator for processing the produced intermediate data values to produce a plurality of intermediate data sets, wherein each respective intermediate data set includes all key/value pairs for a distinct set of respective keys, and wherein at least one of the respective intermediate data sets includes respective ones of the key/value pairs produced by a plurality of the first plurality of the worker processes; and each of a second plurality of the worker processes including an application-independent reduce module for retrieving data, the retrieved data comprising at least a subset of the key/value pairs from a respective intermediate data set of the plurality of intermediate data sets and applying an application-specific reduce operation to the retrieved data to produce final output data corresponding to the distinct set of respective keys in the respective intermediate data set of the plurality of intermediate data sets, and wherein at least two of the second plurality of the worker processes operate simultaneously so as to perform the application-specific reduce operation in parallel on multiple respective subsets of the produced intermediate data values.

That's one heck of a detailed claim. Infringement would require some effort; anticipation (every limitation appearing in a single document, arranged in the same manner as the claim) is unlikely.

"That's one heck of a detailed claim. Infringement would require some effort; anticipation (every limitation appearing in a single document, arranged in the same manner as the claim) is unlikely."

Um...

My "Computer Science" foo may not be strong, but I do see a problem.

Let's begin with the definitions of "process" and "interconnected processors". Once translated, these don't actually mean much, especially in functional notation. In short, a functional sort would have to conflict with it.

I'm not sure it would be so hard to infringe. In fact, reading through it, I don't see how a merge sort implemented on multiple processors would not fit that exact description. Merge sort is one of the oldest sorting algorithms in the world; it was invented by von Neumann himself (I don't know when it was first used in a multiprocessor system, but I would guess no later than the 70s).
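Here is the comparison being drawn, as a minimal sketch: split the input into blocks, sort the blocks concurrently, then combine the sorted runs with a k-way merge. Two caveats apply. The per-block step uses Python's built-in sorted rather than a hand-written merge sort. And the final merge runs in one step, not as parallel reducers. Whether the claim's key/value and partition limitations map onto a sort like this is exactly what the thread is arguing about, and the sketch doesn't settle it.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_merge_sort(data, n_workers=4):
    # split the input into roughly equal blocks, one per worker
    size = max(1, -(-len(data) // n_workers))  # ceiling division
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    # sort each block concurrently (Python's built-in sorted stands in here)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        sorted_blocks = list(pool.map(sorted, blocks))
    # combine the sorted runs with a single k-way merge
    return list(heapq.merge(*sorted_blocks))


print(parallel_merge_sort([5, 3, 9, 1, 7, 2, 8]))  # [1, 2, 3, 5, 7, 8, 9]
```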

Anticipation is narrow. Infringement, however, is broad. A slight difference between the purported prior art and the claim means the prior art doesn't invalidate the claim. A slight difference between the device claimed to infringe and the claim, however, doesn't make the device non-infringing. This means that wh

My research area is HPC, and I have sometimes toyed with trying to work for Google. They seemed like something special.

Now that they're pursuing unjustifiable software patents, I'm forced to sadly put Google into the same mental category as Microsoft and IBM. Like the other two companies, Google does some cool stuff, but I wouldn't feel much better about working for Google than I would for IBM or Microsoft.

A patent is only worth its strength in court. The USPTO has clearly given up trying to judge if a patent is truly worthy on their own, relying on the courts to decide afterwards when a patent is put to use and put to the test - in court.

What bothers me the most is the fact that anyone can get a patent for anything as long as they keep revising their application.

At the end of the day, those with the biggest wallets will get their patents, and they will also have their guns to fight and win in court.

A previous post seems to indicate that this patent application was filed in 2004, before Hadoop was created. If true, and if Google decides to use this patent against Hadoop, and if the patent withstands the scrutiny of a court battle, then Hadoop, at least as Open Source, would be dead.

Don't jump to conclusions yet, however. It'll take some time to digest the patent and decide what it's really attempting to cover.

You realize that Hadoop is a reimplementation of the MapReduce technology widely in use inside Google for a long time. Google invented it, filed a patent on it, published a paper on it, and Hadoop reimplemented it... then finally the US PTO granted the patent. Clear?

It works on a large scale with today's available processing setups, but it's far from 'efficient' in any sense of the term I consider.

Pyramids were built with (so the theory goes) millions of laborers because that's the only way they could handle such a large-scale project. MapReduce is the same thing. On that scale, with today's technology, that's the way we do it.

It works, today, so we use the method, but that's where it ends.

Would you build the pyramids today with a million laborers? No, you'd bring in so

I'll reserve judgement until this patent is involved, offensively, defensively or otherwise, in litigation.

Google has got a good reputation so I'm not as quick to condemn them as I am to condemn Microsoft which has a PROVEN track record of evil.

It's entirely plausible that this patent is part of a defensive patent portfolio whose sole purpose is to protect Google.

And considering the zany IP landscape, if anyone's going to have a patent on this, I'd rather it be Google than anyone else. If Microsoft had this club in their arsenal you can bet your bottom dollar they'd make their assault on Tom-Tom look like a puny peashooter.

Since this is Slashdot, the usual knee-jerk reaction to any patent story is "duh, this is obvious". In most cases, people posting such replies haven't even read the claims, or if they did, do not understand how to interpret them properly.

I'm very skeptical that Google had indeed somehow managed to patent the fundamental principle of MapReduce, given that map and reduce (fold) have been basic FP building blocks for several decades, under these very names. I suspect, rather, that Google patented their particu