While the first reaction of many might be “OMG, WTF, how could they,” this is actually good news,
with an unlikely cast of characters working together, including Google, Intellectual Ventures, and the Internet Archive.

In September, the Patent Office announced a rather strange “Request for Information” (RFI).
Under this proposed scheme, the Patent Office would receive a substantial (upwards of $10 million!) donation of equipment from a vendor.
In return, the vendor would get to be the official distributor of the patent database to the public,
and would get to sell “value-added products.” Among other things, the vendor would get access to
the patents before the public does, allowing them to mine the database, and would be allowed to sell
a variety of bulk products.

While the RFI makes a nod to public access, like all these Zero-Dollar deals
the government cuts, there would be a lot of limits on what is “public” data as the vendor tries
to recoup their investment by selling the so-called “value-added” products. Readers may remember
a similar fiasco with the Government Accountability Office, where the Federal Legislative Histories
were given away to Thomson West,
and now even the U.S. Congress has to pay to access this material.

The patent database is no ordinary database. This is the only database specifically called out in the
U.S. Constitution as being the responsibility of the U.S. Executive Branch to run! A lot of people
think this Zero-Dollar deal the Patent Office is contemplating kind of stinks, and I’m really pleased to
announce that
a broad coalition has come together to make this data more broadly available immediately:

Intellectual Ventures, the IP group founded by Nathan Myhrvold, is donating several terabytes of the back file to Public.Resource.Org,
the Internet Archive, and a variety of other groups to make available to everybody.

Google asked for permission to crawl the public application system (known as “PAIR”). The
announcement by the Patent Office of a “sole source contract to Google” was the government’s way
of granting Google permission to crawl the system and bypass the CAPTCHAs. This is good news, because
the PAIR system contains the “binders,” which hold all the material that supplements the basic applications
and grants.

The Internet Archive has set aside a boatload of disk drives to serve this data. In addition,
Public.Resource.Org will provide the usual rsync and FTP, and we expect a variety of other groups
to provide mirrors both for bulk access and end-user systems.

It goes without saying that Google, the Internet Archive, and Intellectual Ventures are three groups that don’t often work together, and I think this
illustrates the compelling public interest in making the patent database more broadly available.
We announced this Section 8 Task Force in a letter to Congressman Mike Honda. We also sent
a FOIA request to the Patent Office, putting them on notice that we expect any responses to their
$0 RFI boondoggle to be made available to the public, as required by law.

In the long term, the Patent Office just needs to fix its system instead of resorting to silly $0 deals.
They have 600 staff in Information Technology and spend hundreds of millions of dollars.
Surely, they can find a way to serve the public as part of that? Putting a lien on the patent database in return for $10 million in hardware instead of fixing their 1970s-era mainframes just doesn’t make sense.

In the meantime, we should have the first 8 terabytes of data up pretty soon.
Those interested in learning more about the issue are urged to consult the paper trail on our PTO page, which
includes letters to and from Congress, and pointers to the Patent Office procurement docs.

Hi Luigi. Intellectual Ventures bought all the commercial data feeds from the USPTO, which come on DVDs. That includes page images, applications, and grants. They’re one of a dozen vendors to have bought these roughly 1,000 DVDs of data.

Intellectual Ventures is simply putting the 1,000 DVDs on a disk drive, and making it available to people like the Internet Archive (Brewster got his disk last night) and Public.Resource.Org (we’re expecting ours this week).

In addition to the data on the commercial products, there is additional information available on-line inside of the PAIR system. People have tried for a while to crawl PAIR, but the PTO infrastructure is so poor that it was quickly overloaded and they put in a CAPTCHA system. So, individuals have been able to access this additional info on a one-off basis, but bulk providers have been unable to incorporate it into their systems.

Our goal is to *not* be in the PAIR-crawling business, but it is a decent stopgap, particularly when coupled with the bulk DVD data, and will hopefully motivate the PTO to take more positive steps to provide their own database directly to the public.

This looks to be excellent news. The PAIR system is notorious for being out of action outside of US business hours and so any help to fix this, and get around the immensely irritating CAPTCHA system, would be great.

I can foresee the usual suspects objecting to this but in my mind any third party who is going to make patent data more accessible deserves to be encouraged. Maybe one day I could monitor my US cases via a Google web app, getting reminders in my Google calendar of due dates?