We wanted to let you all know that a few months ago we quietly released - or actually re-released - an Optical Character Recognition (OCR) engine into open source. You might wonder why Google is interested in OCR? In a nutshell, we are all about making information available to users, and when this information is in a paper document, OCR is the process by which we can convert the pages of this document into text that can then be used for indexing.

This particular OCR engine, called Tesseract, was in fact not originally developed at Google! It was developed at Hewlett Packard Laboratories between 1985 and 1995. In 1995 it was one of the top 3 performers at the OCR accuracy contest organized by University of Nevada in Las Vegas. However, shortly thereafter, HP decided to get out of the OCR business and Tesseract has been collecting dust in an HP warehouse ever since. Fortunately some of our esteemed HP colleagues realized a year or two ago that rather than sit on this engine, it would be better for the world if they brought it back to life by open sourcing it, with the help of the Information Science Research Institute at UNLV. UNLV was happy to oblige, but they in turn asked for our help in fixing a few bugs that had crept in since 1995 (ever heard of bit rot?)... We tracked down the most obvious ones and decided a couple of months ago that Tesseract OCR was stable enough to be re-released as open source.

A few things to know about Tesseract OCR: for now it only supports the English language, and does not include a page layout analysis module (yet), so it will perform poorly on multi-column material. It also doesn't do well on grayscale and color documents, and it's not nearly as accurate as some of the best commercial OCR packages out there. Yet, as far as we know, despite its shortcomings, Tesseract is far more accurate than any other Open Source OCR package out there. If you know of one that is more accurate, please do tell us!

We are grateful to all the people at HP who made it possible to release Tesseract into open source, and especially John Burns, who championed and babysat the project. We would also like to thank the original Tesseract development team, a partial list of whom is here. Last but not least, many thanks to our friends at UNLV's ISRI, including Tom Nartker, Kazem Taghva, Julie Borsack and Steve Lumos, for all their help with this project.

Several Python developers came together at both Google's Mountain View and New York offices last week for a bi-coastal Python Sprint. Our stalwart sprinters worked on everything from enhancements to the Python-3000 interpreter to triaging bugs and improving Unicode testing for Python 2.5/2.6. If you'd like to learn more, check out Guido van Rossum's Python Sprint Report.

Following on from last week's Linux World convention in San Francisco, Google hosted The Ubucon, an informal conference for Ubuntu hackers, enthusiasts and professionals. We had about 70 members of the Ubuntu community, from novice users to Canonical staff members, in for presentations, general discussion of Linux and FLOSS, and, of course, why Ubuntu rocks. Check out The Ubucon Blog, Corey Burger's write-up of the conference, or the Ubuntu community page to learn more about what's shaping up to be an annual event at the Googleplex. Our thanks go out to John Mark Walker, the conference organizer, and the community for coming together to make The Ubucon such an awesome event.

We're excited to announce the availability of the Google Base data API, which lets you write applications that dynamically interact with Google Base. You can insert, edit, or delete items programmatically, complementing existing input means like the Google Base front-end or the bulk upload mechanism. You can also query other users' published content and access their items via the API. This enables you to create domain-specific search applications (or mash-ups) combining Google Base content with other services.

Today we're announcing the launch of the Google Developers Event Calendar! You can use it to see a schedule of upcoming developer events where Google employees will be speaking about open source, Google APIs, and all things code.

Most users will find it easiest to add the Developers Event Calendar to their own Google Calendars. To do so, follow the steps below. If you have any questions about using Google Calendar, please refer to the Google Calendar Help Center.

Click this button:

If necessary, log into Google Calendar. Note: Logging into Google Calendar requires a Google Account. If you use Gmail, Google Groups, or other Google services, you already have a Google Account. Simply use the same login and password.

Once you are logged in, choose "Yes, add this calendar" to add the Google Developers Event Calendar to your Calendar. You should now see Google Developers Event Calendar events listed on your Google Calendar.

Adding in another format

If you'd prefer to view the Google Developers Event Calendar in another application such as a feed reader or a product that supports the iCal format (like iCal for Mac), please click the relevant link here to obtain the URL:

The Google AJAX Search API is designed to work seamlessly with the Google Maps API. One way it adds instant value is to allow your Maps-based applications to execute a search, then take the search results and plot them on a map. Our model for this is simple and straightforward: each search result is a JavaScript object that contains a number of properties including a URL, title, array of phone numbers, street address, city, latitude and longitude, etc. Therefore, adding a search result to a map takes only a few lines of code.
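The search-result-to-map step described above can be sketched roughly as follows. This is a minimal illustration, not the API's own sample code: the helper names `toMarkerSpec` and `plotResults` are hypothetical, the property names `lat`, `lng`, `title` and `url` are assumed for the purpose of the example, and `addMarker` stands in for whatever call your map object uses to place a marker.

```javascript
// Sketch only: toMarkerSpec, plotResults and addMarker are hypothetical names,
// and the result property names (lat, lng, title, url) are assumptions.

// Convert one search result into a plain marker description.
// Coordinates may arrive as strings, so parse them before use.
function toMarkerSpec(result) {
  const lat = parseFloat(result.lat);
  const lng = parseFloat(result.lng);
  if (Number.isNaN(lat) || Number.isNaN(lng)) {
    return null; // this result has no usable coordinates
  }
  return { lat: lat, lng: lng, title: result.title, url: result.url };
}

// Plot every result that has coordinates by handing it to addMarker.
// Returns the number of results that were plotted.
function plotResults(results, addMarker) {
  let plotted = 0;
  for (const result of results) {
    const spec = toMarkerSpec(result);
    if (spec !== null) {
      addMarker(spec);
      plotted += 1;
    }
  }
  return plotted;
}
```

In a real page, the `addMarker` callback would be the place to create a marker with the Maps API and add it to the map; keeping that step behind a callback keeps the sketch independent of any particular Maps API version.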

The AJAX Search API team produced a number of simple sample applications to teach the basics of search-integrated Maps mashups. Two of the most popular samples are My Favorite Places and My Phone List, so take a look and see if they inspire you to add Search to your Maps mashups!

Last Thursday at the O'Reilly Open Source Conference, we announced availability of project hosting on Google Code; our goal is to provide a service to help foster innovation and support Open Source projects through simple, easy-to-use and reliable tools.

Currently, there are several thousand projects underway and we're very pleased with the enthusiasm shown by the Open Source community. So thank you for your support, especially those of you providing valuable feedback. If you haven't created a project, give it a try. We look forward to incorporating your feedback too.

For more information please take a look at our FAQ or join in the discussion on Google Groups.

We've seen a lot of great gadgets created for the Google homepage recently. Topping the list of our Top Gadget Developers is Caleb Eggensperger. Caleb is a 16-year-old student at the Arkansas School for Mathematics, Sciences & the Arts. He's famous for his Countdown gadget, which our users are crazy about. Go figure.

It's a classic mashup API that lets you easily add search to your site, but we have done this with a twist... We make it VERY VERY easy to remember or "clip" a search result onto your page.

Why did we do this?

We observed countless interactions in email and message boards where a question is being posed, e.g., "Does anyone know of a good Sushi place in Santa Barbara?", or, "What kind of fancy new camera were you using at the game the other day?", or, "I am thinking of putting Campy Compact Cranks on my bike. Do you think this is a good idea?", or, "We just stayed at The St. Francis in San Francisco and had a great time."

Oftentimes, the most accurate way to answer or add value to these discussions is with a search result. When responding to the Sushi question, a Google Local search result provides the name of the restaurant, the address, its phone number, as well as a link to the landing page on Google Maps. The result also contains the lat/lng coordinates so that if you have a map available, plotting the result on a map is trivial.

When developing the initial mockups and ideas for this API we built a very powerful demonstration, based on phpBB. What I did was change phpBB to include our little search control and made it possible to include search results into a post.

The changes to enable this were trivial... All I had to do was change subSilver/overall_header.tpl to include our stylesheet, and then subSilver/posting_body.tpl to fire our control, process clip events, and serialize the clipped content on submit.
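The clip-and-serialize flow described above might look something like the sketch below. It is an illustration under stated assumptions, not the code from the demo: `createClipboard`, `onClip` and `serialize` are hypothetical names, the result properties `title` and `url` are assumed, and the plain-text output format is invented for the example.

```javascript
// Sketch only: every name here is hypothetical and is not a phpBB or
// Search API identifier. The serialized format is an assumption too.

// Holds the search results the user clips while composing a post.
function createClipboard() {
  const clips = [];
  return {
    // Call this whenever the search control reports a clipped result.
    onClip(result) {
      clips.push({ title: result.title, url: result.url });
    },
    // Call this on submit: appends the clipped results to the post text.
    serialize(postText) {
      if (clips.length === 0) {
        return postText; // nothing clipped, leave the post unchanged
      }
      const lines = clips.map((clip) => clip.title + " <" + clip.url + ">");
      return postText + "\n\n" + lines.join("\n");
    },
  };
}
```

The posting form's submit handler would then pass the text of the message field through `serialize` before the form is sent, so the clipped results travel with the post.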

I have included two screen shots. The first is a reply to a post about Sushi places near Google. Note that the reply contains two local search results.

Obviously, I could have left phpBB, looked up Akane in Google Local, futzed around a little to get Local to produce a URL, and then pasted the URL into the response. This, in my opinion, represents "The Old Way"... something that only the tech savvy can master. In the real world, cutting and pasting and juggling multiple windows are not skills that we can or should take for granted.

With our search control, seamlessly integrated into phpBB, I type "Akane" into a search box, then click the "copy" button. The resulting post content is shown in the first attachment.

The second screenshot shows the editing experience. I took 300px to the right of the compose form and added in our search control. It's very simple to use and fits in very nicely with the rest of your app.

When I showed this demo to people, they all instantly "got it" and understood how much more valuable message board interaction could be when search results are a click away. Now granted, this isn't something that everyone would use in every single post, BUT I think everyone who saw this felt that this is the kind of thing that they would definitely use once a day in an email, blogging, or message board environment.

I had never seen the phpBB code before. I simply unzipped it, set up a database, and within an hour, had found the three or four touch points that I had to edit in order to enhance it with this new capability. I think it dropped in very easily and naturally. It would be very cool to see this out in the wild, and I would be more than willing to help you guys get up and running, get started, whatever you need.

As part of the GSoC mid-term evaluations, we asked our mentors to give us a review of everything from their students' progress to date to their favorite color. We had a great session at OSCON 2006 where we shared the aggregate results of the surveys and some additional statistics for the program. For those of you who weren't able to attend, we've posted excerpted slides for your perusal.

And since you'll no doubt be wondering, our mentors overwhelmingly prefer blue in all its various hues.