IndexTank, a Hosted Search Startup, Launches

IndexTank, a San Francisco-based startup, announced Tuesday that it’s offering a hosted search API that lets app developers and web services add instantaneous search quickly. The company, started by a former Inktomi engineer, received $1.6 million in funding from Harrison Metal, Freestyle Capital, and Baseline Ventures.

Unlike some of the more recent real-time search startups that flamed out, IndexTank is taking a decidedly mundane approach to the business: sell its service to other service providers and stay in the background, a formula that proved to be effective for Inktomi, a search engine company that made its name in the first version of the web. IndexTank CEO and founder Diego Basch was senior software architect at Inktomi, where he focused on web crawling and indexing.

The company is one of a growing number of “infrastructure as an app” companies we’ve been following for the past year. These companies offer building blocks to hackers building mobile apps, e-commerce site operators and digital media publishers. IndexTank’s search API is free to try and pay-as-you-go after that, a model very similar to the one adopted by Heroku, the Ruby-on-Rails platform-as-a-service. It competes with self-hosted search tools such as Sphinx, Solr and Xapian.

New World, New Search

To understand why the world needs a service like IndexTank, one needs to take into account the state of the Internet. Today, it’s not uncommon to find folks criticizing Google, which a decade ago was viewed as our digital savior. The same broadband connections that allowed people to become addicted to Google have also allowed the proliferation of user-generated content, which made the web two-way, as well as bigger and more complex.

The search needs of today are very different from the search needs of 10 years ago. In 2000, the Internet was growing at a much more modest pace. Websites were mostly static. Only services like Amazon created web pages numbering in the hundreds of thousands.

Today, the number of web pages has increased by several orders of magnitude. We are looking at a world where, thanks to tools such as WordPress (see disclosure) and Tumblr, anyone can start creating web pages. A decade ago, it took a lot of work to publish a web page. Today, it’s as simple as type-and-publish.

Twitter updates, constantly changing Facebook pages, Quora answers, dozens of always-updating photo services such as Flickr and plenty of video services are just the surface of a very dynamic, two-way web that is growing at hyper-speed. In short, the web of today needs a new kind of search: one attuned to helping us navigate the data smog created by web pages and web objects.

Time to Take Action

Basch decided it was time for him to do something about search’s pollution problem. After leaving Yahoo, which bought Inktomi, he started a consultancy based in his hometown of Buenos Aires, Argentina. His core business was building bespoke search solutions for other companies using open-source tools such as Lucene.

Unfortunately, those open-source projects were developed during what were clearly sedate times in the history of the web. “At Inktomi, our index updated once a month,” recalled Basch. These days, he says, content changes so quickly that indexing has to happen in real time, with incremental updates to the index that reflect changes as they occur. So, in 2008, he decided it was time to take a different approach to search.

He shut down his software consultancy, moved back to Silicon Valley and started working on building this search product, now known as IndexTank. “Search right now is where cellphones were before the iPhone, and we’re trying to change this fact,” says Basch.

How It Works

Basch says IndexTank has worked on making search very fast, ensuring results take much less than five seconds. It does this by treating the constantly changing elements of a document (Facebook Likes, comments, Twitter counts, time and date, and even location) differently from the static content. It keeps these changing parts of the document in memory rather than on disk. A web document today (or a mobile app) goes through the following lifecycle:

A document is created and published.

It is instantly indexed and appears in search results.

The community starts interacting with the document, and new signals start coming in: likes, comments, votes, page views, ratings.

The application (or website) tells IndexTank about this as fast as the app wants, even several times per second.

The new documents stay in memory, as most of the interactions happen early in the document’s life.

Older documents (say, a week-old blog post) are updated less often, and at that point, IndexTank may send the text to disk for efficient resource usage. It still retains signals such as likes, comments and ratings in memory.
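The lifecycle above can be sketched in a few lines of Python. This is only an illustration of the idea as Basch describes it, not IndexTank's code or API: the class name `TinyRealtimeIndex`, its methods, the `likes` signal and the one-week threshold are all assumptions made up for this example. Text is indexed the moment it arrives, signals live in memory for the life of the document, and the text of documents that have gone quiet is written to disk.

```python
import json
import os
import tempfile
import time
from collections import defaultdict


class TinyRealtimeIndex:
    """Toy index (hypothetical, not IndexTank's API): static text and
    fast-changing signals are stored separately."""

    def __init__(self, cold_after_seconds=7 * 24 * 3600, spill_dir=None):
        self.cold_after = cold_after_seconds   # assumed "week-old" threshold
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="toyindex_")
        self.postings = defaultdict(set)       # term -> ids of documents containing it
        self.hot_text = {}                     # doc id -> text still held in memory
        self.signals = defaultdict(dict)       # doc id -> {signal: value}, always in memory
        self.last_update = {}                  # doc id -> time of last activity

    def add_document(self, doc_id, text, now=None):
        """Index a document immediately so it is searchable right away."""
        self.hot_text[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.last_update[doc_id] = time.time() if now is None else now

    def update_signal(self, doc_id, name, value, now=None):
        """Record a new signal value (likes, votes, ...) without touching the text."""
        if doc_id not in self.last_update:
            raise KeyError(f"unknown document: {doc_id}")
        self.signals[doc_id][name] = value
        self.last_update[doc_id] = time.time() if now is None else now

    def spill_cold_documents(self, now=None):
        """Move the text of documents with no recent activity to disk."""
        now = time.time() if now is None else now
        spilled = []
        for doc_id in list(self.hot_text):
            if now - self.last_update[doc_id] >= self.cold_after:
                with open(self._path(doc_id), "w", encoding="utf-8") as fh:
                    json.dump({"text": self.hot_text.pop(doc_id)}, fh)
                spilled.append(doc_id)
        return spilled

    def get_text(self, doc_id):
        """Return the text from memory if hot, otherwise read it back from disk."""
        if doc_id in self.hot_text:
            return self.hot_text[doc_id]
        with open(self._path(doc_id), encoding="utf-8") as fh:
            return json.load(fh)["text"]

    def search(self, term, rank_by="likes"):
        """Return matching doc ids, highest value of the chosen signal first."""
        hits = self.postings.get(term.lower(), set())
        return sorted(
            hits,
            key=lambda d: self.signals.get(d, {}).get(rank_by, 0),
            reverse=True,
        )

    def _path(self, doc_id):
        # Assumes doc ids are safe to use as file names.
        return os.path.join(self.spill_dir, f"{doc_id}.json")
```

In this sketch an application would call `add_document` once when something is published, call `update_signal` as often as it likes while readers interact with it, and run `spill_cold_documents` periodically. Ranking keeps working after a spill because `search` reads only the in-memory postings and signals.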

So Will It Sell?

IndexTank created a test site using GigaOM as a data source, and I have to say, it worked as advertised. The search sorted and displayed content based on timeliness, relevance and other such vectors. The service has worked well for reddit, one of the hottest social news sites on the web. I, for one, wouldn’t hesitate to pay for the service, as long as it makes life easy for my readers and boosts our overall value proposition.

The question now is whether other developers and web services will pay for the service. I think the biggest challenge for all Infrastructure-as-a-Service companies is that it’s very easy for them to sign up beta customers and free users; getting them to pay is the hard part. I hope Basch and his group figure out a way around that quandary.

Disclosure: Automattic, maker of WordPress.com, is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, founder of Giga Omni Media, is also a venture partner at True.