Names You Need to Know: Open Source Search Solr and Lucid

Our civilization may pride itself on the amount of information we create - more than every conversation, ever, in a couple of years; enough to jack the Library of Congress to the Moon on data-packed CD-ROMs, take your superlative - but we’ve also made a holy mess of it. It is mostly, as they say, “unstructured,” meaning as random as your last 25 emails, your tweets, and all those spreadsheets and documents piling up at work. That’s why Solr/Lucene and Lucid Imagination are names you need to know in tech.

Solr is an open source software project for enterprise search; Lucene is the accompanying software “library” that was joined with Solr. Lucid Imagination is a private company founded by Solr’s inventor and a couple of other tech heavyweights to sell a stable version of Solr to companies. Its product was launched in 2009, and Lucid already has 190 clients, including Hewlett Packard, Cisco, and Salesforce.com. Thousands of other companies (including Twitter and MySpace) use Solr on their own; collectively, it may be a larger data search project than either Google or Microsoft’s Bing.

That’s because of the phenomenal amount of data that lies behind firewalls, inside corporate and private databases, or is otherwise unseen by the Web crawlers. “The U.S. Department of Energy’s database is probably larger than the Internet, just by itself,” says Marc Krellenstein, Lucid’s chief technical officer. “There is a far bigger amount of material out there that has to be searched, all the time.”

Technically, it’s a captivating problem too; internal data doesn’t come with the tags, URLs, and metadata marks designers use on the Web precisely so search engines will see and catalog them. There is also a lot of language parsing for the specific domain of a database (for example, does the word “bob” refer to a person, a hairstyle, or a bobcat?). On the other hand, internal search doesn’t have to deal with link farms, spammers, and fakes the way search engines do.

“Enterprise search is almost the opposite of Web search,” says Lucid chief executive Eric Gries. “For them, scale and performance matter most, while relevance comes third. In an enterprise, we’d better hit a query right, but it doesn’t matter if it takes a few seconds. Even Google (which sells its own version of corporate search) keeps two different code bases for the two jobs.”

Solr could arguably scale up, and take on Internet-type search, but Gries, who came to the company not long after its 2008 founding, says there’s no plan to take on the giants. “That’s Google and Bing -- they do a good enough job, and they are giants. Besides, that’s an ad-based business.” Lucid makes money by offering a standardized version of Solr, plus technical support and add-ons like security, spell check, and more recently data analysis via other open source projects.

Products start at $36,000 a year, and the average selling price is about $100,000, says Gries. That compares with something closer to $1 million at traditional corporate search competitors. Lucid has 38 employees, funding of $16 million, and, according to Gries, “double-digit millions” in revenues. It would be profitable soon, he adds, except for a planned international expansion.

“This is the next step of databases,” he says, doubting the future of his traditional, and far larger, competitors. “It disrupts Autonomy, Endeca, and Google Enterprise Search -- as far as I’m concerned, it’s just a question of time.”

Bold talk, but there is something to what open source can do. Linux reshaped the server market. Oracle may have had its eye on MySQL when it purchased Sun (that open source project was worth $1 billion to Sun when it was independent, just as a way of talking to a new customer base). Even if Lucid goes half as far as Gries hopes, it will change business for everyone -- and Solr is around come what may. That’s why Solr and Lucid are names you need to know.