Lucid Imagination

After the big news in the enterprise search space about Autonomy and FAST last week, the announcement of Lucid Imagination raising $6 million may seem anticlimactic. That’s nothing compared to the $775M Autonomy spent to acquire Interwoven, let alone the $1.2B that Microsoft paid for FAST last year. And can Lucid Imagination really succeed as the Red Hat of enterprise search, making money by supporting open-source Lucene and Solr?

Perhaps. Lucene is certainly popular among folks looking for a free search engine. Moreover, for people who want to tinker with it, its being open source is a big plus.

But Lucene deployments require extensive customization. This is often the downside of open source, and the reason that industrial use of open source software often involves a significant transfer of funds from enterprises to consultants. In contrast, closed-source solutions tend to come with more tooling, integration support, etc. Those are the sorts of details that don’t necessarily excite open-source developers but are crucial for enterprise software.

Will Lucid Imagination revolutionize the enterprise search market by providing low-cost services on top of free software? Perhaps, though I’m skeptical–and not just because they are a potential competitor to my employer. If they are to be more than a body shop, they’ll have to productize their customization efforts. But I’d imagine that, if Lucid Imagination were to build such products, it would contribute them back into the Lucene code base. That might be great for customers, but it’s not clear how it translates into a sustainable revenue model.

It’s also worth noting that Lucid Imagination isn’t the first company pursuing this model. Sematext, founded in 2007, is another company implementing solutions on top of open source software, including Lucene. In fact, its founder, Otis Gospodnetic, is a regular at The Noisy Channel. Perhaps he can comment on the space.

18 responses so far ↓

I don’t think it’s their aim to compete with Enterprise search directly (business suicide), though I suspect they might pressure the pricing in the mid-market of search. The small market has been mostly eliminated by Google/Yahoo/MSN site search and open source engines.

Note also that they do not seem to (yet) provide support for Nutch or Droids.. meaning that they are missing a spidering/crawling engine. Same with Tika (office document support). Search result clustering may be coming soon via SOLR-769. No content-management or versioning. (These are fixable pieces given all the open source out there)

There is no good native support for rich taxonomies in Solr/Lucene, nor is there native support for some of the interesting semantic-web data driven features. No self-learning or auto-personalization of results. No analytics (though one could go elsewhere for that).

Lucid is also not offering a hosted Solr service .. so they are not an SaaS play either.

All that said, they obviously have some huge wins within the software industry.. but it’s a tough road to go after accounts like Home Depot, Albertson’s, or the government entities.

Enterprise search is mostly about finished feature sets and a near full admin GUI for non-programmers. The question is whether, in these lean economic times, a given customer considering “build versus buy” is willing to risk starting a professional services engagement to build what they want for cheap, rather than purchase a commercial ES product with way more features than they think they need.

I do think that a smart customer will have new leverage during the sales cycle to credibly threaten the ‘build’ option and get the ‘buy’ price down. And Lucid certainly should limit the ability of the ES companies to get a customer bought in and then milk them for professional services, integration and customization fees… Lucid provides a credible switching threat to cut bait and start over.

Google, Yahoo and open source projects like Lucene have commoditized basic search, so ES is about value-added features, innovative R&D and taking away customer pain and complexity.

Some of the people in Lucid have big plans (Grant Ingersoll comes to mind), and there is absolutely no question that Lucene has made some search vendors look like dinosaurs with slow engines and archaic index structures.

It will be some time before open source catches up to ES.. but it just might not be as long as some would hope.

There is room for both the ES vendors and companies like Lucid or Sematext. Which road a client chooses to take depends on a number of factors, such as initial price (obvious), TCO, feature set, speed of engagement and delivery of solution, presence of in-house search expertise, and so on. ES vendors clearly have more features, more user-friendly UIs, etc., but I think it’s only a matter of time before Lucene & friends catch up or at least significantly close the gap. Initial price is clearly on the OSS side, while the TCO is debatable. Speed of engagement is probably on the side of OSS and companies like Sematext and Lucid – I imagine ES vendors go through long and expensive sales cycles.

Here is a question for Daniel. ES products currently have more features and are more polished. But there must be clients who need a feature that no ES solution offers. What’s the percentage of such customers and what do ES vendors do for/with them? Do they build this custom functionality? If so, do you think this ends up being a lot like the Sematext/Lucid model, but where the customer still needs to purchase the expensive ES vendor’s kitchen sink/core? Or are ES vendors’ offerings sufficiently modular that purchasing modules a la carte is inexpensive or comparable in terms of TCO to what one would spend by using Lucene/Solr/Nutch/Tika and hiring an external company to provide custom pieces?

Neal, I think you’re right that Lucene and Solr are commoditizing basic search, raising the bar as to what constitutes basic search, and thus applying price pressure to the overall market. And vendors like Sematext and Lucid Imagination make Lucene and Solr more palatable to enterprise customers that have no desire to develop in-house expertise.

Otis, thanks for taking the bait. 🙂 I agree that time erodes differentiation–as I was telling Matt Asay, this is a healthy dynamic, as it drives innovation. That’s not only good for customers, but also good for me personally, as my job is to deliver innovation!

As for your question, I don’t have percentages, but Endeca’s customers run the spectrum from hosted solutions provided by a partner (e.g., http://www.thanxmedia.com/) to customized applications jointly built by our customers and our professional services department.

But over time, we’ve been reducing the customization costs in three ways: packaging up vertically-focused solutions, exposing a modular framework to customers and partners, and making a fair amount of customization available to non-technical users through graphical tools.

I believe we give our customers good value. Clearly we are not the cheapest option in the space–at the lower price points, it’s mostly a fight between the Google Search Appliance and open source solutions. But we have no trouble finding customers who think the former is too limited, and that the latter is not fully featured enough. I expect you guys and Lucid Imagination to make the latter a tougher fight, but, as I said, that will just push us to keep innovating. It’s a win for everyone.

It seems like the “enterprise search” market is really two distinct solution sets. On the low end is the customer that says, “I just want a better search box” – without spending a lot of money. On the high end, people are trying to drive significant business value by trying to provide better access to corporate (and/or web) information. The former is a low price point, with low customization, with a clear user experience in mind (“search box + 10 results”). Google fulfills this need from both a technology perspective as well as a brand/support perspective – “nobody ever got fired for buying Google”. And the appliances are not very expensive.

On the more strategic side, you have information-rich, experience-rich applications being delivered by the ES vendors. But at significant costs in data engineering, 1:1 engagement with the customer, and application construction (costs vary between vendors of course, but it’s much more significant than installing a box and turning it on). These apps don’t “shrink-wrap” easily.

My question around open-source, Red-Hat style companies in the enterprise search space is, which are you? At the low end, there’s little to no consulting revenue and Google is a safe, inexpensive option for an IT manager to buy. At the high end, you are competing with organizations that have made much more robust toolsets to handle these kinds of applications. I am not sure which direction Lucid intends to go – but to me it seems they face a stiff headwind either way. I’m not sure there is a middle road.

Does open source Lucene/Solr require more outside assistance than comparable commercial products?

No.

(I’m tempted to stop there, but that’s not quite fair.)

That was, in essence, a question posed by Daniel Tunkelang above. It’s a fair one to ask and an issue that has been raised before.

First, some distinctions. Lucene is a Java search library with best practice indexing and query capabilities that have been continuously improved over the past 10 years. It’s cleanly implemented and not very hard to embed in an application, as thousands of Lucene-based application developers have discovered. But it’s only a library, so there is a certain amount of infrastructure that you have to build around it if you want to create an application, especially if you want something that’s flexible and easily maintained.

Enter Solr, released by Apache in January, 2007. Solr is a scalable search platform. It puts Lucene over http and adds infrastructure such as schema-based management of fields (including how they’re defined and processed), an admin interface, cache management, tools for replication and data loading, logging and statistics and more. It also adds faceting, which is not an infrastructure improvement but an additional user-visible feature for doing the kind of categorizing of results that has become very popular at places like Amazon.
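To make the faceting idea concrete, here is a minimal Python sketch of counting field values across a set of matching documents. It is only an illustration of the concept, not how Solr implements faceting; the function name `facet_counts`, the `brand` field and the sample documents are all made up for this example.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many matching documents carry each value of `field`.

    `docs` is a list of dicts standing in for search results
    (a hypothetical shape, chosen only for this illustration).
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    # Most frequent values first, as a facet sidebar would list them.
    return counts.most_common()


# Hypothetical result set for a query such as "drill".
results = [
    {"title": "Cordless drill", "brand": "Acme"},
    {"title": "Hammer drill", "brand": "Acme"},
    {"title": "Drill press", "brand": "Bolt"},
]

facets = facet_counts(results, "brand")
```

A search interface would show these counts next to each value so that users can narrow the result set with a click, which is the kind of categorizing of results described above.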

With Solr you no longer need to write a Java program. One of the popular talks at ApacheCon is ‘Solr out-of-the-box’ – all the things you can do with Solr without writing code. People can try this out by downloading the Solr tutorial included with the Lucid Solr download or by downloading it from Apache. Does Solr have all the packaging and features found in the best commercial search engines? Not today. But neither does every user need all (or even most) of them. What’s more important for most developers is how easy it is to turn a search platform – commercial or open source – into an application, and how easy or hard it is to maintain it. Lucene/Solr is well written software with a modular architecture, and it’s easy to plug in or unplug various available modules … or, if needed, to write new ones. Disk overhead, relevancy ranking, throughput and query speed are all on par with, and often superior to, the best commercial engines. That translates to less needed customization and ongoing attention to a Lucene/Solr application.

There are also the obvious advantages to open source: complete access to the code to figure out what’s going on or make a change that you want, when you want it…an active community that responds to questions or problems…and no license costs.

Q: Do open source projects often require consultants?

In my experience, no more than commercial applications, some of which pretty much require the vendor’s professional services group to make a complex product usable, or to make needed, even trivial modifications, over and above the license cost of the product. Consultants may ‘stand out’ more in some open source situations… but that’s because they’re often independent consultants, rather than the paid services a commercial company provides with its product.

Q: Might you be inclined to hire consultants for open source applications?

I think yes…but that’s because you have the freedom to hire them to get the application to do just what you want, and getting search to be really good is still an art. Much of the work we’ve done at Lucid has been to extend it for customers who wanted an extension when they needed it, or to assist them with planning their search application.

Q: So…is there any down-side to open source Lucene/Solr?

Yes…but it’s not really about complexity or customization. What you don’t have with Lucene/Solr is a comprehensive support contract – Apache doesn’t provide them. It’s solid enough (and has source available) that this hasn’t stopped organizations of all sizes from moving to Lucene/Solr, but community and self-support are not for everyone. What you also don’t have with Lucene/Solr is a single assured place to turn to when you do need expert assistance, or want training. Those are gaps Lucid Imagination hopes to fill by providing high quality responsive support and services from a staff with over 70 years of cumulative experience building search applications.

Marc, thanks for sharing your thoughts here! I am delighted to see you and your team stepping up, and I wish you luck in your venture. I’m not sold on your pitch, but I’m sure the skepticism is mutual. 🙂 I’m sure the market will sort it all out, and that we’ll both find our corners in the sky.

I’d like to chip in – I run another open source search engine company, based on Xapian, not Lucene (although we’ve worked with Lucene in the past and respect it as a technology).

There is certainly a revolution brewing in enterprise search; customers are no longer happy to pay for non-scalable and inaccurate search software, especially if the charging model is out of date (i.e. they’ve got a million documents and the price on the vendor’s site is ‘£call’ – ouch!). They also want to see inside the ‘black box’ – the basic technology of an inverted index, relevance ranking etc. has been around for years, there’s no point hiding it away and trying to convince your customers it’s some kind of magic.
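For readers who have not looked inside the ‘black box’, the basic inverted index really is simple. The following is a minimal Python sketch, assuming whitespace tokenization and AND-only queries; it leaves out relevance ranking entirely and is not how Xapian or Lucene are implemented. The function names and sample documents are invented for the illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # .get avoids creating empty entries for unknown terms.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical three-document collection.
docs = {
    1: "open source search",
    2: "enterprise search engine",
    3: "open source database",
}
index = build_inverted_index(docs)
hits = search(index, "open source")
```

Real engines add stemming, ranking, compression and much more on top, but the core structure is the term-to-documents mapping shown here.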

Customers I’ve talked to in the last twelve months have told me that the big enterprise search vendors can be arrogant, inflexible and unresponsive to their needs. I agree with Marc that it’s often impossible to solve your problem without paying for consultancy – so why not just go open source? What you really need is the supplier’s expertise in creating indexing schemes, performance tuning, ranking tweaks etc.

Perhaps the eventual winners in this game will be whoever offers the best customer service – and in the open source game, when your software is free, this is all you’ve got.

Hi Marc – good to hear from you again! As with Daniel, I wish you and your team the best – in today’s economy we need as many job-creation engines as possible. It will be interesting to see as things go along where Lucid finds its sweet spot in the market!

These comments still leave me wondering what market Lucid will be aiming at.

If I look at the scenarios for enterprise search we describe in our reports, Solr isn’t that good a match for most of them for two reasons (there are more, but I try not to take up too much space): connectors (to the various repositories), and languages (and I use that as a very broad indication of all sorts of functionality customers would expect).

Now, you could of course take Lucene (or Solr) and add-in stuff you license from a third-party (those Entropysoft connectors aren’t just used by Endeca, after all). Or, if you were to be some software and consulting giant, you might already have stuff lying around to add in the mix. Nobody has to know it’s Lucene-based — it comes down to good marketing.

Since there are already at least two of those products around, and reading what’s been said above, I’m gathering Lucid isn’t going to go that way.

On the other hand, Lucene has seen great uptake by developers and is starting to be the OEM search component I encounter the most. I completely get Marc’s comment (“What’s more important for most developers is how easy it is to turn a search platform – commercial or open source – into an application, and how easy or hard it is to maintain it”). No doubt that’s why many developers are now picking Lucene when they need search in their software.

But in that case, Lucid would be mostly targeting the OEM market — and Solr doesn’t seem to be the most logical choice there (the core Java library — or the .NET port — would do just fine for that).

So, basically, I’m just confused. (Well, not really confused, just hoping to be enlightened :P).

On a side-note, I find it interesting that so many people seem to accept so easily that Google is a cheap solution. It’s not, really. For what many of their customers invested, they could have gotten Lucene, support from Lucid Imagination for a couple of years, a Ferrari, and fuel to drive the Ferrari from Amsterdam to Rome every weekend.

Perhaps that just teaches us that it’s easier to sell management on yellow boxes with a Google logo than on red boxes with a prancing horse. But it’s quite possible there are more sensible conclusions to be drawn 😉

I confess I am not completely sure of pricing on the Google appliance and don’t have any direct experience, but according to the Google website you can get a Google Mini on a 2-year lease, handling 100,000 documents, for about US $4,000. I don’t know if there are any “gotchas”, but that’s pretty close to “free”. And if you take into account paying for support from an open source vendor, I expect the prices would be nearly equivalent. Now, the Google box cannot be customized, at least that I know of – but then if all you want is “a search box and 10 results”, there isn’t much customization….

My impression on Google is that it is relatively inexpensive as an entry-level solution ($4k, and allegedly no need for services) but that the cost quickly goes up if you need to do anything more complicated.

I do recall Stephen Arnold commenting on Google’s enterprise solutions not being as cheap as many people believe them to be, but I can’t remember in which of his abundant blog posts he talked about it.

Mark – I’ve encountered a few GSA boxes in the past and know how much was paid for them. They were not cheap. Adriaan is correct. Perhaps the Mini is cheap, but prices, while negotiated with each (bigger) customer, go up, I think exponentially. For example, I know one case where a company paid $250K for a GSA limited to 2M documents.

Dan: Have you tried to use Lucene, or FAST, or Autonomy to get real work done?

Anyone else: Have you tried Endeca, FAST, or Autonomy?

I simply can’t compare, having only used Lucene among these offerings. I can try to draw analogies to open source vs. proprietary in much bigger markets like web servers (Apache vs. the world), databases (MySQL vs. Oracle), and operating systems (Windows vs. Mac vs. Linux vs. Solaris).

Whether companies around these tools ever make any money is a different issue from which technology eventually dominates in which sectors.

I don’t care for all the fancy admin interfaces aimed at civilians, but then I’m a programmer and pretty much insist on being able to automate with scripts (yes, I do know about Windows macros). For instance, I never use the Tomcat GUI manager on the web, I just script to the web service. I don’t create a MySQL database with their GUI, I write and execute SQL. How else could I ever reproduce my results?

I’ve also gotten very used to having the source for products I’m using like Lucene so that I can see what’s really going on. (It’s critical for Lucene, because Lucene’s doc and web site org is one of the worst I’ve ever seen, so I’m hoping Lucid spends some of its $6M cleaning this up and contributing back to the community.) I even find myself opening up the Java source to figure out what’s going on in tricky APIs.

While I’m pretty familiar with Endeca’s technology, my familiarity with the others is only from reading their documentation and what others have written or said about them. In any case, I’d hardly be credible as an impartial evaluator.