Editorial: So You Want to Write a Search Engine?

We're frequently approached by companies that want to tell us about their new search engine, or that would like help in making their new engine a success in the market. Of course we really like search technology, so we're happy to talk shop. And sometimes we even offer advice and assistance. Heck, we even started a search engine of our own back in the dot-com days; anybody remember SearchButton.com? But times have changed, the rules have changed, and in hindsight we would have done things differently.

But if you're in the midst of creating a new search engine, we have some free advice: think long and hard about it. There is still room for innovation, so we're not saying "don't do it" right off the bat, but you'd be surprised how many other folks are doing this, and how bad the odds are of phenomenal commercial success. Yes, Google did unseat some of the established players, so it IS possible (who remembers Lycos and Alta Vista?), but we believe the odds of any individual upstart unseating a major player are really low.

The GOOD NEWS is that there are possibly more lucrative channels for this energy and vision. If you have the knowledge and drive to create a search engine, we're simply suggesting other avenues for your talent, rather than insisting on opening your own MyNewSearchEngine.com.

Top reasons folks want to build a Search Engine, and our usual response:

I wrote an engine in grad school, it's awesome!

Hmm… Did you go to Stanford? No. MIT? No. Sadly it may be that nobody cares. To be fair, there is a proud tradition of class projects transitioning into search companies; see the "History" section below. But most of those came from the earlier days, and now there are a lot more established players and other grad-student projects out there competing. Being from a really top name-brand school does seem to help.

This all seems horribly unfair. Your engine probably IS really cool, and we STRONGLY empathize with you, but seriously: pack up your code into something you can bring to a job interview at one of the established players. It's not so bad; they probably offer free soda!

Still want to move forward?

Also, folks like Google, Netflix and TREC do run programming contests; do consider entering them.

And if you do get a job at one of these companies, you'll be surrounded by like-minded people, so you will be able to innovate and learn, and help grow a culture of excellence – being a part of that would be something to be proud of.

I'm a Seasoned Engineer with Patents, and I have a new search engine that is really different

Disclosure: we are NOT attorneys, so these are just our non-lawyer opinions.

We sometimes talk to folks who seem unaware of what others in the industry are doing. Some people we talk to have ideas that were talked about 15 or 20 years ago, but they're really quite sure that they're the first. Just the other day I saw proprietary code from a vendor that was almost identical to something I wrote six years ago, and I'm pretty sure I wasn't the first.

Sometimes ideas are better the second or third time around, or somebody has an improvement, or maybe the packaging or UI is much better, so these are still good things.

We're hugely skeptical that folks are having ideas that nobody else has thought of before - this is really unlikely in the software industry. On the flip side, yes, there is plenty of room for improvements or refinements - bring it on! - just be honest with yourself and start by doing your homework.

Patents not as Valuable as You Might Think

In theory the patents would help a little bit with funding, as some type of tangible asset, but in reality do you have a customer and revenue flow?

Again, this may seem a bit unfair, but patents are not as valuable as folks think; we're not saying they are worthless, but they do not guarantee anything, not even close.

What would be more interesting, a really good complement to your Intellectual Property, would be a stream of paying customers.

Uniqueness of Your Ideas

Some of the patents we hear about just don't sound that unique or enforceable. We get the impression that some patent holders "don't get out much", with regard to what others in the industry are doing, or are comparing themselves to 1990.

As we said above, have you really looked around at what others are doing? There's been a lot of search engine research over the years...

But assuming you're sure...

Have you worked past an initial "prior art" judgment with your patent application yet? Applications are often initially stamped with "prior art". The submitter then appeals that decision, and sometimes wins. Even at this point some humility is prudent, but this is at least further down the road.

Patents can be challenged by companies with lots and lots of money, and until or unless that happens, the patent is not a "sure thing". It's not just a question of some company infringing on your patents, but that they could argue that you are infringing on theirs. As your patents are scrutinized by other companies' hordes of lawyers, the chances of your ideas overlapping with part of their large patent portfolio increase.

We've been advised to think of granted patents as a defensive resource rather than as an offensive tool. If somebody challenges you, there is at least some ammo in your war chest. If, on the other hand, you start going after other companies, you may awaken their massive legal teams, and ultimately increase the chance of your patent being invalidated. This does sound a bit depressing. Software patents are a very controversial topic, and the rules vary from country to country.

A few companies do make it their business to acquire patents and then go on the offensive against other companies. Even if you ignore any potential ethical, legal or financial concerns, at a minimum this is certainly a different endeavor than starting a search engine company.

Your Patent(s) != Your Code-base

If you do have a truly unique and valuable idea, then we're certainly rooting for you!

But consider implementing your vision on top of one of the established players. There are ways to do this that still preserve your Intellectual Property and profits. Additionally, going with an established platform could conceivably reduce implementation efforts and time to market, as well as add perceived legitimacy to your offering.

If you think your code-base is just too complex or difficult to port or integrate with other engines, we'd suggest that this may be more of a symptom of the state of your code-base than of your original idea, and that perhaps a port of your code is long overdue! Some folks are really attached to their particular pile of spaghetti-code, and god help any new engineer that joins their team; not your code of course!

Summarizing our non-lawyer advice on patents:

Think revenue plus patents, not just patents

Think of patents as defense vs. offense

Don't confuse IP with source code

Stay up to date on what others are doing

I've already presented to VCs (or "Angels") and they said it was a great idea!

Did they actually give you money yet? If not, please read on.

VCs rarely say "no" - it would sometimes be helpful if they did, but maybe out of politeness they're never that candid.

Did the VC say something like "that is really interesting and has some real potential" or maybe "it's so good in fact you should really talk to 'Charlie' over at XYZ Ventures, I'll introduce you!"?

Sadly this is a "no", not a "yes". It sounds like a "yes", it certainly does, but it's very likely a "no". Many entrepreneurs, mistaking the very polite "no" for a "yes", go on to firm after firm, pitching their idea, and taking encouragement from the continued "yes"es that they keep hearing.

Did the group you presented to mention any details about them giving you money at some specific point in time? Or that they would give you money after some very specific milestone, possibly after some "due diligence"?

This is a "maybe", not a "yes" - but a "maybe" is still certainly better than a "no", so this is a start. At this point you need to do exactly what you promised to do (or exceed your promises), with regard to milestones or deliverables.

We're not saying the funding situation is "hopeless"; there IS some funding going on these days. But the problem of a "no" being mistaken for a "yes" can cost years of time and angst – we simply suggest that people recognize a "no" when they hear it.

I know a particular niche for search – a target market that is underserved by the big guys!

Great, but use one of the established players to implement it. Just because you know how to solve the search problems in a particular niche doesn't mean you need to run off and build a brand new engine! It's wonderful that you have this insight, but consider implementing it on an existing platform.

Granted, it can sometimes be a challenge to negotiate with a vendor in order to base your idea on their platform. Approaching a Tier 1 player directly may or may not be the best way to accomplish this; sometimes it's hard to get their attention. But if you have a niche with a paying customer or two, it will certainly help. There are also Tier 2 and Tier 3 players, as well as open source software, that might serve as a decent base.

If you look, there should be a way to deliver your vision on top of an established platform. Starting from scratch should really be a last resort. Sadly, search engine writers rarely take this advice, choosing instead to reinvent the wheel, the axle, internal combustion and even pavement.

My vision goes way beyond "search", it also includes…

Yes, we believe you. Many of the new innovative ideas go well beyond search. You are wise to see that there is a search tie-in, and also wise to have seen that you have a differentiator. This is all really good!

But why not base your enhanced solution on top of an existing platform? Or at least make it "interoperate" with an existing platform. Some technologies might require highly specialized data indices of some form, which cannot be easily jammed into a host search engine's proprietary format, so at that point there could be some real technical obstacles; but perhaps the system could still leverage other parts of an existing search system.

I already have paying customers!

This starts to get interesting - having paying customers is great! But why do you think that means you need to create a new search engine?

Again we would ask "why not build this on top of an existing engine"? Heck, package your technology up to work with 2 or 3 of the big players, and maybe exhibit at their trade shows.

And, by the way, many vendors offer generous reseller and referral programs. If you've demonstrated talent in attracting customers, you might consider that as a more valuable skill than "coding".

And finally... "Our IT Department Says They Can Write One!"

Granted, this last one was a bit dated - but now it's mutating and staggering back to life.

Ten years ago, IT departments in some large tech companies did have a tendency to write their own search engines. Sometimes the results were innovative, but more often they were some glorified combination of Perl and "grep". This was particularly ironic since those same departments also reported being super busy, so we wondered when they would have time to create these "masterpieces". This was a bad idea back then, and most companies seem to have outgrown it.

But this trend is making a comeback, albeit in two new flavors:

IT departments now claim to be able to "whip up" Search Analytics tools. A little Perl, grep, awk and some duct tape... it practically writes itself! :-)

Some database fans have decided that MySQL is a perfectly fine search engine; it does have the "LIKE" operator after all!

We don't mean any slight to the tech folks out there - we're techies too! - and most of you certainly could reinvent Search Analytics. Our question would be "for goodness sakes, why would you want to!?" Search Analytics has matured to where it is more about Marketing, Sales and Content Management than technology. A lot of thought has been put into specific reports and presentation issues, role-based logins, etc.

Think about it. If you roll your own tool instead, those "Marketing people" in your company are going to want all kinds of reports... sorted this way and that... and then Sales will want reports, maybe almost the same reports, but with slightly different columns... and we haven't even begun to talk about the fonts! And no, they are not going to use the "report creator GUI front end" you've just added - a wonderful bit of code I'm sure - but on such a different wavelength.

Trust us, if your company is willing to spend money on a reasonably priced search analytics tool, count your blessings; this will save you time over rolling your own. And don't be fooled: Search Analytics is different from just tracking page views and general web traffic, so make sure you get a product that fully addresses that difference.

The other stories we hear lately are about MySQL being used to drive search-oriented sites. For extremely rudimentary string matching it may be OK; and yes, we do know about the "LIKE operator", thanks! Rumor has it that one of the big photo sharing sites even uses it for tag matching. But what about synonyms? Documents vs. database records? Documents in different formats, navigators, etc.? This is another area where an evolved software package will save you time.
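To make the synonym point concrete, here is a minimal sketch of what "LIKE as a search engine" looks like in practice. Everything in it is an assumption made for illustration: we use Python's built-in SQLite as a stand-in for MySQL so the example is self-contained, and the table, the sample rows and the tiny synonym list are all made up.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL so the sketch runs anywhere.
DOCS = [
    (1, "Used car for sale, low mileage"),
    (2, "Vintage automobile in great condition"),
    (3, "Carpet cleaning service"),
]

# Hypothetical hand-built synonym list; a mature engine would manage this for you.
SYNONYMS = {"car": ["car", "automobile", "auto"]}


def build_db():
    """Create an in-memory table holding the sample documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", DOCS)
    return conn


def like_search(conn, term):
    """Plain substring match, the way a bare LIKE query does it."""
    rows = conn.execute(
        "SELECT id FROM docs WHERE body LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [row[0] for row in rows]


def synonym_search(conn, term):
    """Expand the term by hand and OR the LIKE clauses together."""
    terms = SYNONYMS.get(term, [term])
    where = " OR ".join("body LIKE ?" for _ in terms)
    rows = conn.execute(
        f"SELECT id FROM docs WHERE {where} ORDER BY id",
        [f"%{t}%" for t in terms],
    )
    return [row[0] for row in rows]


conn = build_db()
print(like_search(conn, "car"))
print(synonym_search(conn, "car"))
```

With this sample data the plain LIKE query returns documents 1 and 3: it misses the "automobile" listing and happily matches "Carpet". Bolting on a synonym list by hand recovers document 2, but the carpet cleaner is still there, and we haven't touched ranking, stemming or file formats yet. That is roughly the pile of work an established engine has already done.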

History

Many search engine companies started as academic projects that were then turned into companies. Some of the companies even let the PhD or professor become the CEO of the company, with mixed success. Of course other engines are started by more traditional means.

Search engine companies usually sprout from one of these four areas:

1. School projects / University research

2. Existing business / entrepreneur decides to create an engine

3. Government intelligence / research

4. The Open Source Community

By our estimates, the first two sources are about evenly split, with #2 having a slight lead. Given the advanced nature of search engine software, it's not a huge surprise that university research is so heavily represented. With regard to door #3, there could be many more that are classified. Source #4 is incubating a lot of new ideas, and I wonder if someday it may overtake universities as the seed of innovation (or form some type of symbiotic partnership with them).

I'd like to thank John Lehman and Marcelline Saunders for filling in some details. Some facts on the older engines were taken from Robert Kuhn's very thorough 1996 paper "A Survey of Information Retrieval Vendors" (Sun Microsystems). Some of the newer facts were drawn from Wikipedia.

So Did We Talk You Out of It?

No, I didn't think so! We do love to "talk search", so if you do decide to "roll your own", or if we've convinced you to build something great on top of one of the big guys, we'd love to hear about it! We do believe there are still many innovations on the horizon; we just don't think each one justifies its own separate search engine.