I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies having very detailed understanding of searchers. Based on that, I expect:

A small oligopoly dominating the conjoined businesses of mobile device software and search. The companies most obviously positioned for membership are Google and Apple.

The continued and growing combination of search, advertisement/recommendation, and alerting. The same user-specific data will be needed for all three.

Enterprise search is greatly disappointing. My main reason for saying that is anecdotal evidence — I don’t notice users being much happier with search than they were 15 years ago. But business results are suggestive too:

Web search, while superior to the enterprise kind, is disappointing people as well. Are Google’s results any better than they were 8 years ago? Google’s ongoing hard work notwithstanding, are they even as good?

Consumer computer usage is swinging toward mobile devices. I hope I don’t have to convince you about that one. 🙂

In principle, there are two main ways to make search better:

Understand more about the documents being searched over. But Google’s travails, combined with the rather dismal history of enterprise search, suggest we’re well into the diminishing-returns part of that project.

Understand more about what the searcher wants.

The latter, I think, is where significant future improvement will be found.

Content management system (CMS) and search expert Alan Pelz-Sharpe recently decried “shadow IT,” by which he seems to mean the departmental proliferation of data stores outside the control of the IT department. In other words, he’s talking about data marts, only for documents rather than tabular data.

Notwithstanding the manifest virtues of centralization, there are numerous reasons you might want data marts, in the tabular and document worlds alike. For example:

Price/performance. Your main/central data manager might be too expensive to support additional large specialized databases. Or different databases and applications might have profiles different enough that they get the best price/performance from different kinds of data managers. This is particularly prevalent in the relational world, where column stores, sequentially-oriented row stores, and random-I/O-oriented row stores each have compelling use cases.

Different SLAs (Service-Level Agreements). Similarly, different applications may have very different requirements for uptime, response time, and the like. (In the relational world, think of operational data stores.)

Different security requirements. Different subsets of the data may need different levels of security. This is a particular issue in the document world, where security problems are not as well solved as in the tabular arena, and where it’s common for a search engine to index across corpuses with radically different levels of sensitivity.

Integrated application and user interfaces. In the relational world, there’s a pretty clean separation between data management and interface logic; most serious business intelligence tools can talk to most DBMS. The document world is quite different. Some search engines bundle, for example, various kinds of faceted or parameterized search interfaces. What’s more, in public-facing search, a major differentiator is the facilities that the product offers for skewing search results.
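The price/performance point above turns on storage layout: an analytic query that touches one column can scan far less data in a columnar layout than in a row layout. Here is a minimal Python sketch of that intuition — toy data and byte counting, not any vendor’s storage engine:

```python
# Toy illustration (not any product's implementation): compare bytes
# scanned for SUM(revenue) under row-oriented vs. column-oriented layout.

rows = [
    {"id": i, "region": "US" if i % 2 else "EU",
     "revenue": i * 10.0, "notes": "x" * 50}
    for i in range(1000)
]

# Row-store scan: whole records are stored together, so the query
# reads every field of every row.
row_bytes_scanned = sum(len(str(v)) for r in rows for v in r.values())

# Column-store scan: each attribute is stored contiguously, so
# SUM(revenue) reads only the revenue column.
revenue_col = [r["revenue"] for r in rows]
col_bytes_scanned = sum(len(str(v)) for v in revenue_col)

print(row_bytes_scanned, col_bytes_scanned)
assert col_bytes_scanned < row_bytes_scanned
```

The gap widens with wider rows — which is why analytic workloads that touch a few columns of wide tables are the classic column-store use case, while random single-record lookups favor row stores.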

At Lynda Moulton’s behest, I spoke a couple of times recently on the subject of where “semantic” technology is or isn’t likely to be important. One was at the Gilbane conference in early December. The slides were based on my previously posted deck for a June talk I gave on a text analytics market overview. The actual Gilbane slides may be found here.

My opinions about the applicability of semantic technology include:

The big bucks in web search are for “transactional” web search, and semantics isn’t the issue there. (Slides 3-4)

When UIs finally go beyond the simple search box — e.g. to clusters/facets or to voice — semantics should have a role to play. (Slide 5)

Public-facing site search depends — more than any other area of text analytics — on hand-tagging. (Slide 7)
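Since the slides point to clusters/facets as a step beyond the simple search box, here is a minimal sketch of what faceting involves — hypothetical documents and field names, not any engine’s actual API: given a result set, count the values of each facet field so the UI can offer refinements, then filter when the user picks one.

```python
from collections import Counter

# Hypothetical product documents standing in for a search result set.
docs = [
    {"title": "red shoe", "brand": "Acme", "color": "red"},
    {"title": "blue shoe", "brand": "Acme", "color": "blue"},
    {"title": "red hat", "brand": "Zenith", "color": "red"},
]

def facet_counts(results, field):
    """Count occurrences of each value of a facet field."""
    return Counter(d[field] for d in results)

def refine(results, field, value):
    """Narrow the result set to documents matching a chosen facet value."""
    return [d for d in results if d[field] == value]

print(dict(facet_counts(docs, "color")))  # {'red': 2, 'blue': 1}
print(len(refine(docs, "brand", "Acme")))  # 2
```

Real engines compute these counts from their indexes rather than by scanning results, but the user-visible contract — counts per value, then refinement — is the same.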

Lynda Moulton, to put it mildly, disagrees with the Gartner Magic Quadrant analysis of enterprise search. Her preferred approach is captured in this passage:

Coveo, Exalead, ISYS, Recommind, Vivisimo, and X1 are a few of a select group that are making a mark in their respective niches, as products ready for action with a short implementation cycle (weeks or months not years).

By way of contrast, Lynda opines:

Autonomy and Endeca continue to bring value to very large projects in large companies but are not plug-and-play solutions, by any means. Oracle, IBM, and Microsoft offer search solutions of a very different type with a heavy vendor or third-party service requirement. Google Search Appliance has a much larger installed base than any of these but needs serious tuning and customization to make it suitable to enterprise needs.

I just found an almost year-old blog post from EMC executive Andrew Cohen that succinctly lays out his view (which he believes to be mainly a consensus stance) on e-discovery. Cohen is evidently both a lawyer and a honcho in document management system vendor EMC’s Compliance Division, which is probably relevant to interpreting his outlook, in the spirit of the old Kennedy School dictum that “Where you stand depends upon where you sit.”

Highlights included:

Information management is central to e-discovery.

In particular, auditability (my word) is central, if you want electronic documents to hold up as evidence in court.

Search is good enough, but it’s not the biggest issue in e-discovery.

E-mail archiving has reached the tipping point, and is increasingly a must-have, largely for its e-discovery benefits.

Attivio CEO Ali Riaz was previously CFO and COO of FAST. He tried to avoid involvement in the recent exposé of his former employer. For his troubles he got a parking lot ambush, a big photograph, and some unflattering coverage. Read more

A Norwegian newspaper did an exposé on FAST, dated June 28. Helpful search industry participants quickly distributed English translations to a variety of commentators, including me. TechCrunch posted a scan of part of the article.

The gist is that FAST followed a pattern very common in the packaged enterprise software industry: Read more

My last two posts were based on the introductory slide to my talk The Text Analytics Marketplace: Competitive landscape and trends. I’ll now jump straight ahead to the talk’s conclusion.

Text analytics vendors participate in the same trends as other software and technology vendors. For example, relational business intelligence and data warehousing products are increasingly being sold to departmental buyers. Those buyers place particularly high value on ease of installation. And golly gee whiz, both parts of that are also true in text mining.

But beyond such general trends, I’ve identified six developments that I think could radically transform the text analytics market landscape. Indeed, they could invalidate the neat little eight-bucket categorization I laid out in the prior post. Each is highly likely to occur, although in some cases the timing remains greatly in doubt.