ClearForest/Reuters

Here’s something longer-lasting and weirder than Vertica’s “We sell turkeys” theme: Mark Logic, whose product is used primarily to help enterprises make their content more accessible, doesn’t have a search engine on its own website.*

Besides asking them technical questions, I surveyed Attensity and Clarabridge last week about text mining application trends, getting generously detailed answers from Michelle De Haaff of Attensity and Justin Langseth of Clarabridge. Perhaps the most important point to emerge was that it’s not just about particular apps. Enterprises are doing text mining POCs (Proofs of Concept) around specific apps, commonly in the CRM area, but immediately structuring the buying process in anticipation of a rollout across multiple departments in the enterprise.

It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I am some other times. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary – or confirming! – opinions would be most welcome.

*Factiva is the most significant exception. Hint, hint.

If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and this dominates all other uses of the technology. A significant minority of the research effort at companies that do any kind of text filtering is – duh – text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.

*I.e., part of the “T” in “ETL” (Extract/Transform/Load).

Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet.

After missing what seems to have been an uninformative press conference anyway, I hooked up later with the Business Objects folks on the phone. I say that it was probably uninformative because in the short call, it was pointed out to me that they really weren’t at liberty to say much anyway. Here are a couple of tidbits I picked up even so.

Business Objects’ text mining partnerships have been more demo/sales-cycle than actual sales up until now. That said, they have a few deals each with Attensity and Inxight (but not with ClearForest, which pulled in its horns prior to being acquired by Reuters). I still think they’re the leading BI vendor in integrating with text mining, SAS perhaps aside (who if nothing else have a lot of fun using text mining for data cleaning). The working Inxight partnership, by the way, was all about the specific app of email compliance, with the demo being based on the publicly available Enron corpus.

Inxight’s visualization technology is in the form of an SDK anyway, so integrating it into BOBJ’s product line should be straightforward. Note: Through the Xcelsius acquisition, BOBJ has been trying to gain competitive advantage in the cool-visualization area.

Inxight’s “federation” capability for search is pretty primitive (my term and opinion of course, not theirs). It takes in search result sets from various sources, then clusters and/or refilters them. What it does NOT do is the much harder task of taking actual relevancy rankings from various engines and somehow arbitrating between them. Nor, I’m guessing, does it even assign higher or lower weights to various corpuses or anything like that. Thus, it does not sound terribly competitive with the distributed search capabilities built into any state-of-the-art enterprise search engine.
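To make the distinction concrete, here is a minimal Python sketch of the pool-then-cluster style of federation described above. Everything in it is my own illustration and assumption, not Inxight’s API or design: the function names `federate` and `cluster_by_term`, the hit fields `url` and `title`, and the sample sources are all hypothetical.

```python
from collections import defaultdict


def federate(result_sets, keep=lambda hit: True):
    """Pool hits from several engines, drop duplicates, and refilter.

    result_sets maps a (hypothetical) source name to that engine's hits,
    each a dict with 'url' and 'title'. Hits are taken in source order;
    no engine's relevancy scores are consulted and no corpus is weighted.
    """
    seen = set()
    pooled = []
    for source, hits in result_sets.items():
        for hit in hits:
            if hit["url"] in seen or not keep(hit):
                continue  # skip duplicates and anything the filter rejects
            seen.add(hit["url"])
            pooled.append({**hit, "source": source})
    return pooled


def cluster_by_term(hits, terms):
    """Group pooled hits under each term found in the title (naive clustering)."""
    clusters = defaultdict(list)
    for hit in hits:
        title = hit["title"].lower()
        matched = [t for t in terms if t in title]
        for term in matched or ["other"]:
            clusters[term].append(hit["url"])
    return dict(clusters)


# Illustrative data only.
result_sets = {
    "news": [
        {"url": "a", "title": "Enron email compliance"},
        {"url": "b", "title": "Warranty claims rise"},
    ],
    "intranet": [
        {"url": "b", "title": "Warranty claims rise"},
        {"url": "c", "title": "Email retention policy"},
    ],
}
pooled = federate(result_sets)
clusters = cluster_by_term(pooled, ["email", "warranty"])
```

Note what is missing: nothing here reconciles the engines’ own rankings or favors one corpus over another, which is exactly the harder part.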

ClearForest is being acquired by Reuters. That ClearForest is being bought is unsurprising. The company recently pulled in its marketing horns dramatically, a common sign of putting oneself up for sale. The Reuters move, meanwhile, can be seen as a sequel to the divestiture of its half of Factiva to former 50-50 partner Dow Jones.

If the two main parts of the text mining market are custom publishing and finding warning signs, then both could actually be a good fit with Reuters. The custom publishing part is obvious. As for early warning – well, maybe ClearForest will lose its competitive edge in consumer product warranty analysis or something, but a significant fraction of the early warning market is tied to news articles, web postings, and other things that are a good fit for Reuters.

But the really interesting (at least to me) possibilities arise in the core Reuters and Dow Jones business of supporting investment decisions.

I tried to invite Jay Henderson to speak on the Text Analytics Summit marketing panel, but got no answer to my e-mail. The company phone directory didn’t work so well for him either. I sent e-mail to a general PR company e-mail address, and that didn’t get returned. And Ravi tells me he has had similar difficulties reaching them.

Jay Henderson of ClearForest tells me that hedge funds are one of their more interesting growth areas. It’s about time.

I think a lot of the reason for investment firms not making more use of text analytics has been structural — Factiva, the (relatively speaking) mammoth joint venture of Reuters and Dow Jones, is forbidden by its parent companies from meeting investment firms’ needs. And that’s kind of a pity, as it’s probably the best-positioned firm to do so. It’s good to hear that the little guys are finally filling the gap.

So far as I can tell, Attensity’s strategy when the company was originally founded was rather like ClearForest’s strategy today – and vice-versa. That said, here’s where they seem to stand at this time:

Attensity wants to make text analytics very easy to integrate into business intelligence and data mining – at the moment, they’re not too focused on the differences between those two disciplines – and is trying to deliver the best possible fact extraction consistent with that charter.

ClearForest wants to provide really great information extraction — to the limits of what can be done without excessive knowledge engineering – and is trying to integrate as well as possible with other technologies, the better to serve the customers who need what they offer.

I talked again with Mark Logic, makers of MarkLogic Server, and they continue to have an interesting story. Basically, their technology is better search/retrieval through XML. The retrieval part is where their major differentiation lies. Accordingly, their initial market focus (they’re up to 46 customers now, including lots of big names) is on custom publishing. And by the way, they’re a good partner for fact-extraction companies, at least in the case of ClearForest.

Here, as best I understand, is the story of the custom publishing business.