Text analytics application areas typically fall into one or more of three broad, often overlapping domains:

Understanding the opinions of customers, prospects, or other groups. This can be based on any combination of documents the user organization controls (email, surveys, warranty reports, call center logs, etc.) or public-domain documents such as blogs, forum posts, and tweets. The former is usually called Voice of the Customer (VotC), while the latter is Voice of the Market (VotM).

Detecting and identifying problems. This can happen across many domains — VotC, VotM, diagnosing equipment malfunctions, identifying bad guys (from terrorists to fraudsters), or even getting early warnings of infectious disease outbreaks.

For several years, I’ve been distressed at the lack of progress in text analytics or, as it used to be called, text mining. Yes, the rise of sentiment analysis has been impressive, and higher volumes of text data are being processed than were before. But otherwise, there’s been a lot of the same old, same old. Most actual deployed applications of text analytics or text mining go something like this:

A bunch of documents are analyzed to ascertain the ideas expressed in them.

A count is made as to how many times each idea turns up.

The application user notices any surprisingly large numbers, and as a result pays attention to the corresponding ideas.
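The three steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the keyword-to-idea mapping `IDEA_KEYWORDS`, the sample documents, and the `ratio` threshold are all hypothetical, and real text analytics products use far richer linguistic extraction than substring matching.

```python
from collections import Counter

# Hypothetical keyword-to-idea mapping (an assumption for illustration only).
IDEA_KEYWORDS = {
    "battery": "battery life",
    "refund": "refund request",
    "shipping": "shipping delay",
}

def extract_ideas(document):
    """Step 1: return the set of ideas whose keyword appears in the document."""
    text = document.lower()
    return {idea for keyword, idea in IDEA_KEYWORDS.items() if keyword in text}

def count_ideas(documents):
    """Step 2: count how many documents mention each idea."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_ideas(doc))
    return counts

def surprising_ideas(current, baseline, ratio=3.0):
    """Step 3: flag ideas whose count is at least `ratio` times the baseline.

    A baseline of zero is treated as one so that new ideas need `ratio` mentions.
    """
    return sorted(
        idea for idea, n in current.items()
        if n >= ratio * max(baseline.get(idea, 0), 1)
    )

# Made-up sample documents.
last_month = ["Shipping was slow.", "Battery died early."]
this_month = [
    "Battery drains overnight.",
    "The battery overheats.",
    "Battery swelled after a week.",
    "I want a refund.",
]

baseline = count_ideas(last_month)
current = count_ideas(this_month)
flagged = surprising_ideas(current, baseline)
print(flagged)
```

Note that nothing here ties the flagged idea to any tabular business metric, which is the integration gap discussed next.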

Often, it seems desirable to integrate text analytics with business intelligence and/or predictive analytics tools that operate on tabular data. Even so, such integration is most commonly weak or nonexistent. Apart from the usual reasons for silos of automation, I blame this lack on a mismatch in precision, among other reasons. A 500% increase in mentions of a subject could be simple coincidence, or the result of a single identifiable press article. In comparison, a 5% increase in a conventional business metric might be much more important.

But in fairness, the text analytics innovation picture hasn’t been quite as bleak as what I’ve been painting so far.

A new Attensity Group has been created in a complex set of maneuvers. So far as I understand or guess, elements of the deal include:

The Attensity Group is being formed by the merger of three companies: Attensity, empolis, and Living-e. Frankly, I’d never heard of either empolis or Living-e until this merger. (In case you ever have to resort to the Wayback Machine, empolis’ URL was http://www.empolis.com/home.html and Living-e’s was http://www.living-e.com/us/index.php)

Existing investors (employees aside) have largely been bought out. Most of the stock is owned by Aeris, an investment vehicle for SAP co-founder Klaus Tschira. Living-e already was a Tschira investment.

Inxight managers have been brought in to run the whole thing. Specifically, Ian Bonner will be CEO, and Ian Hersey will be EVP of Products and Technology.

The former CEOs of Attensity and empolis will run the Americas and EMEA regions, under the Attensity and empolis names respectively, apparently with their prior sales organizations more or less intact.

A former CEO of Living-e will be their boss, but also run “Special Projects”, which adds up to a very odd title indeed: “Senior Vice President of Operations and Strategic Projects, Attensity Group”

The former CTOs of Attensity and empolis are CTOs of system software (“Natural Language Processing”) and application software respectively. This gets Attensity’s total CTO count up to 3, a level I’ve previously seen only at Teradata. I haven’t talked with David Bean yet, but his colleagues insist that he’s excited about his new role.

This whole deal has been underway since at least late last year. For example, Ian Bonner has been involved for that long. empolis and Living-e announced the pooling of their sales forces back in February.

Technically, the merger isn’t complete, as Living-e is a public company and not all of its shares have been acquired yet. (But they will be Real Soon Now.)

Attensity, of course, was a venture-backed private company, with tired investors. empolis was owned by Bertelsmann, and was itself a roll-up of several smaller text analytics companies.

I was told on the phone empolis was doing something like €30-40 million. Attensity and Living-e were under $10 million each. That surprises me a bit, as I thought Attensity was in that range on commercial business alone, and was doing more than $10 million counting its government accounts.

It turns out that if I had been paying attention to the news filters I could have seen this coming.

*In retrospect that was a silly comment, made soon after midnight while humans were generally either partying or asleep. But it’s the set-up for the rest of this post.

Sheer self-indulgence aside — “Happy Birthday To Me!!” — I see something blogworthy in that. Indeed, it reflects the emergence over the past 6 months or so of one particular Twitter community. Takeaways include: Read more

I had a brief chat with the Attensity guys at their Teradata Partners Conference booth – mainly CTO David Bean, although he did buck one question to sales chief Jeff Johnson. The business trends story remained the same as it was in June: The sweet spot for new sales remains Voice of the Customer/Voice of the Market, while on-premise/SaaS new-name accounts are split around 50-50 (by number, not revenue).

David’s thoughts as to why the SaaS share isn’t even higher – as it seems to be for Clarabridge* – centered on the point that some customers want to blend internal and external data, and may not want to ship the internal part out to a SaaS provider. Besides, if it’s tabular data, I suspect Attensity isn’t the right place to ship it anyway.

*Speaking of Clarabridge, CEO Sid Banerjee recently posted a thoughtful company update in this comment thread.

When I challenged him on ease of use, David said that Attensity is readying a Microstrategy-based offering, which is obviously meant to compete with Clarabridge and any of its perceived advantages head-on.

I chatted a bit with Attensity’s CTO David Bean and sales VP Jeff Johnson yesterday at the Text Analytics Summit. Jeff confirmed what his colleagues had already told me: most of the action is now in Voice of the Customer/Market, he expects a very strong June quarter, etc. But one thing I posted last week wasn’t quite right. Hosted implementations (i.e., SaaS) haven’t yet reached the 50% level at Attensity. However, they are indeed growing fast, and they’re all (or almost all) in the Voice of the Customer/Market area.

Jim D. of UPS asked in the comment thread to the recent Attensity update post how one should decide between Attensity and Clarabridge. I wrote an answer, and then decided to just split it out in a separate post. Here are five ideas about how to pick between Attensity and Clarabridge for the kind of Voice of the Customer/Market application both companies are focusing on.

1. Attensity is an older company than Clarabridge, and is good at more things. Is Clarabridge really good at everything you want them to be?

2. In particular, Attensity has more overall sophistication at linguistic extraction. Do any of the differences matter to you?

3. Both companies are working hard on ease of use, for multiple kinds of user (business user tweaking linguistic rules, IT user, etc.). Whose approach and feature set do you like better?

4. Usually, buying one of these products involves some professional services. Whose organization do you like better?

5. Attensity’s default database schema for its exhaustive extraction is pretty flat and normalized, as befits a happy Teradata partner. Clarabridge’s is more of a star schema, as befits a bunch of ex-Microstrategy guys. Either can be straightforwardly translated into the other, so you may not care — but do you?
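To illustrate the last point, here is a rough sketch of how a flat table of extracted facts might be turned into a star schema and back. Everything in it is assumed for illustration: the column names (`doc_id`, `product`, `sentiment`), the sample rows, and the helpers `to_star` and `to_flat` are hypothetical, and neither vendor's actual schema is shown.

```python
# Hypothetical flat extraction output: one row per extracted fact.
flat_rows = [
    {"doc_id": 1, "product": "router", "sentiment": "negative"},
    {"doc_id": 2, "product": "router", "sentiment": "positive"},
    {"doc_id": 3, "product": "modem", "sentiment": "negative"},
]

def to_star(rows, dimension_columns):
    """Split flat rows into a fact table plus one lookup table per dimension."""
    dimensions = {col: {} for col in dimension_columns}
    facts = []
    for row in rows:
        fact = {}
        for col, value in row.items():
            if col in dimensions:
                # Assign a surrogate key the first time a value is seen.
                key = dimensions[col].setdefault(value, len(dimensions[col]) + 1)
                fact[col + "_key"] = key
            else:
                fact[col] = value
        facts.append(fact)
    return facts, dimensions

def to_flat(facts, dimensions):
    """Join the fact table back to its dimensions to recover the flat rows."""
    lookups = {
        col: {key: value for value, key in members.items()}
        for col, members in dimensions.items()
    }
    rows = []
    for fact in facts:
        row = {}
        for col, value in fact.items():
            base = col[:-4] if col.endswith("_key") else None
            if base in lookups:
                row[base] = lookups[base][value]
            else:
                row[col] = value
        rows.append(row)
    return rows

facts, dimensions = to_star(flat_rows, ["product", "sentiment"])
round_trip = to_flat(facts, dimensions)
print(facts[0], dimensions["product"])
```

The round trip loses nothing in this toy case, which is the sense in which the two layouts are interchangeable; the practical question is which one your BI tools query more comfortably.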

One of the major dilemmas facing a group of people we all know is: How can humanities majors make money? Sure, they can become lawyers. And they can join the tech industry and write documentation. But what else?

Well, what about text analytics? Much of what I know about natural language processing (NLP) I learned from my friend Sharon Flank, who I met when she was a Slavic Linguistics PhD student at Harvard. My partner in first figuring out search engines — and later in running Elucidate — was my wife Linda Barlow, a 15-times-published novelist who’s also taught English at the college level. And Olivier Jouve’s education is in paleontology, although whether or not that’s a humanity is a sort of borderline definitional issue.

So I ask you all: Is text analytics a fruitful area for humanities majors to find lucrative careers? All insight would be appreciated. If the news is good enough, I’ll do my part in publicizing it to university placement offices and the like.