Dave Ferrucci, lead architect for UIMA, shared some detailed views with me about UIMA adoption. With his permission, they are reproduced below. UIMA is still not getting a lot of attention from commercial text analytics vendors, but ultimately I think it will prevail, if only because nobody cares enough to start a war of dueling alternative standards.* So it’s something you should educate yourself about as it progresses.

*And IBM plans to convince me ASAP that even that assessment is too negative, which it well may be. Stay tuned.

So to sum up:
1. We seem to have a fair amount of traction with the UIMA framework among communities that are very interested in plug-n-play with components from other providers. This includes the government, life sciences and research communities.
2. The UIMA standard, as opposed to the specific Java Framework implementation, developed under an SDO will broaden the opportunity and strengthen the case for adoption of UIMA as a standard for text and multi-modal analytics that allows interoperability across different frameworks and applications. It would of course be the case that the Java UIMA Framework would comply with the standard.

The complete email follows.

Curt,
Hi. While we can’t really speak for the vendors, the adoption story is ongoing, and to fully appreciate it I think it is best to consider it in a bit more depth.

First is adoption of the UIMA Java Framework, which we posted in binary form (as part of an SDK) on the IBM alphaWorks site in late 2004 (http://www.alphaworks.ibm.com/tech/uima); early this year we then posted the source on SourceForge (http://uima-framework.sourceforge.net/). On alphaWorks we get a rough average of a couple hundred downloads a month from government, academia and industry. On SourceForge we get a similar average, although it seems to be tapering off of late. What ALL these folks are doing with the framework, we do not know. The forum on alphaWorks is moderately active; there hasn’t been as much activity on the SourceForge forum so far. We see a lot of use of the UIMA SDK (which includes the Java Framework) by government, universities and research institutions/programs that are not in the business of selling a specific application but rather in the business of creating/customizing their own solutions. From these communities we see more activity on the alphaWorks SDK forum, requests for talks, tutorials and white papers, and involvement in large collaborative projects using UIMA. This makes sense to us. This is where we expect to see early adoption of the framework. Traditionally, these communities do not see their value-add or core competency as lying in infrastructure development. Rather, they want to spend their time on the analytics, task models, integration and solution-level work. They are also more likely to experiment with 3rd party analytics because they are not focused on competing at the component level, but rather on solving their core problems, often in a collaborative environment. Building adoption here for an interoperability framework is, I think, a good first step.

It appears that text analysis vendors tend to build on their own internal frameworks, which have been in production for some time and are intimately tied to their applications. Also, they may tend to consider their analytics a cornerstone of their competitive advantage and therefore may not be inclined to share them or to suggest that someone else’s are better. Switching over to a different, externally provided, pluggable framework may not be a top priority. It is reasonable that over time vendors may adopt the UIMA Java framework as part of their internal implementation, but that depends on technical issues surrounding the cost/performance trade-offs relative to maintaining their current implementations, and on their interest in reusing 3rd party analytics. Vendors may be more immediately motivated to partner with IBM in the creation and/or use of a standard (which I will say more about below). They hope to use the standard to better enable strategic partnerships and to find more channels for their technology. The standard enables, for example, network or service-level interoperability across frameworks and applications without necessarily requiring a deeper implementation commitment.

Second is adoption of a UIMA standard for interoperability that accommodates different implementations/applications. We are considering the creation of an external working group under a Standards Development Organization (SDO) to define a standard specification for UIMA. This is independent of the open-source Java Framework implementation. It specifies the data representations and abstract interfaces for defining compliant data and for communicating between text (and multi-modal) applications over a network protocol such as SOAP. It will specify how to encode analysis data, how to publish network services that perform annotation, and so on. We expect that this will be an attractive adoption point for communities that want to interoperate but do not necessarily want to change their internal implementation and development environments. Perhaps the larger portion of the text analysis vendor community falls into this camp.

So to sum up:
1. We seem to have a fair amount of traction with the UIMA framework among communities that are very interested in plug-n-play with components from other providers. This includes the government, life sciences and research communities.
2. The UIMA standard, as opposed to the specific Java Framework implementation, developed under an SDO will broaden the opportunity and strengthen the case for adoption of UIMA as a standard for text and multi-modal analytics that allows interoperability across different frameworks and applications. It would of course be the case that the Java UIMA Framework would comply with the standard.
