Often overlooked in treating data as either highly structured (databases) or completely unstructured (free text) is the fact that much useful business data is in “semi-structured” form: government filings, insurance claims, customer comment forms, etc. Semi-structured data are often documents broken up into free-text fields. Individually, the content of these fields is highly variable, but given the context of the document, it is much more predictable than unstructured text. Without knowledge of that context, most search tools treat semi-structured data as free text and are far less useful than they could be. But a little structure goes a long way: this talk will describe how, and will show through several real-life examples how semi-structured data can be interpreted, summarized, and applied to produce business value.

Cindi Thompson

Deloitte Consulting LLP

Cindi leads the Text Analytics team in Deloitte’s Analytics Institute. She has over 12 years of R&D and project management experience in industrial, consulting, and academic settings. Her research areas include text analytics, machine learning, and adaptive recommendation systems. She worked for 4 years as a professor before joining PwC for 6 years as a Research Manager. Cindi has a PhD and an MA in Computer Sciences from the University of Texas – Austin, and a BS in Computer Science from North Carolina State University.

Comments

HIROSHI MAEDA

02/13/2011 3:18am PST

It’s a great presentation. I got a lot of insights. Would it be possible to obtain the slide deck used for the presentation?