Do we need the semantic web?

What kinds of applications do we need a semantic web for? Is the semantic web practical? These questions (among others) were posed by Jamie Taylor of Metaweb Technologies to a group of panelists at the Text Analytics Summit last week. The panelists were no lightweights. They included Vladimir Zelevinsky from Endeca, Ron Kaplan from Microsoft, and Kathleen Dahlgren from Cognition. I found this to be one of the most engaging segments of the Summit.

First of all, many people define the semantic web as a “web of meaning” or a “web of data” that will allow computer applications to exploit the data directly. Check out the W3C webpage for more information about definitions. The panelists at the Summit got into an interesting discussion about parsing data sources for the semantic web. Here are a few of the highlights. Please note that I asked some additional questions after the panel itself, so if you’re reading something you didn’t hear on the panel, that is the reason.

What kinds of applications is the semantic web good for? It depends on what you want to know. For example, one of the panelists pointed out that you don’t need the semantic web to find a hardware store in Boston. However, more unusual queries might require it. Most people have had the experience of knowing what they are looking for, using a five- or six-word query, and still not finding it. The panelists pointed out that entities (people, places, things) are relatively easy to extract; it is the relationships between the entities that are harder. Vladimir Zelevinsky explained it like this, pairing each information retrieval need with the technology that serves it:

Known Item Search -> Keyword Search (e.g., Google – where you need to find what you know exists);

Unknown Relationship Search -> Semantic Web (where you are looking not for separate items in the repository, in this case the web, but for the connection(s) between them).

The semantic web could pay off in applications that require understanding the relationships between these entities. Ron Kaplan also noted that semantic web technology provides a standard way of merging data from different sources, and that will probably enable some useful new applications.

Scaling the semantic web. Everyone seemed to agree that manually tagging documents is a brittle exercise. Vladimir Zelevinsky from Endeca suggested putting a parser on each machine. He said that since people type more slowly than one sentence per second, semantics could be injected into a document at the moment of creation. Of course, it is a bit more complex than this, but it was an interesting notion. Kathleen Dahlgren from Cognition said that NLP at scale was the wave of the future. NLP is complex, but it can be deeply distributed. Computers are getting faster and cheaper, and this can make it fast and scalable.

Is it practical? There is a huge amount of data out there and it keeps changing. There is also a lot of duplicate information on the web. Is it economically viable to think about parsing the web? Ron Kaplan said he had done a back-of-the-envelope calculation using the following assumptions:

“The simple order-of-magnitude calculation goes as follows: There are roughly 2.5M seconds in a month, so an 8-core machine gives you 20M cpu seconds. If it takes 1 second on the average to process a sentence (an upper bound), then you can do 20M sentences per month. If a web page has on the average 20 sentences, you get 1M pages per month per machine. So, 1000 machines can do a billion pages per month. More if 1 second over estimates, less if 20 sentence/document underestimates.”
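For readers who want to vary the assumptions, here is a minimal sketch of the same arithmetic in Python. The function and parameter names are my own and purely illustrative; the default values are the ones quoted above, and a month is assumed to be 30 days (about 2.6M seconds, which the quote rounds down to 2.5M).

```python
# Back-of-the-envelope estimate of pages parsed per month.
# Assumption: a 30-day month; the quote rounds this to "roughly 2.5M seconds".
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000


def pages_per_month(machines, cores_per_machine=8,
                    seconds_per_sentence=1.0, sentences_per_page=20):
    """Estimate how many web pages a cluster can parse in one month.

    Defaults follow the quoted assumptions: 8 cores per machine,
    1 CPU second per sentence, 20 sentences per page.
    """
    cpu_seconds = machines * cores_per_machine * SECONDS_PER_MONTH
    sentences = cpu_seconds / seconds_per_sentence
    return sentences / sentences_per_page


# One machine, then the 1000-machine cluster from the quote.
print(f"{pages_per_month(1):,.0f} pages per month on one machine")
print(f"{pages_per_month(1000):,.0f} pages per month on 1000 machines")
```

With the unrounded month length, this comes out slightly above the quoted figures of 1M pages per machine and a billion pages per 1000 machines, which is well within the order-of-magnitude spirit of the estimate. Lowering `seconds_per_sentence` or raising `sentences_per_page` shows how sensitive the result is to each assumption.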

So this is economically feasible. If there is a need. And that remains the question. Is it economically viable and necessary to try to find the information in the long tail?

3 thoughts on “Do we need the semantic web?”

An up-to-the-point summary of what was discussed in the panel. I think that the most appealing application of the semantic web is not really in finding the information in the long tail BUT in getting access to relevant knowledge in the stream of information we are exposed to in our personal and professional lives. From this point of view I think it is already necessary.
You already came to the conclusion that it is economically feasible, but the calculation in your post way overestimates this cost. A deep parser like our platform, Cogito, can process more than 120KB of text with a quadcore cpu (at least an order of magnitude faster than what was stated by Ron Kaplan). So with the actual cost being significantly lower, I guess that the semantic web is not only necessary but is also “cheap” 🙂

This is an interesting summary. Some novel ways of generating semantics are now emerging that do not require hand or machine tagging. My favorite technique at this moment is what TipTop http://FeelTipTop.com appears to do. Spend a chunk of time on it and you’ll understand why their approach is the best one.