June 27, 2008

This Is Month 10 Of Our Company, And Things Are Moving Faster And Faster. If Things Look Too Smooth, It Means We Are Not Moving Fast Enough.

Here’s a picture of one of our teams in action (L2R: director of usability and information design Jeff, director of strategy Edwin, head of technical development Souvik, and head of new business Nick). Very Hugo Boss looking, which doesn’t happen every day. I think we will be running out of space soon. Every month we’re adding new people (5 this month), and each one is carefully picked and must be, or have the potential to be, the best in their respective discipline. It is the only way to build a great company. That way, I will never have to worry about mediocrity, because the system will reject it. Our policy is to stay ‘deadweight free’ so we can move fast and smart. I hope we can remain agile if we reach 100 people. The office is getting a lot of good energy lately, just good 'chi' everywhere.

One of our current projects requires us to look at semantic search. It is a very important area, and for us the question is how it will change social media. Semantic search is hot, but many are confused about what it is. It is basically a smarter search that allows a system to discern whether there is a link between two words, such as money and happiness. We all know that the relationships between the two are complex, both direct and indirect, and can be categorized as such. Our brain seems to comprehend this easily (maybe not in this example), but for a computer it is very difficult. We are all trying to solve this puzzle.

So how does semantic search allow technology to refine its search capability? It can automatically place pages into dynamic categories, or tag them without human intervention. Knowing what topic a page relates to is key to returning relevant content. It can offer related topics and keywords to help you narrow your search. Instead of offering you all the related keywords, it incorporates them directly back into the search, with less weight than the user-entered ones. I am sceptical about whether this will produce better results or just different ones.
If the engine uses statistical analysis to retrieve its semantic matches for a keyword, then it is likely that keywords currently associated with hot news topics will bring those in as well.
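To make the weighting idea concrete, here is a minimal sketch of query expansion as described above: the user's own terms keep full weight, and related keywords are folded in at a reduced weight. The related-term table and the weight value are my own illustrative assumptions; a real engine would derive them from statistical co-occurrence or a thesaurus.

```python
# Hypothetical related-term table; a real engine would build this
# from statistical analysis of a corpus, not hard-code it.
RELATED = {
    "money": ["wealth", "income"],
    "happiness": ["well-being", "satisfaction"],
}

def expand_query(user_terms, related_weight=0.3):
    """Return {term: weight}: user terms at 1.0, expansions lower."""
    weights = {term: 1.0 for term in user_terms}
    for term in user_terms:
        for rel in RELATED.get(term, []):
            # Never let an expansion outweigh an explicit user term.
            weights.setdefault(rel, related_weight)
    return weights
```

Searching for "money happiness" would then quietly also match pages about wealth or well-being, just with less influence on the ranking — which is exactly why the results may end up different rather than better.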

A semantic search engine generally takes the sense of a word as a factor in its ranking algorithm, or offers the user a choice as to the sense of a word or phrase. (This is not the same as the Semantic Web, which requires a lot more human labor to tag everything.) Can technology tell what the content of a text is? Or, to be more precise: what are the core concepts that must be identified in order to grasp the essential meaning of a text? What is its emotion or tone? What about its credibility: is it an opinion or the result of research?
The challenge is to locate the central core of the text that holds the essentials of its meaning. This is step one, before any attempt at interpretation. Generally, only a very small proportion of the text constitutes the foundation of its concepts; if those parts were removed, the "textual construction" would collapse totally, and the meaning would be lost.
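A crude way to see this "small core" idea in action is plain term frequency with stop words removed. This is only a toy sketch, and the stop-word list is my own; real systems use much richer signals (syntax, position, co-occurrence), but the intuition is the same: a handful of terms carries most of the meaning.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; real systems use larger ones.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "that"}

def core_terms(text, top_n=3):
    """Return the top_n most frequent non-stop-word terms in text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_n)]
```

Run it on any paragraph and the few terms it returns are a rough stand-in for the "foundations" that, if removed, would collapse the text's meaning.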

Content analysis consists of the following four steps:
1/ the identity of the main actors,
2/ the relations they have to each other,
3/ the hierarchy of these relations,
4/ the tone and style of the words.
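The four steps above can be sketched as a simple data model: actors, typed relations between them, a ranking of those relations, and a tone label for the text. All the field names and example values here are illustrative assumptions, not any particular engine's schema.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    source: str          # step 1: the main actors...
    target: str
    kind: str            # step 2: ...and how they relate to each other
    weight: float = 1.0  # step 3: importance within the hierarchy

@dataclass
class ContentAnalysis:
    actors: list[str]
    relations: list[Relation]
    tone: str = "neutral"  # step 4: tone and style of the words

    def ranked_relations(self) -> list[Relation]:
        """Order relations by importance (step 3's hierarchy)."""
        return sorted(self.relations, key=lambda r: r.weight, reverse=True)
```

For example, an analysis of this very post might record Microsoft and Powerset as actors, an "acquires" relation between them with high weight, and a speculative tone.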

Microsoft is moving fast on this and is rumored to be buying the Silicon Valley semantic search engine Powerset. The purchase price is rumored to be over $100 million, although no announcement has been made. Powerset has developed a technology that tries to understand the full meaning of the phrases you type in while searching, and it returns results based on that understanding. Microsoft’s strategy is to close the gap with Google’s search engine.
Google was quick to dismiss Powerset’s semantic, or “natural language,” approach as only marginally interesting. Behind the scenes, Google has hired a team of semantic specialists to work on that approach. This is the next race for search, and I think the winner will be neither Microsoft nor Google.

Google’s search results are still based primarily on the individual words you type into its search bar, and its approach does very little to understand the possible meaning created by joining two or more words together.
Powerset’s technology is still controversial because, while sexy in theory, the natural language approach is difficult to pull off in practice. Many wonder whether the technology can ever be developed enough to be useful within a major search engine. Powerset was valued at $42.5 million after its first round of financing just two years ago. Investors expected that the technology it acquired from PARC would help turn this into a killer app. However, the technology has taken longer to develop than expected. Powerset has a high burn rate, so a purchase at $100 million is considered a safe result in an area that is also seeing increasing competition (players such as Hakia, Twine and TextDigger are all using similar approaches). I need to dig deeper into Hakia and TextDigger in the coming months and will share my thoughts here.
