The Moz Blog

Web Semantics: What do we mean by 'Semantic Web'?

The author's posts are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

You ever get into an argument with someone who didn't understand, or at least didn't agree with, your point of view, and they came back and said, "You're just arguing semantics"? Of course you have. You've probably used that little quip yourself. Most people don't realize how stupid we look when we say something like that (because one does not "argue semantics" -- but I've said it myself).

Now look at how often people mention the Semantic Web in various SEO contexts. Most SEOs have no idea what they are talking about when it comes to this subject. I am no expert on the topic, either, but I know enough to understand that there are two, possibly three, relevant meanings for the expression. Most SEOs are thinking of the much-ballyhooed and not-yet-implemented Latent Semantic Indexing concept when they use the expression "Semantic Web". Latent Semantic Indexing, also called Latent Semantic Analysis, is an SEO buzz expression that caught fire last year.

The idea behind the application of Latent Semantic Indexing (often abbreviated as LSI) in Web search is that a search engine will parse a document, find all the nouns and verbs, and then associate them with related (substitution-useful) nouns and verbs. But the search engines will supposedly do this in a reasonable (if not always appropriate) context. For example, let's say you create a Web page discussing the care of hats. LSI, in theory, would help searchers using the query "cleaning felt caps" to find your page optimized for "care of hats".
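The hat example can be sketched in a few lines of Python. This is only an illustration of the substitution idea as described above, not how any search engine works: the CONCEPTS table, the STOPWORDS set, and both function names are hypothetical and hand-built for this one query.

```python
# Hypothetical concept table: each word maps to a shared concept label.
# These groupings are invented for illustration only.
CONCEPTS = {
    "care": "upkeep", "cleaning": "upkeep",
    "hats": "headwear", "caps": "headwear",
}

# Assumed list of words to ignore when comparing texts.
STOPWORDS = {"of", "the", "a"}


def to_concepts(text):
    """Map each non-stopword to its concept label (or to itself if unknown)."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return {CONCEPTS.get(w, w) for w in words}


def associative_match(query, page_text):
    """Return the concepts shared by the query and the page."""
    return to_concepts(query) & to_concepts(page_text)


def literal_match(query, page_text):
    """Return the exact words shared by the query and the page."""
    return set(query.lower().split()) & set(page_text.lower().split())


shared_concepts = associative_match("cleaning felt caps", "care of hats")
shared_words = literal_match("cleaning felt caps", "care of hats")
```

Here the query and the page share no literal words, yet they share both the "upkeep" and "headwear" concepts. The hard part, of course, is building that concept table for the whole Web rather than for one pair of phrases.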

Unfortunately, the technology does not yet exist to enable the search engines to do that kind of associative indexing. In fact, it would be more appropriate to refer to the process as "associative indexing" because that is really what we are talking about (in this context). The closest we have come to associative indexing in today's search engine technology is stemming, where words are indexed on the basis of their uninflected roots (plural forms, adverbial forms, and adjectival forms are reduced to their simplified noun and verb forms before indexing).
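To make the stemming point concrete, here is a minimal sketch that indexes words by a crude root form. It assumes a toy suffix list (SUFFIXES) and hypothetical helper names; a production stemmer handles far more cases than this.

```python
# Assumed, deliberately tiny suffix list; longer suffixes are tried first.
SUFFIXES = ("ing", "ly", "es", "s")


def naive_stem(word):
    """Strip one known suffix, keeping at least three letters of the root."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stemmed word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(naive_stem(word), set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every stemmed query word."""
    results = None
    for word in query.split():
        ids = index.get(naive_stem(word), set())
        results = ids if results is None else results & ids
    return results or set()


# Hypothetical three-page collection.
docs = {"a": "cleaning hats", "b": "clean hat", "c": "caps"}
index = build_index(docs)
```

With this index, a search for "cleaning hat" finds both pages "a" and "b", because inflected forms collapse to the same root. A search for "hat" still does not find page "c": stemming connects "hat" to "hats", but nothing connects "hat" to "caps".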

Many SEOs, overwhelmed by the madness of link passion, have incorrectly inferred that they are inducing a form of associative indexing through link anchor text. What they don't understand is that the anchor text of inbound links is virtually appended to the documents being linked to (that is, the search engines treat the anchor text as if it were part of the target page). So, if I create a Web page that says, "Michael is a gorilla SEO" and you link to it with the words "Michael is a foxy hound", you are not inducing the search engines to associate "foxy hound" with my name. All you are doing is appending your words to my words.
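The "appending" behavior can be modeled very simply. The sketch below assumes a page is matched against its own copy plus the anchor text of its inbound links; the function names are hypothetical, and real ranking involves much more than word matching.

```python
def effective_text(page_text, inbound_anchor_texts):
    """Page copy plus the anchor text of every inbound link, as one string."""
    return " ".join([page_text, *inbound_anchor_texts])


def matches(query, page_text, inbound_anchor_texts):
    """True if every query word appears in the page copy or its anchor text."""
    words = set(effective_text(page_text, inbound_anchor_texts).lower().split())
    return all(term in words for term in query.lower().split())


page = "Michael is a gorilla SEO"
anchors = ["Michael is a foxy hound"]

with_link = matches("foxy hound", page, anchors)
without_link = matches("foxy hound", page, [])
```

The page matches "foxy hound" only once the link exists, and only because those exact words were added to it. Nothing here teaches the index that "foxy hound" means the same thing as any other phrase.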

The connection between my name and "foxy hound" is induced by the proximity of the words in your anchor text. Technically, you get a weak proximic association if you just link to my page with "foxy hound". After all, my name is on my page and you just appended your text to my page. But proximic association, strong or weak, is not the same as associative indexing. We can also describe associative indexing as substitutive indexing. That is, we are indexing words that can be replaced by substitutes that preserve, or nearly preserve, the original context.

The words "dog" and "canine" are usually good substitutional equivalents. But if you search for "dog" pages, you won't find any "canine" pages that don't also use the word "dog" (either in on-page copy or inbound link anchor text). The same case holds true for "cat" and "feline". If the search engines were truly implementing LSI we would be able to substitute these words for each other and get the same or nearly the same results. The broad dissimilarity in search results for such closely semantically related words tells us that the implementation of Latent Semantic Indexing is just another SEO myth.
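The substitution test described above amounts to measuring how much two result sets overlap. Here is a minimal sketch, assuming you have already collected the top results for each query by hand; the URLs below are made up, and the overlap function is a plain set-overlap ratio rather than anything the search engines publish.

```python
def overlap(results_a, results_b):
    """Share of distinct results that appear in both lists (0.0 to 1.0)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0  # two empty result sets are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical top results for two closely related queries.
dog_results = ["dogs.example/a", "pets.example/b", "vet.example/c"]
canine_results = ["vet.example/c", "k9.example/d", "science.example/e"]

score = overlap(dog_results, canine_results)
```

If the engines really treated the two words as interchangeable, the score would sit close to 1.0. In this invented example only one of five distinct results is shared, which gives 0.2.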

The Semantic Web, however, is not an SEO myth. It's a very real concept, but one that is still in its formative phase. The standards for Semantic Web technology are still being worked out, and Webmasters like you and me are expected to play our part in constructing the Semantic Web. In the meantime, we can continue to play with words and expressions like "weak semantic bonding", but we need to recognize that we're talking about many different topics.

The Semantic Web does not yet exist -- not in the broader sense that we're working in it and building it out. Most SEOs need to look at semantics from the human perspective, in terms of analyzing user search patterns, how to develop copy for on-page and off-page (primarily directory descriptions and link anchor text) content, and all aspects of basic keyword research. Understanding the context people use words in is vital to determining how to optimize for those words.

We can refer to the results of our work as a sort of Proto-Semantic Web, in that we're applying some principles of semantic bonding and semantic identification without integrating the underlying structure that the true Semantic Web will one day require -- and without the benefit of the tools that the search engines are developing or have yet to develop. As I promised a few months ago, I'll spend some more time on the subject of semantic bonding -- as I define it, not as anyone else defines it -- in my next couple of blog entries.

11 Comments

Wait. Hold on.
It's still about links and content, right?
Someone notify me when it stops being about links and content.
:)
Sorry, but I feel like a lot of the jargon and analysis is really a protective barrier we SEOs use to justify ourselves - and I applaud every time someone stops and cuts through the jargon with simple explanations.
Kudos on a great post.

Sorry Michael, but in this case you are 100% wrong.
Documents have ranked without having relevant anchor text or relevant page content.
Why was MSN ranking Traffic Power's site #5 for "Aaron Wall" for a while when (at that time) none of their links or indexed page content had my name on it?
If they are not using LSI they are using similar technologies, and I have seen pages rank for a variety of terms that they could only be ranking for via engines understanding some word relationships.

"Why was MSN ranking Traffic Power's site #5 for 'Aaron Wall' for a while when (at that time) none of their links or indexed page content had my name on it?"
MSN continues to display absolutely no indications of LSI-induced rankings, Aaron. All the standard tests show complete lack of association between related concepts.
Since LSI is too complex for any search engine to implement with today's available technology, there is little point in continuing to argue that it must be in use despite what the experts have to say on the subject.
But feel free to point out where Dr. Garcia and his associates are contradicted in the technical literature. I will always be glad to review it.
And let me point out once again that this post was not about LSI. I mentioned LSI because people in the SEO industry inevitably (and needlessly) associate all semantic discussion with LSI.

I think that Dr. Garcia and some other IR-knowledgeable folks have largely dismissed the idea that LSI/LSA is being used by the major commercial search engines. Their reasoning is that the SEs have less computationally expensive ways to make connections between words and meanings.
However, I think this is a great post - the semantic web as Tim Berners-Lee would like it to exist relies on a lot of effort from webmasters, but should it be successfully implemented (which, IMO is a long shot), it would prove to be of great value.

Great article. Great stuff. This is a useless yet happy post: I loved the article! Helped me understand what "stemming" meant in French ;) Thanks.
Oh, and lots of stuff about LSI and web semantics. Looking forward to your next article!

I did mention context, but the purpose of this entry is not to discuss LSI (which has been needlessly discussed to death in the SEO community).
I am simplifying wherever possible in order to train myself to write concise blog entries.

Nice to see some good, easy to understand info on these topics. Thanks Michael.
Can you give us a few words on what all of this means to the webmaster of an Info or Retail website? How can I use this to get more traffic or more relevant traffic?
Thanks again!

I think you oversimplified LSI. LSI allows for the mapping of relationships between concepts. A simple synonym-based substitution search would be much less efficient because it wouldn't return the proper results based on the context of the search.