The YouMoz Blog

Semantic Search: A New Context

This entry was written by one of our members and submitted to our YouMoz section. The author's views below are entirely his or her own and may not reflect the views of Moz.

Semantic search is actually just one facet of what’s possible using semantic technology. There are many more uses and implementations that are generally not discussed and frequently passed over altogether.

This doesn’t mean that they are any less valid. It’s just that the companies developing these technologies are, for the most part, search engine companies looking to apply them to online applications over gargantuan databases of millions, if not billions, of websites.

Let’s have a look at some other practical uses of the technology that sit slightly further outside the box.

404 error pages

How irritating is it when you hit a 404 page for an article which would have contained everything about the subject you were looking for? It’s very annoying, and the chances are it hasn’t been deleted forever but just moved and not yet re-indexed (if it ever will be). If the article is very old, it can be extremely difficult to find in a website with a mass of content.

By using semantic technology we can do a number of things to aid that lost user. If the URL has been rewritten to include the title and the referring page contains good content, we can come up with the most likely pages the user should have been directed to. We simply compare the referring content and the referring link against the website’s database or XML sitemap. This helps to ensure that your users, even if lost, will usually find what they’re looking for.
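To make the idea a bit more concrete, here is a minimal sketch of that comparison. It is not a real semantic engine: it stands in plain word overlap for proper semantic matching, and the function names, the stopword list, the scoring weights and the shape of the `sitemap` dictionary (URL mapped to page title) are all assumptions made up for illustration.

```python
import re
from urllib.parse import urlparse

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is"}


def tokens(text):
    """Return the set of lowercase word tokens in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def suggest_pages(missing_url, referrer_text, sitemap, top_n=3):
    """Rank sitemap entries by word overlap with the dead URL and the referring content.

    sitemap is a hypothetical dict mapping URL -> page title.
    """
    slug_words = tokens(urlparse(missing_url).path)
    context_words = tokens(referrer_text)
    scored = []
    for url, title in sitemap.items():
        candidate = tokens(title) | tokens(urlparse(url).path)
        # Words from the dead URL count double (an arbitrary weight), because a
        # rewritten URL usually carries the article title.
        score = 2 * len(candidate & slug_words) + len(candidate & context_words)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties broken alphabetically so results are stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:top_n]]
```

A 404 handler could call `suggest_pages` with the requested URL and the text of the referring page, then list the returned URLs as "did you mean" links. Swapping the overlap score for a proper similarity measure would not change the overall shape.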

Statistical Analysis

Increasingly, content is being tagged and the structure of content is improving thanks to the advent of Web 2.0’s social standards and astute webmasters/SEOs. There is in fact a veritable goldmine of data which is available for analysis by your website or blog statistics packages. Is it being used, though? Not so much (obligatory Borat quote dealt with).

Think of the data generally gathered by your statistics package: referring website pages, and search engine referrals with the keywords of the query used.

Both of these sources are rich material for semantic relationship analysis. The referring links are likely going to be from articles or opinion pieces of some type, whilst the search engine referrals will include the search query that was used by the user to find your page.

If this data is properly focused, we can show not just where your traffic is arriving from but what your traffic is arriving from. We can use the referring pages and search engine queries to examine the context of the referring pages and their keyword densities, and to break traffic down into categories and focuses. In the simplest case we can estimate the proportion of negative to positive response traffic. Tagging your articles and selecting keywords for SEO can be greatly eased by looking at this data, seeing what areas already perform well, and strengthening those. I believe there are many other uses in this area, but, as always, I want my readers to think a bit for themselves and come up with other possibilities. The point is that knowing the context your traffic puts you in is an invaluable resource.
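As a rough illustration of breaking traffic into categories, the sketch below groups visits by the search terms that brought them. It assumes the referrer URL exposes the query in a `q` parameter (not every search engine passes it along), and the `TOPICS` vocabularies and function names are invented for the example. A real site would build its categories from its own tags.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical topic vocabularies; a real site would derive these from its own tags.
TOPICS = {
    "seo": {"seo", "ranking", "keywords", "links"},
    "semantics": {"semantic", "ontology", "tagging", "context"},
}


def query_terms(referrer):
    """Return the search terms found in a referrer URL's `q` parameter, if any."""
    params = parse_qs(urlparse(referrer).query)
    return params.get("q", [""])[0].lower().split()


def traffic_by_topic(referrers):
    """Count visits per topic based on the search terms in each referrer."""
    counts = Counter()
    for ref in referrers:
        terms = set(query_terms(ref))
        matched = [topic for topic, vocab in TOPICS.items() if terms & vocab]
        # A visit may belong to several topics; visits with no match
        # (including referrers with no query at all) are kept in view.
        for topic in matched or ["uncategorized"]:
            counts[topic] += 1
    return counts
```

The same loop could be pointed at the text of referring pages rather than search queries, or at lists of positive and negative words to get a crude sentiment split.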

DySeTagging (Dynamic Semantic Tagging [Dice - Tagging])

Dice Tagging is kind of a joke; I’m making up crap acronyms for fun because terms like Web 2.0 tend to make me cringe (yes, I realize I’ve used it).

Anyway, this Dice stuff is clever. There are reportedly a number of groups working on something similar to what I’m going to talk about – including DARPA (the US's Defense Advanced Research Projects Agency). The premise is that the web server itself has a semantic module, and on the load of any web page or document it analyses the context of the page and generates tags to define it, which are then added to the header information.

This saves a lot of load on the poor search engine at the other end, saves work for you at your end, and lets you be responsible for your own tagging rather than having tags assigned to you by an illiterate engine programmed by a kid on an OLPC.
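To give a feel for the premise, here is a toy version of such a module. It makes two big assumptions: that "header information" means the HTML head (it could equally mean HTTP response headers), and that simple term frequency is an acceptable stand-in for real semantic analysis, which it is not. The function names and stopword list are hypothetical.

```python
import re
from collections import Counter
from html import escape

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "and", "for", "that", "with", "this", "are", "was", "from"}


def generate_tags(html, max_tags=5):
    """Pick the most frequent non-stopword terms from the page's text."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, fine for a sketch
    words = [w for w in re.findall(r"[a-z]{3,}", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    # Most frequent first; ties broken alphabetically so output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:max_tags]]


def add_meta_keywords(html, max_tags=5):
    """Insert a keywords meta tag, built from the page itself, before </head>."""
    tags = generate_tags(html, max_tags)
    meta = '<meta name="keywords" content="%s">' % escape(", ".join(tags))
    return html.replace("</head>", meta + "</head>", 1)
```

In a real deployment this would live in the web server or a middleware layer and run (or be cached) on page load, with the frequency count replaced by whatever semantic analysis the module actually performs.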

So... what?

Make up your own mind. As usual, I’m trying vainly to ignite some sparks in other developers and thinkers out there who can take the technology where it needs to go. I wish I had the time to spend on all the projects I think of, but unfortunately I don’t, which is half the reason I have this blog now. A lot of what I write is playing Devil’s Advocate and is meant to produce a reaction! So, please give me one :)

The COO at the firm I work for has developed a program that analyzes semantics along with a bunch of other data points (inbound links, keyword density, meta and title tags, anchor keywords, and so on). We are able to measure our clients' scores for a given page and compare them to other pages for the keyword or key phrase they are trying to optimize for. We also have a tool that helps me as a copywriter measure the semantic score as I'm writing the content.

So we're very much believers in semantics, though really the most important factors for a page change depending on the landscape of the particular search in question.

My boss is working to make the software available for people to subscribe to. There are also possibilities of certain parts of the software being made available through partnerships with more established tools, but obviously I can't talk about that until it's final.