Symbols as word separators – a look inside the search engine logic

by Serge Bondar

Sooner or later, every search engine optimizer faces the question: which symbols can legitimately be used to split a key phrase into several keywords, which symbols are reserved as operators, and which separators do the search engines simply ignore?

To settle the matter, I set up the experiment described below.

With the help of a simple password generator, I make up words that do not exist and do not make any sense. When queried with these terms, a search engine must answer "did not match any documents". If a search engine treats a keyword as a typo and suggests a correction (like "did you mean …"), that keyword is not good enough for the experiment.
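The word-generation step can be sketched in a few lines of Python. This is only an illustration, not the generator used in the experiment: the function names, the 12-letter default length and the lowercase-only alphabet are all assumptions. Each generated word would still have to be checked by hand against every search engine to confirm that it returns no documents and triggers no spelling suggestion.

```python
import random
import string


def make_nonsense_word(length=12, rng=None):
    """Return a random lowercase string that is unlikely to be a real word."""
    rng = rng or random.Random()
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_unique_words(count, length=12, seed=None):
    """Generate `count` distinct nonsense words (hypothetical helper)."""
    rng = random.Random(seed)
    words = set()
    while len(words) < count:
        words.add(make_nonsense_word(length, rng))
    return sorted(words)


if __name__ == "__main__":
    for word in make_unique_words(4):
        print(word)
```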

These unique keywords are then grouped into pairs, and each pair is joined with a certain symbol, such as an underscore or a dash. For instance:

erkrnfgieur_welinfweweiolyhyt-ytonihto

etc.
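A minimal sketch of the pairing step follows. The symbol list is taken from the results tables in this article; the function name and the tuple layout are assumptions made for illustration.

```python
# Symbols tested in this article, in the order they appear in the results table.
SYMBOLS = "!@#^&*()-_+=/:|`\"?.,;><[]{}'~"


def build_test_tokens(words, symbols=SYMBOLS):
    """Pair up the words and join each pair with one symbol.

    Returns a list of (symbol, keyword1, keyword2, joined) tuples.
    """
    needed = 2 * len(symbols)
    if len(words) < needed:
        raise ValueError(f"need {needed} words, got {len(words)}")
    tokens = []
    for i, symbol in enumerate(symbols):
        kw1, kw2 = words[2 * i], words[2 * i + 1]
        tokens.append((symbol, kw1, kw2, f"{kw1}{symbol}{kw2}"))
    return tokens
```

Using a fresh pair of words for every symbol keeps the queries independent: a hit for one keyword can only come from the one place on the test page where that keyword was used.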

Then these pairs of keywords are placed on a test page, which is published on the Web, and I wait for the search engine robots to come. Once I am sure the page has been crawled, I query the search engines for various combinations of unique keywords and separating symbols. If the search engine treats a character as a valid word separator, it returns the test page for each of the two keywords queried on its own. If the character is not a divider, the search engine returns the test page only for the whole query "keyword1symbolkeyword2".
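The decision rule just described can be written down as a small function. This is a sketch of the reasoning only: the function name and the "inconclusive" outcome (for result combinations the article does not discuss) are assumptions.

```python
def classify_symbol(found_joined, found_kw1, found_kw2):
    """Interpret the three query outcomes for one symbol.

    Each argument is True if the test page was returned for that query:
    the joined form, keyword1 alone, and keyword2 alone.
    """
    if found_kw1 and found_kw2:
        # Each keyword is indexed as a word of its own.
        return "separator"
    if found_joined and not (found_kw1 or found_kw2):
        # Only the whole string is indexed as a single word.
        return "not a separator"
    return "inconclusive"


print(classify_symbol(True, True, True))    # separator
print(classify_symbol(True, False, False))  # not a separator
```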

As the table below shows, MSN and Yahoo treat all the tested symbols as valid word separators: the responses to the query "keyword1symbolkeyword2" are identical to those for the query "keyword1 keyword2". In the table, "1" means the test page was returned for the query, and "-" means it was not.

For Google, two characters are not dividers: the underscore "_" and the ampersand "&". Google does not return the page for the individual words separated by these symbols, whereas it does find the combination containing the symbol.

Keyword             Google  Yahoo!  MSN
keyword1!keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1@keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1#keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1^keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1&keyword2   1       1       1
keyword1            -       1       1
keyword2            -       1       1
keyword1*keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1(keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1)keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1-keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1_keyword2   1       1       1
keyword1            -       1       1
keyword2            -       1       1
keyword1+keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1=keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1/keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1keyword2    1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1:keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1|keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1`keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1"keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1?keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1.keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1,keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1;keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1>keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1<keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1[keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1]keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1{keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1}keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1'keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1
keyword1~keyword2   1       1       1
keyword1            1       1       1
keyword2            1       1       1

If a non-word character occurs at the beginning or at the end of a word, the search engines ignore it and return the same response as if the symbol were not there. There are two exceptions: the above-mentioned "_" and "&" stop Google from finding a word when they are attached to it.

Keyword             Google  Yahoo!  MSN
keyword1_keyword2   1       1       1
keyword1            -       1       1
keyword1_           -       1       1
keyword2            -       1       1
_keyword2           -       1       1
keyword1&keyword2   1       1       1
keyword1            -       1       1
keyword1&           -       1       1
keyword2            -       1       1
&keyword2           -       1       1
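One simple model that reproduces both tables is a tokenizer in which every non-alphanumeric character splits words, except for a configurable set of symbols that count as part of a word. The sketch below is a toy model built on that assumption, not a description of how any search engine is actually implemented; the function names and the `word_symbols` parameter are hypothetical.

```python
import re


def tokenize(text, word_symbols=""):
    """Split text into words; letters, digits and `word_symbols` are word characters."""
    pattern = "[^0-9A-Za-z" + re.escape(word_symbols) + "]+"
    return [token for token in re.split(pattern, text) if token]


def page_matches(page_text, query, word_symbols=""):
    """True if every token of the query is also a token of the page."""
    page_tokens = set(tokenize(page_text, word_symbols))
    query_tokens = tokenize(query, word_symbols)
    return bool(query_tokens) and all(t in page_tokens for t in query_tokens)


page = "aaa_bbb ccc-ddd"

# Model matching the Yahoo! and MSN columns: every symbol is a separator.
print(page_matches(page, "aaa"))         # True
print(page_matches(page, "aaa_"))        # True, the trailing "_" is dropped

# Model matching the Google column: "_" and "&" are part of the word.
print(page_matches(page, "aaa", "_&"))      # False
print(page_matches(page, "aaa_", "_&"))     # False, "aaa_" is not "aaa_bbb"
print(page_matches(page, "aaa_bbb", "_&"))  # True
print(page_matches(page, "ccc", "_&"))      # True, "-" still separates
```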

What practical application could these results have? First of all, they settle the question of how to separate keywords in a URL, especially in a domain name. If the presence of keywords in the URL matters for the relevance calculation (and, as far as I can tell, it does), successful Google optimization depends on whether you separate your keywords in the URL with a dash "-" or an underscore "_". Thus, the keywords in a URL like www.my-business.com bring an additional ranking bonus when someone searches for "my business", whereas those in www.my_business.com, www.mybusiness.com and other variants do not.
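To make the URL argument concrete, here is a small sketch that splits a host name into words the way the Google results above suggest (the underscore stays inside a word, the dash and the dot separate words) and checks whether a key phrase appears as consecutive words. The function names are hypothetical, and the sketch says nothing about how much weight, if any, a search engine gives to such a match.

```python
import re


def hostname_words(hostname, word_symbols="_"):
    """Split a host name into words; `word_symbols` are treated as word characters."""
    pattern = "[^0-9a-z" + re.escape(word_symbols) + "]+"
    return [word for word in re.split(pattern, hostname.lower()) if word]


def phrase_in_hostname(phrase, hostname):
    """True if the words of `phrase` occur consecutively among the host name words."""
    words = hostname_words(hostname)
    wanted = phrase.lower().split()
    n = len(wanted)
    return n > 0 and any(words[i:i + n] == wanted for i in range(len(words) - n + 1))


for host in ("www.my-business.com", "www.my_business.com", "www.mybusiness.com"):
    print(host, phrase_in_hostname("my business", host))
# www.my-business.com True
# www.my_business.com False
# www.mybusiness.com False
```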

Many SEOs are misled by the fact that search engines highlight parts of URLs in the SERPs, as in www.myhappybusiness.com. Based on this research, I can state with confidence that this is nothing but a trick on Google's part: all it does is highlight matching character sequences inside a string. To calculate the relevance of a page, Google only counts an exact match of the key phrase written with dashes.
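The difference between highlighting and word matching is easy to see in code. The sketch below highlights query words wherever they occur as character sequences, with no regard for word boundaries; the function name and the asterisk markers are assumptions, chosen only to imitate the bold text in a results page.

```python
import re


def highlight(url, query_words):
    """Wrap every occurrence of a query word in asterisks, ignoring word boundaries."""
    pattern = "|".join(re.escape(word) for word in query_words if word)
    if not pattern:
        return url
    return re.sub(pattern, lambda m: f"*{m.group(0)}*", url, flags=re.IGNORECASE)


print(highlight("www.myhappybusiness.com", ["happy"]))
# www.my*happy*business.com
```

A substring search like this marks "happy" even though the host name contains no separate word "happy", which is why highlighting alone says nothing about how the string was split into words.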

Our credits to the source/author of this article:

Author: Serge Bondar

Serge Bondar is a search engine researcher with several years of experience in SEO and millions of hits acquired for his customers' sites over his career. Recently, he has been contributing intensively to the development of optimization advice for the Web CEO optimization software. In his free time, he enjoys hiking, travelling and writing scientific papers devoted to the life of ants.