Search Engine Optimization Standards and Spam Discussion

Situation

Site 'A' is well written, content-rich, and exceptionally relevant for search keyword 'W'. Site 'B' is not as well written, not as content-rich, and is considered not as relevant. Site 'B' implements Search Engine Optimization (SEO) technology and a few borderline spam tricks, and suddenly site 'B' outranks site 'A' for search 'W'. This lowers surfer satisfaction with the relevancy of that search engine's results, hurts the "user experience", and is a slap in the face to those at the search engine company who are responsible for seeing that surfers find relevant content and are happy.

Is it any wonder that the search engines are tightening down on the "spam" rules? It is one matter to improve the quality, presentation, and general use of keyword phrases on a web page, and an entirely different matter to trick the engines into higher rankings without editing the site content.

It is clearly the position of the search engines that the role of the SEO practitioner is to improve the quality, quantity, clarity, and value of readily accessed content so that the search engines can select worthy sites based upon their proprietary relevancy factors. The SEO practitioner should help the search engines by making the sites more relevant and making relevant information clear and easy to access, and not by using spam techniques to artificially inflate the perceived relevancy of inferior sites. Don't disguise an inferior site - fix it. Don't make site 'B' merely appear to be more relevant than site 'A'; actually make it more relevant.

While some search engines allow and reward off-page SEO technologies, such action is going to be short lived and of diminishing and exceptionally marginal benefit. Pages that are informative and contribute to the content, usability, and even the spiderability of any site will be increasingly rewarded and are the future of SEO.

For too long many SEO practitioners were involved in an "arms race", inventing more and more devious technology to trick the search engines and beat our competitors. With the aggressive anti-spam programs now emerging, the news is out -- if you want to get search engine rankings for your clients you have to play well within the rules. And those rules are absolutely "no tricks allowed".

Simply put, work on honest relevancy and win. All others will fade away.

NOTE: It was not always that way. In 2000 the "doorway page" was commonly used as a portal to dynamic content since the search engines were quite poor at locating and spidering that content. In fact, several major engines endorsed them, and some still allow them. Back in 2001 I even researched the impact of automated doorway generation software on my own site. Although there were some embarrassing errors in my code, the research proved that doorway pages could not outperform the same amount of effort in optimizing real site pages. In early 2002 the search engines, having improved their dynamic page spidering and indexing capabilities, reversed their opinions of doorway pages. Several had previously stated that doorway pages were spam, but now they started to enforce that stance. Today doorway pages are considered pure spam by most people. This shows you that what works today may not work tomorrow, and that if you play with fire you will regret it. [As a note, my "doorway" experiments of many years ago have continued to haunt me to this day even though everything I have done in the last 2 years is totally compliant with the rules as written by the major search engines. My advice is to always play in the center of the acceptable area and do not experiment with new ways to fool the engines and earn overnight rankings. In my case it really was research, but nobody cares.]

It is known that there are differing technologies and methodologies used by SEO practitioners. It is not the intent of a Code of Ethics to define HOW the code is met, but rather to objectively set the bounds of compliance. For instance, from an Ethics standpoint, it does not matter if you use Cloaking, Doorway, Hallway, Site Maps, or Shadow pages to optimize your site as long as the product meets the Code of Ethics. However, search engine acceptance will depend upon meeting these Codes plus SEO Standards appropriate (and today, unique) to each search engine. In general, if actions are in compliance with the Code of Ethics and are accepted by the search engines (thus obviously meeting their individual SEO Standards) then they are allowed.

But understand that things are changing, and that what is a trick allowed today will be blacklisted tomorrow. It is better to focus on "honest page" Search Engine Optimization than waste your time on something that will need to be abandoned soon.

These are general guidelines that may vary from search engine to search engine:

Keywords should be relevant, applicable, and clearly associated with page body content.

Keywords should be used as allowed and accepted by the search engines (placement, color, etc.)

Keywords should not be utilized too many times on a page (frequency, density, distribution, etc.)

Redirection technology (if used) should facilitate and improve the user experience. But understand that this is almost always considered a trick and is frequently a cause for removal from an index.

Redirection technology (if used) may not alter the URL (redirect), affect the browser BACK button (cause a loop), or display page information that is not the property of the site owner or allowed by agreement, unless there is sound technical justification (e.g., language redirects).

Pages should not be submitted to the search engines too frequently.
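As a rough illustration of the keyword frequency guideline above, the sketch below measures what share of a page's words is taken up by a given phrase. The function name `keyword_density` and the tokenizing rule are my own assumptions for illustration; no search engine publishes an exact formula or an acceptable limit, so treat the number only as a way to compare your own pages against each other.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Return the fraction of words in `text` occupied by `phrase`.

    Hypothetical helper: real engines use their own, unpublished measures.
    """
    # Lowercase and split into simple word tokens (letters, digits, apostrophes).
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase appears as consecutive words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)


if __name__ == "__main__":
    page = "Cheap widgets and more cheap widgets here"
    print(f"{keyword_density(page, 'cheap widgets'):.2f}")
```

A page where one phrase accounts for a large share of the words is worth rereading with a human visitor in mind, whatever the engines' actual thresholds are.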

NOTE: To be fair, each search engine must support at least the Robots Exclusion Standard. This is not always the case, but it should be.
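For readers unfamiliar with the Robots Exclusion Standard, here is a minimal sketch of how a well-behaved spider might check a robots.txt file before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every spider is asked to stay out of /private/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        print(url, allowed(EXAMPLE_ROBOTS, "ExampleBot", url))
```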

Additional guidelines for a search engine or directory may further discuss "relevance", "spamming", "cloaking" or "redirection", usually as it relates to "user experience". In general, revising or adding content is good if it improves the user experience. This is the subjective area we must all interpret, and why rules change so often. Please refer to the individual search engine submission page for specific rules and guidelines.

The Players

There are three main players when it comes to Search Engine Optimization:

Clients - owners of the web site: Emphasis is on sales, holding users (sticky), and User Experience, all aimed at getting the visitor to take a desired action.

Search Engines: Emphasis is on providing a positive User Experience in great part through relevance (controlled by algorithms) and minimal negative impact as a result of "bait-and-switch" technologies.

SEO Firms: Obtain traffic for Client sites as a result of a search engine query. This involves understanding the SE ranking algorithm, beating the competing SEO firms optimizing other Clients for the same terms, and remaining within the "No Spam" boundaries (play just within the rules). SEO practitioners are paid by Clients (not Search Engines) and are rewarded for rankings at almost any price, at least until there is a risk.

Unfortunately, if the rules change, sites may be dropped from the search engines. If algorithms change, sites may be lowered in the rankings. If competing SEO firms are successful in finding a new trick just within the rules, site rankings may fall. If new competing client sites enter the market, site rankings may drop. If the Client site uploads altered pages or changes server technology, site rankings may drop.

Processes

There are four main page-centric SEO processes used by search engine optimization firms:

Edit Client site pages: the revisions made to a client site's pages so that they may rank higher in the search engine. This is "honest" SEO work and involves editing real "honest" web site pages. This is the "bread-and-butter" of legitimate SEO firms and is the clear winner when it comes to obtaining meaningful and long-lasting rankings.

Man Made Pages: commonly a "doorway-like" technology (Shadow Page) that is keyword intensive and that if visited should present an "honest" site page. This is a labor intensive process where a copy of a real "honest" page is made, then that copy is altered to emphasize keywords found on the "honest" page (page presented). In some implementations, this page loads the page to be presented into a frameset, and some redirect. This is not to be confused with "web design" where additional content is added to the site and that is intended for human visitors. ANY man made page that is not intended for human visitors, no matter how great the content, is considered spam by all of the major search engines.

Machine Made Pages: commonly a "doorway-like" page where the content of the page is derived from other site content based upon keywords and compiled by a software tool. Some implementations generate pages using gibberish or templates that are easily detected by the search engines. This type of tool could literally generate thousands of additional pages in minutes. ANY machine generated page that is not intended for human visitors, no matter how great the content, is considered spam by all of the major search engines.

Cloaking: this is normally associated with sites doing IP and USER-AGENT serving where the internet server will present a page that will vary based upon the visitor characteristics. This technology is commonly used to present differing content to each search engine or browser, thus a search engine seldom sees the same content that is presented to a browser. ANY cloaked site that filters based upon whether the visitor is a spider or a human, no matter how great the content, is considered spam by all of the major search engines.
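To make the USER-AGENT side of cloaking concrete, the sketch below requests the same URL twice, once identifying as a browser and once as a spider, and compares the visible text. This is only an assumption about how one might spot-check a site, not how any search engine actually detects cloaking. The names `fetch_as`, `normalize`, and `looks_cloaked` are hypothetical, the check cannot see IP-based serving, and pages with rotating content can differ between requests for innocent reasons.

```python
import re
import urllib.request
from typing import Callable


def fetch_as(url: str, user_agent: str) -> str:
    """Fetch `url` while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def normalize(html: str) -> str:
    """Crudely strip tags and collapse whitespace so only the text is compared."""
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.lower().split())


def looks_cloaked(url: str, browser_agent: str, spider_agent: str,
                  fetch: Callable[[str, str], str] = fetch_as) -> bool:
    """Return True if the text served to the spider differs from the browser's."""
    as_browser = normalize(fetch(url, browser_agent))
    as_spider = normalize(fetch(url, spider_agent))
    return as_browser != as_spider
```

The `fetch` parameter lets you substitute a stub for testing, so the comparison logic can be exercised without touching a live site.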

Editing Focus/Methodology

The primary methods used to improve search engine ranking are discussed on our site. This section lists a couple of areas that are affected by emerging standards, and thus are called out for special notice.

Navigation: the use of links to encourage spiders to locate content within the web site, and to support "popularity" algorithms.

Content: the inclusion or focus on words, phrases, and themes associated with search engine query strings.

Transfers: pages that display (or transfer) to a real "honest" page. These pages commonly are keyword-rich and theme-based for search engines, yet provide a reformatted page for the browser (very much like a standard Frames implementation in conjunction with a search engine optimized no-frames area). This includes URL-switching where the displayed page has a browser address that is different from the URL in the search engine link (thus is a redirection), or where the browser back button causes a "loop".

Bad Practice Issues

It is clear that there are many different opinions about what constitutes a bad organic (natural) search engine optimization practice: "spam" and "cloaking" seem to be the leaders. I present these items as generally accepted BAD practices and encourage others to submit ideas for this list as well. Some of these SEO practices were once accepted by the search engines, but have become "bad" over time as the search engines have evolved to combat their individual notions of "spam".

Transparent, hidden, misleading, and inconspicuous links -- the use of any fully transparent image for a link, the use of hidden links (possibly in DIV/LAYERs), any link associated with a graphic without words / symbols that can be interpreted as even remotely representing the effect of taking the link, or inconspicuous links like 1x1 pixel graphics or the use of links on punctuation (<a href=link> </a><a href=real-link>real words</a><a href=link>.</a>) would be "spam" and a cause for removal from a search engine index.
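The punctuation and empty-link patterns described above are easy to check for in your own pages. The sketch below uses Python's standard `html.parser` to list links whose anchor text is blank or consists only of punctuation. The class and function names are hypothetical, and the check is deliberately narrow: links that wrap an image are skipped, so transparent or 1x1 pixel graphics and links hidden in DIV/LAYERs would need separate checks.

```python
import string
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collect hrefs of links with no meaningful anchor text (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.suspicious = []      # hrefs of blank or punctuation-only links
        self._href = None         # href of the link currently open, if any
        self._text = []           # text fragments seen inside that link
        self._has_image = False   # True if the open link wraps an <img>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []
            self._has_image = False
        elif tag == "img" and self._href is not None:
            self._has_image = True

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip()
            only_punctuation = all(ch in string.punctuation for ch in label)
            if not self._has_image and (not label or only_punctuation):
                self.suspicious.append(self._href)
            self._href = None


def find_inconspicuous_links(html: str) -> list:
    """Return the hrefs of blank or punctuation-only links found in `html`."""
    auditor = LinkAuditor()
    auditor.feed(html)
    return auditor.suspicious
```

Run against markup shaped like the example in the paragraph above, this flags the blank link and the link on the period while leaving the link on real words alone.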

"Machine generated" pages - Unconditionally spam. There are products on the market that make such pages unnecessary in any case.

Cloaking - this is a very deceptive process in all circumstances unless the content delivered to the visitor is identical (no deletion, formatting, or insertion) to the content delivered to the search engine spiders. Where the stated objective of the tool [filtering by IP number or User Agent] is to deliver differing content based upon visitor/search engine identification, the implementation of cloaking technology is considered BAD. Although not all engines can detect cloaked sites, and some may choose to allow it, cloaked sites are considered spam in most cases. Google has stated that they have a tool that can detect such pages and is removing cloaked sites from their index where deception is involved.

Spam is an even broader topic and runs from white-on-white to overloading the web with "free web pages/sites developed to provide volumes of links to a site to boost popularity". I think that this category needs significant definition, but it is the most easily defined in "black and white" rules.

A new area of spam involves "external" factors such as sites with numerous, unnecessary host names, excessive cross-linking of sites to artificially inflate perceived popularity, and the inclusion of obligated links as part of an affiliate program.

What the engines think is spam

Google: "Basically, Google's position is that we prefer no hidden links, no hidden text, no automatic tools used for positioning, and no cloaking. We prefer that Googlebot get the exact same page that users see. In general, you can assume that we're as conservative as possible. We don't like hidden links/text in divs/layers/iframes/css, or links that are inconspicuous or punctuation, for example. Similarly, we don't like cloaking or sneaky redirects in any form, whether it be user agent/ip-based, or redirects through javascript, meta refreshes, 302's, or 100% frames." Google also states that "We will not comment on the individual reasons a page was removed and we do not offer an exhaustive list of practices that can cause removal. However, certain actions such as cloaking, writing text that can be seen by search engines but not by users, or setting up pages/links with the sole purpose of fooling search engines may result in permanent removal from our index." More details are available at: Guidelines and SEO Issues. Also, for copyright violations use the Google DMCA page.

Inktomi: "Inktomi considers spam to be pages created deliberately to trick the search engine into offering inappropriate, redundant, or poor-quality search results."

Yahoo!: "...don't use keywords that don't match the content of your page; don't list keywords for the sake of listing keywords; and don't repeat the same keyword in your meta tags." More details are available at: Spam Policy.

AltaVista: Many of the issues here are common to the above, but their spam discussion notes that:

"Our anti-spamming techniques are designed to find sites or subnets that submit large numbers of pages with essentially the same content, or that lead to the same content. A Web site that attempts to manipulate search results may be blocked from the AltaVista index. We are constantly working to maintain the highest quality index on the Web. Attempts to fill AltaVista's index with misleading or promotional pages lower the value of the index and make the search experience frustrating and inefficient. A poor search experience hurts everyone, large and small businesses and nonprofits as well as searchers.

Here are some specific examples of manipulation that may cause us to block a site from our index:

Pages with text that is not easily read, either because it is too small or is obscured by the background of the page,

Pages with off-topic or excessive keywords,

Duplication of content, either by excessive submission of the same page, submitting the same pages from multiple domains, or submitting the same content from multiple hosts,

Machine-generated pages with minimal or no content, whose sole purpose is to get a user to click to another page,

Pages that contain only links to other pages,

Pages whose primary intent is to redirect users to another page.

Attempts to fill AltaVista's index with misleading or promotional pages lower the value of the index for everyone. We do not allow URL submissions from customers who spam the index and will exclude all such pages from the index."

Summary

Sites that are not in "compliance" have already started to be filtered from the search engine indexes, and many more are sure to follow. Should all search engines see the opportunity this offers and (in unison) enforce the same standards, then a great many web sites will be scrambling for "honest" SEO firms to optimize their sites. Likewise, this poses an opportunity for SEO practitioners to "ride this wave" and to set a standard for the future.

I encourage all who read this to be vocal with their staff, clients, and SEO providers on this trend and to work towards compliance.
