How Google Sees Your Website

Given that this is a site with an awesome tool for viewing a website (as a search engine would), it seemed only fitting that we should take things a step further. Many in the search optimization world believe they 'get it', but one can't be too sure.

A lot of the time we hear about things like 'links', 'meta tags' 'title' elements. But that's really a limited view of what search engines (like the almighty Googly) are doing when they assess your site. I thought it would be an interesting exercise to take a bit deeper walk through the woods.

Walk with me....

Pages v Websites

One of the first things we need to understand is there is indeed elements that differ between a page and the entire domain. This one is interesting in that many times I get the sense that SEO folks don't always understand that. In fact, most things that a search engine does is actually on a page level, not the domain level.

In fact, outside of links, the only really important areas that tend to be site-wide are Trust elements, classifications (topical etc), internal link ratios and geo-localized ones. By and large, a search engine actually see's your site on a page by page basis. This is the first important distinction to keep in mind.

Site-Level Signals

But what things does Google see on the site level?

Authority/Trust; this is the all-encompassing concept of not only what you do on your site (outbound links, web spam, sneaky redirects, thin content etc) but what is going on off-site (link spam, social spam etc). What level of trust does your website have in the eyes of the search engine? This is something that is incredible hard to build, but easy to lose.

Thin Content (formally known as Panda); while related to the above, it's worth having on it's own. Large amounts of thin content and/or dulpication could end up in dampening of sections or entire sites. We can also consider the GooPLA (Google Page Layout Algorithm) conepts here as well.

Classifications; while this does generally exist more on the page level, there are categorical elements for an entire website. These can also contain more granular elements as well. From an ecommerce site (or subdomain etc) to a given market. This is where strong architecture can become your best friend. Assist the search engine in understanding (and classifying) what your site and it's various parts are about.

Internal link ratios; in simplest terms you want to show a search engine the importance of pages through internal links. Linking to the most important pages the most, the least important the least. This can often lead to page mapping issues (wrong target page ranking).

Localization; another element this is more site-wide in nature is localization. Meaning; if applicable, where does this entity reside? What areas do they service? We can even consider geographic targets for sites not directly geo-location related. Elements here can include the top level domain, language etc.

Entities; an entity is a person, place or thing. One needs to look no further than Google's knowledge graph to see the importance they are seemingly placing on these over the years. If it is brands on the page (ecommerce) or citations in an informational piece, make them prominent. Also, Google seems bullish on authorship of late, so also consider the company entities (people) and how they can be leveraged. Your site can 'be' an entity as well as having sub-entities and associations throughout.

Domain history; Matt Cutts recently talked about how a domain that's REALLY had issues, might even carry penalties after someone else buys it. Given that, we do know to some extent Google can potentially look at the domain history in classifying a website. This of course plays back into the above 'trust' elements.

Page Level Signals

As I mentioned earlier, Google (and most search engines) often look at things on a page by page level, not site-wide. This is an important distinction that often SEO types seem to forget. They crawl PAGES.

What elements might a search engine look at on the page level?

Meta-data; the easiest one here is of course the header data from TITLE to Meta-descriptions (not a ranking factor) and even canonical and other tags. Some of these elements can be ranking factors while others (rel=canonical for example) can tell Google how to treat the page.

Classifications (and Localization); much like the site level elements pages themselves will fall under classifications. It could be content type, (informational, transactional etc), by intent (commercial, knowledge), localization (about a given region) etc. Ensure you communicate the intent and core elements of a given page.

Entities; Again, as with the site-level, entities can also be associated with a given page on the site. In fact, Google potentially now looks at associating query inference (medical symptoms) and an entity (the condition itself).Authority/trust (external links); beyond the above mentioned authorship elements, trust signals can also bee seen in what you link to (as well as citations). It can be a positive or a negative. Take the opportunity where you can to associate the page with other authoritative web identities.

Semantic signals; web pages have words right? Search engines love words, yea? Then be sure that some form of semantic analysis is going on for the page including categorization of content, related term/phrase ratios, citations and more.Linguistic indicators (language and nuances); of course as part of the above classification methods, the page can also help through closer demographic identification through the language on the page.

Prominence factors; another area that doesn't get a whole lot of attention but is know to show up over the years in various patents are elements such as; headings (h1-h5), bold, lists (and possibly italics). They don't likely weight heavily, but are indeed worth consideration.
Oh and I do not jest when I say there's more. We're sticking to some of the higher level parts to get the point across. We've likely already lost ½ the readers that started this piece. Thanks for sticking around.

The off-site stuff

While the focus of this offering was more about the on-site bits, Google also see's the off-site activities as part of it's perception. From authority and topicality to demographics and categorizations, it does come into play as far as how Google perceives you as an entity/set of entities.

These can include....

Link related factors;

PageRank (or relative nodal link valuation)

Link text (internal and external)

Link relevance (global and page)

Also; Temporal, Personalized PageRank and Semantic analysis

Temporal;

Link velocity

Link age

Entity citation frequency

social visibility

Authority/Trust;

Citations

Co-citations

TrustRank type signals

Reach;

Links

News

Social

Video

You get the idea. I don't really want to focus on the off-site elements as much today. We're looking at these as they do fall into 'how Google see's your website'. It's just not the focus...

Moving along...

The Spam Connection

Another loosely related element of how Google perceives the site is of course; adversarial information retrieval. Better known as search and destroy for web spam. While understanding ranking factors is a great idea, it is also good to know things that might get one dampened as well.

Web spam generally breaks into two categories;

Boosting; tactics used to increase a ranking (link spam for example)

Hiding; tactics used to mislead or trick the search engine (cloaking for example)

We're all now fairly familiar with Panda (type) and Penguin devaluations as well as manual actions such as the Unnatural Links messages. But one should also be cognisant of the myriad of other ways Google might be looking at your site, in terms of how spammy it is.

SEO is dead

Right? Ok, maybe not.

Your mission my friends, should you choose to accept it, is to develop an SEO strategy that covers off all the elements within this post. Because that my friend, is what the savvy search optimizer should be doing.

I could sit here and explain to you ways to leverage them all, but this is an article not a book.

My goal here today was to bring light to the complexity of our reality. If you're a website owner, SEO enthusiast or hard core optimizing guru. Never become myopic on how Google really works. See the forest, see the trees and even the leaves.

David Harry is an avid search geek with a penchant for all things IR (information retrieval) related. He's best known for his writings along the same lines; technical aspects of SEO. He is an SEO consultant for Verve Developments and runs the SEO Training Dojo

24 Responses to “How Google Sees Your Website”

Superb writing and very good points you shared with us. Few points noted manually, bookmarked for future read and shared with my groups and peers.
Additionally I think structured data and LSI keywords play great role on onsite crawling and proper object based indexing.

I really wants for web organic search engine traffic. Can you help me regarding this issue. I have figure out everything on my website like on-page and off-page settings. Also i get backlinks from directories but there is non any help. Please let me know any source of good seo partners.

Much obliged concerning a fantastic (and, acknowledging the point, sensibly brief) audit. I’m demonstrating this one to my manager the following time he asks me what we have to do with a specific end goal to “SEO our sites

Incredible diagram of the SEO status, a debt of gratitude is in order regarding the subtle elements! Regardless of how frequently they say seo is dead, it will never genuinely be.. unless individuals quit utilizing internet searchers.

Nice post and reaffirms what we’ve been doing however I’ve recently found a shift in googles thinking. For some reason I see sites that have no “traditional” seo elements ranking top three for dozens of highly competitive keywords which do not appear on the site, in the meta data or even in inbound links. Would love to discuss this with you. Feel free to email and will show you what I mean and see if you can figure it out.

This review is really detailed! I´m sure if you work on every item which is listed here, your page will reach a new level in the SERP. I also don´t think SEO is dead (I hear that way too often), but it´s not as easy as it was 10 years ago and many generic keywords are hard to attack with “natural” methods. So you always should think of a niche-strategy which sometimes helps much better than not ranking anywhere, because of too much concurrence.