How Search Engines See, Search, and Visit Your Website

There are several kinds of search engines. Some collect their own data through “robots” or “spiders”, software that crawls through the web, tracking the various links between pages and sites, gathering data which the search engine filters and adds to its database. These include Google and Yahoo. Some rely upon individuals and businesses to enter their website’s information manually, with a staff that checks and processes the information into their database. These are usually called directories.

Other search engines, called meta-search engines or metacrawlers, don’t gather their own information but search the databases of other search engines, letting them do the work.

And then there are combinations of all of these. For example, Google sends out robots to gather information. They also permit manual submissions. They also search other search engines to add information to their database. This combination creates a vast resource with checks and balances so they can offer a wide range of listings and not be dependent upon a single collection.

Once the search engine has the information from a web page, it applies software to clean it up and determine whether or not it is worth including in the database. Many robots and spiders are now sophisticated enough to do much of the filtering as they gather the content, saving time and effort on the other end. Therefore, you have to pass their tests in order to be considered for their listings.

Depending upon the search engine, these are some of the things it considers and gathers when visiting your pages:

Code Viability

Page Title

Links to your core pages (internal links)

Links to external web pages (external links)

Content – Text

Keywords: Word Frequency and Density

Meta Tags (page description)

Site history

Domain history

Search Engines Not Fooled By a Pretty Face

As you can see, a search engine doesn’t care about how pretty your website is and how many pictures or dazzling graphics you have. All it wants is information and that comes from content, titles, keywords, and meta tags. Style sheets are completely ignored as a search engine crawler digs through your site.

Web page designs now encompass more than just some words and a few pictures. A lot of designers are opting for Macromedia’s Flash, vblogging, podcasts, and other visual expressions to showcase their business and products with dramatic visual displays, video, sound, and slide shows. Photographers are definitely taking advantage of all the bells and whistles they can for visual impact.

Unfortunately, when it comes to getting picked up by search engines, if you don’t have text or some basic textual data on your page, it’s difficult for the search engines to gather information about your site. They don’t “look” at the site, they just crawl through it gathering data. No data, no collection, no listing. Even if you use splashy graphics, make sure there is some underlying code that tells the search engines what you do and why they should bother with you.
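
As a rough sketch of that underlying code (the file names and text here are hypothetical placeholders), alt attributes and fallback content give a crawler something to read even on a graphics-heavy page:

```html
<!-- Hypothetical example: file names and text are placeholders -->
<h1>Jane Doe Photography</h1>

<!-- The alt text is what the crawler "sees" instead of the image -->
<img src="portfolio-cover.jpg"
     alt="Black and white landscape photography portfolio by Jane Doe">

<!-- Browsers and crawlers without Flash fall back to the text inside -->
<object data="slideshow.swf" type="application/x-shockwave-flash">
  <p>A slide show of landscape and portrait photography.
     The same images are available in the <a href="gallery.html">gallery</a>.</p>
</object>
```

The fallback text doubles as your accessibility text, so one effort serves both the robots and your visitors.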

Another aspect often overlooked by web page designers when exploring their SEO options is that the blind, visually impaired, and disabled are not fooled by your site’s pretty face. Nor are handheld computers, cell phones, and other devices used to access websites today. By making your website meet web standards for accessibility, both search engines and visitors with accessibility needs can easily explore your site, and everyone is happy.

Layout Matters

Document structure and page layout (frames, tables, etc.) affect how a search engine gathers information about your page. Frames and tables limit the search engine robot’s ability to index your site. The newer “iframe” is also ignored by robots. If you are using frames, have a “no frames” version so search engines and users with accessibility needs can still make use of your page.
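
If you must keep frames, a sketch of that “no frames” fallback might look like this (the file names are hypothetical):

```html
<frameset cols="20%,80%">
  <frame src="menu.html" name="menu">
  <frame src="content.html" name="content">
  <!-- Crawlers and non-frame browsers read this instead of the frames -->
  <noframes>
    <body>
      <p>This site uses frames. A
         <a href="content.html">no-frames version</a> of the content
         is available.</p>
    </body>
  </noframes>
</frameset>
```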

If you use tables to lay out your page, the layout may force the key information down lower in the page, away from the searching eyes of the robots. Few search engine robots go beyond 50% of your page, so if the good content is low on the page, they will miss it. Meta tags, tags with information about your web page, are meant to be found within the HEAD of your document. If a search engine requires the meta tag information and can’t find it, you lose.

Some search engines even score lower for non-web standard layouts, so getting your site out of tables or frames is very important to search engine success.

Validation Counts

Search engines won’t check your pages for proper tags and coding, but if your markup isn’t correct, it can lead the robot or spider in the wrong direction or confuse it. If it has difficulty moving through your pages, it will stop and look elsewhere.

A robot can spot a well-designed page because it can get to the “good stuff” faster. A well-coded and validated site provides information to the search engine’s crawler that can help your site rise in the page ranks. Solid code allows the crawler to move easily through your site, which tells the search engine the site is designed with care, attention to detail, and web standards. The easier you make the job for the robot, the more likely it will visit you, and return.
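
A minimal sketch of a clean, validating skeleton the crawler can move through without confusion (the title and text are placeholders):

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Page title: the first thing the crawler reads</title>
</head>
<body>
  <h1>One clear heading</h1>
  <p>Properly nested and closed tags let the robot reach the
     content without tripping and leaving.</p>
</body>
</html>
```

Run pages like this through a validator and the robot gets the same easy ride the validator did.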

Link Popularity and Intrasite Linking – Connecting Web Pages

Internal linking or intrasite links, the process of creating links within your site and posts to other web pages on your site, helps guide a search engine through your site. Strong navigation, recent posts, related posts lists, and archives and site maps help create links between your web pages that a search engine crawler can follow to move from page to page to page through your site.
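
A sketch of that kind of crawlable internal navigation, with hypothetical page names:

```html
<!-- Plain text links a crawler can follow from page to page -->
<ul>
  <li><a href="/about/">About</a></li>
  <li><a href="/archives/">Archives</a></li>
  <li><a href="/sitemap.html">Site Map</a></li>
  <li><a href="/2006/05/related-post.html">Related: How Robots Read Pages</a></li>
</ul>
```

Because the links are plain text, the robot can read where each one leads, which is more than it can say for links buried in images or scripts.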

Link popularity, the number of external sites which link to you, is still critical to successful search engine page ranking. But it isn’t a matter of how many link to you but who links to you. Search engines know the difference between lots of links and quality linking. They evaluate who is linking to you and whether their links and content match your content. If they don’t, the link is ignored. If they do, it scores.

Quality external referral links come from providing quality content worth linking to. Don’t fall for link exchange farms or link spamming. They don’t work. Google and other search engines are onto the scam long before you figure it out. Concentrate on providing quality content and spreading the word about your site and people will want to link to you because they like what they see.

Meta Tags, Page Titles, Descriptions, and Keywords

Meta tags which include keywords and descriptions are used by some search engines, while others skip this information because it has been abused. Still, it is good to add this information to your web page’s HEAD, just in case. Page titles are still used to match keywords in the title against the headings and content within your web pages to measure keyword density.
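
A sketch of those tags in the HEAD of a page (the title, description, and keywords here are placeholders):

```html
<head>
  <title>Web Design Tips for Search Engines</title>
  <meta name="description"
        content="How search engine robots read, index, and rank web pages.">
  <meta name="keywords"
        content="search engines, robots, web design, meta tags">
</head>
```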

Keyword density and frequency are still a major part of cataloging your site. The search engine collects the word content from the page title, headings, links, images, and text, and sifts through it for the most commonly used words and phrases. It can tell if a word is used too much, so be careful about abusing keyword frequency. Keyword scams, like hidden lists of keywords, are well known and easily spotted by search engines, and they can result not only in not being indexed, but in being banned.
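
As a sketch, this is what the search engine is measuring: the same key phrase appearing naturally in the title, a heading, and the text. Anything more forced than this starts to look like abuse (the phrase and text here are hypothetical):

```html
<!-- In the HEAD -->
<title>Digital Photography Tips</title>

<!-- In the BODY -->
<h1>Digital Photography Tips</h1>
<p>These digital photography tips cover exposure, lighting,
   and composition for beginners.</p>
```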

Domain History Scores

Now that many websites have been crawled and that data stored in the Google databases, Google can go back and compare the new with the old. It can examine how the content and links have changed, scoring better for changes that indicate activity, monitoring, and improvement of the website. Google can also examine the history of the site and domain, checking the domain age, site owners, and addresses, and comparing them against the new information to see if the site owner has changed or if there are any other changes in the domain. Changes in the domain information could indicate the sale of the site, which means the new owners are untested, so the site might score lower. New domains score lower, as there is no history of their intention to hang around, while older sites may score higher.

If you change your domain name, or the domain ownership changes, Google notices. What it does with that information is still unclear, but most sites drop in page rank temporarily. The oldest domain name ownerships receive higher scores because they’ve proven they can stick it out and last. See Secret Out – How Google Ranks Websites for more explanation.
