How AltaVista Works

The articles below appeared in the Search Engine Update newsletter and have important information not yet added to this page. Please review them to find out about any new developments. Further below, you will also find a list of other articles about this search engine that may be of interest.

Yahoo says that the AltaVista brand and web site may survive once it completes its acquisition of Overture (which owns AltaVista).

Overview

AltaVista is primarily a crawler-based search engine, using its own technology to create a web page index that forms the bulk of the listings that appear in response to a search from the AltaVista home page. However, the company also makes available other types of listings, such as human-powered directory information from LookSmart, paid listings from Overture (GoTo) and news headlines from Moreover.

This page covers the major search indexes that AltaVista provides and how your information may appear within them, with a particular focus on AltaVista’s crawler-based web page index.

Overview Of Results Page

The results page is dominated by listings from AltaVista’s own web page index, though paid listings from either Overture or AltaVista are also included. We’ll examine the placement of paid listings first.

Placement Of Paid Listings

Immediately below the search box, under either the “Featured Sites” or “Products and Services” headings, you may see up to three different links. These are paid listings that come from either AltaVista or from Overture. At the bottom of the results page, the heading may also appear again, with up to three more listings. Information on purchasing listings is covered in the Paid Placement section, below.

Web Page Index Results

After any paid listings, matches from AltaVista’s crawler-based web page index make up the bulk of the search results page. These matches appear under the “We found NUMBER results” heading, where NUMBER is the actual number of results found for that search.

Information about how the web page listings are gathered and ranked is covered in the Web Page Index section, below.

Blended Search Links

Sometimes, a “Blended Search” link will be the first listing in the web page index results section. These currently appear in response to some shopping or news related searches. Selecting the link brings back information from AltaVista’s shopping search or from the Moreover news index. See the AltaVista Shopping and AltaVista News sections below, for more information.

Here is an example of how such a link appeared in a search for “DVD Players” in October 2001:

This section deals specifically with how pages get listed within the web pages area of AltaVista’s results page, as outlined in the Web Page Index Results section, above.

Being Crawled

Even if you never submit to AltaVista (as explained in Submitting, below), AltaVista’s crawler may still locate your site, follow links within it and add pages it finds to its web page index. The Encouraging Crawlers page has tips on site architectural changes you can make to improve the odds that your pages will be picked up naturally by the AltaVista crawler. You can expect AltaVista’s crawler to revisit your site about every four weeks, to check for changes and new pages.

Free Submitting

AltaVista maintains an Add URL page that allows you to ensure that key pages from your web site are listed quickly within the index. Any page submitted may appear within four to six weeks, assuming it isn’t spam. You can reach the page via the URL below:

When you access the Add URL page, it will display a submission code that must be entered. The code is a series of letters and numbers, but because they are displayed in a graphic format, automatic submission tools cannot read the information. This has been done because AltaVista found that the vast majority of automated submissions were for low quality pages.

There is no limit on how many pages you may submit per day, per web site. The only caveat is that after submitting five pages, the system will force you to generate a new submission code. It is perfectly acceptable to then use the new code to do another batch of five URLs, and so on, until you are done.
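If you have a long list of pages to submit by hand, it can help to split it into groups of five ahead of time, one group per submission code. Here is a minimal sketch of that bookkeeping. The example.com URLs are placeholders of my own, and nothing here talks to AltaVista or tries to read the code graphic, which is designed to prevent exactly that.

```python
from typing import Iterator, List

BATCH_SIZE = 5  # pages accepted per submission code, per the text above


def batches(urls: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Hypothetical page list, used only for illustration.
pages = [f"http://www.example.com/page{n}.html" for n in range(1, 13)]

for number, group in enumerate(batches(pages), start=1):
    # Each group needs a fresh code from the Add URL page.
    print(f"Submission code {number}: {len(group)} URLs")
```

With twelve pages, that works out to three codes: two full groups of five and a final group of two.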

The “ransom note” look of the submission code is designed to defeat text recognition programs, but that also means the codes can be hard even for humans to read. Should you be unable to read the code, you can refresh the Add URL page to have a new one generated, which may be clearer.

If you have problems getting listed, you can contact AltaVista via the page below (choose URL Listing Support from the drop down box):

AltaVista operates both a self-service paid inclusion program (“Express Inclusion”) for those who wish to submit fewer than 500 web pages and a bulk program (“Trusted Feed”) for those with many pages they’d like to submit on a cost-per-click basis. Both programs are described further below.

Should you use these programs? Perhaps, if you have new pages you absolutely must get listed right away or you have pages that you absolutely must get refreshed on a weekly basis. However, you may also find that purchasing listings via Overture can get you on the AltaVista search results page quickly and with a guarantee to appear for terms relevant to your site. Explore that option, as well (see the Paid Placement section, below).

Also, remember that AltaVista adds pages for free, through its regular crawling. It also still promises to revisit pages in its index each month. That means you may already have plenty of pages listed, and they should stay there despite the addition of the paid inclusion program. Nor will those non-paid pages be downgraded in relevancy, AltaVista says. The Encouraging Crawlers page has tips on site architectural changes you can make to improve the odds that your pages will be picked up naturally by the AltaVista crawler.

Is it anything goes for submitting with the paid program? Nope. AltaVista says it won’t accept pages that do things such as use invisible or tiny text, those that attempt to mislead, mirror pages, doorway-style pages with no real content or those that only link or redirect to other pages.

What about cloaking? Cloaking is allowed via the Trusted Feed program, if you are using an XML feed or other mechanism to represent the content of the destination page.

Express Inclusion

The program allows you to submit up to 500 URLs, which will be visited on a weekly basis, for up to six months. This means that a brand new page submitted to AltaVista through the program should show up in about a week, or changes to existing pages should be reflected in about a week.

Notice I said changes will show up in “about a week,” not weekly. While the spider will revisit weekly, it could take longer than this for the changes to be reflected in the index. This is because the index might be refreshed before any new changes can be reported by the spider. Hard to understand? Consider this example:

On Friday, the paid inclusion spider visits your page. Then on Monday, you make a change. On Thursday, AltaVista refreshes its index, unaware of your change, because it happened after the paid inclusion spider made its visit. Now it’s Friday again, and the spider returns to your site on its weekly schedule and finds the changes. However, the changes don’t appear until AltaVista again updates its index on the following Thursday.

In all, 10 days have elapsed from when your page was changed to when the changes appeared in AltaVista, despite the fact that the paid inclusion spider did visit you each week, as promised.

In the worst case, you’d be looking at up to two weeks before the changes appear, should the spidering and index refresh cycles align badly. On the flip side, the cycles could also align so that you might find your changes appear in AltaVista in less than a week.
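The timing in the example above can be worked through with a small calculation. This is only a sketch under my own simplifying assumptions: time is counted in whole days, the spider visits every seven days, the index refreshes every seven days, a visit on the same day as a change catches that change, and a refresh only reflects visits made on earlier days. AltaVista has not published its schedule in this form, and the function name and day numbering are mine.

```python
def days_until_visible(change_day: int, spider_day: int = 0,
                       refresh_day: int = 6, period: int = 7) -> int:
    """Days from a page change until a refreshed index can show it.

    Days are whole numbers. By default day 0 is the Friday spider visit
    and day 6 is the Thursday index refresh, as in the example above.
    """
    # First spider visit on or after the day of the change.
    visit = change_day + (spider_day - change_day) % period
    # First index refresh strictly after that visit.
    refresh = visit + (refresh_day - visit - 1) % period + 1
    return refresh - change_day


# Friday = 0, Saturday = 1, Sunday = 2, Monday = 3, and so on.
print(days_until_visible(change_day=3))  # the Monday change from the example: 10

delays = [days_until_visible(day) for day in range(7)]
print(min(delays), max(delays))
```

Under these assumptions, a change made just before the Friday visit shows up in six days, while one made just after it waits nearly two weeks, which matches the best and worst cases described above.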

AltaVista’s pricing runs on a six-month basis. However, for comparison purposes with other programs, yearly pricing is shown below:

$78 (first URL)

$58 (URLs 2-10)

$38 (URLs 11-500)

Prices above are for non-porn URLs. There are also additional charges you can pay to be included in country-specific versions of AltaVista.
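To estimate a yearly bill from the price list above, you can add up the tiers. The sketch below assumes each price is charged per URL and that the tiers stack (the first URL at $78, the next nine at $58 each, the rest at $38 each). That reading is my interpretation of the list, not a published formula, and it ignores the adult and country-specific surcharges.

```python
# (number of URLs in the tier, assumed yearly price per URL in US dollars)
TIERS = [(1, 78), (9, 58), (490, 38)]


def yearly_cost(url_count: int) -> int:
    """Estimated yearly cost in dollars for 1 to 500 non-porn URLs."""
    if not 1 <= url_count <= 500:
        raise ValueError("Express Inclusion covers 1 to 500 URLs")
    total = 0
    remaining = url_count
    for tier_size, price in TIERS:
        used = min(remaining, tier_size)  # URLs billed at this tier's price
        total += used * price
        remaining -= used
    return total


print(yearly_cost(25))  # 78 + 9 * 58 + 15 * 38 = 1170
```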

Should you enroll in the inclusion program and then decide not to renew, your URL is not supposed to be dropped, AltaVista says. It simply won’t be revisited as often as the program allows.

NOTE: infoSpider is the company that handles the paid inclusion system for AltaVista. infoSpider is also a sister company to the submission services WorldSubmit, ProBoost and ProBoostGold. However, you needn’t use any of these other services to take advantage of AltaVista’s program.

Trusted Feed

The Trusted Feed program allows businesses with large web sites to submit 500 or more URLs via an XML (Extensible Markup Language) feed directly to AltaVista’s index.

Webmasters have the option of submitting meta data including custom titles, keywords and descriptions for each URL. Information submitted in a Trusted Feed replaces or supplements the information ordinarily gathered by AltaVista’s crawler.

While the Trusted Feed program allows webmasters to influence key components of each URL, AltaVista insists that the underlying pages will still be subject to the same relevancy algorithms as all other pages in the index. The meta data contained in a Trusted Feed is just one of many factors used to compute relevance, according to AltaVista.

Trusted Feed customers are provided with extensive reporting tools, designed to reveal both how much traffic particular pages receive from AltaVista, and where the traffic is coming from. Reports can be generated for overall traffic patterns, top queries, top URLs, “clicks related to this word,” “clicks related to this URL,” and other types of information.

Reports can be downloaded into Excel spreadsheets for further analysis. Since Trusted Feed sites are refreshed once a week, the reports will provide valuable feedback to webmasters, allowing them to check position and tweak pages to achieve higher traffic.

The program is particularly aimed to benefit sites that are traditionally difficult to crawl, such as those using frames or dynamically generated content. The program can even be used to submit URLs to AltaVista from sites that block search engine crawlers with the robots.txt protocol.

Pricing for the Trusted Feed program is on a cost-per-click model, varying from US $0.15 to $0.60 depending on the category of content. There is also a minimum monthly spending for CPC prices in the lower end of the range.

The Trusted Feed program essentially allows cloaking, in that the pages AltaVista indexes are different from those that the human visitor sees. However, AltaVista will compare the Trusted Feed meta data with the destination pages themselves. It will also conduct periodic spot-checks of pages, comparing them with Trusted Feed meta data. If the pages appear to be significantly different in meaning, then a spam penalty may be applied.

Contractually, AltaVista can also reject any page if a Trusted Feed customer doesn’t comply with AltaVista’s policies for site submission, which are the same for both paying and non-paying webmasters.

This section deals specifically with what factors influence whether pages rank well within the web pages area of AltaVista’s results page, as outlined in the Web Page Index Results section, above.

On The Page Factors

These are key factors occurring within a page’s content that influence whether a page will rank well for a particular search term:

The term appears in the title of the web page.

The term appears in the meta description and keyword tags.

The term appears in the beginning of the body copy.

The number of search terms present on a page and their proximity are also considered for ranking purposes. In general, the ranking algorithm has been tweaked to help ensure that pages containing exact phrases searched for will rise to the top of the results. It’s also true — generally — that after this, pages with ALL of the search terms will be listed, then pages with ANY of the terms. Bear in mind that beyond this, other ranking factors such as those described above and below play a role.
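The general ordering described above (exact phrase first, then all terms, then any term) can be pictured as a simple tiering step. The sketch below is a toy illustration of that idea only. It is not AltaVista’s algorithm, it ignores every other ranking factor, and the function name and sample pages are invented for the example.

```python
import re
from typing import List


def tokens(text: str) -> List[str]:
    """Lowercase words and numbers, with punctuation stripped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_tier(query: str, page_text: str) -> int:
    """0 = exact phrase, 1 = all terms, 2 = any term, 3 = no match."""
    terms = tokens(query)
    words = tokens(page_text)
    if not terms:
        return 3
    size = len(terms)
    # Exact phrase: the query terms appear side by side, in order.
    if any(words[i:i + size] == terms for i in range(len(words) - size + 1)):
        return 0
    present = [term in words for term in terms]
    if all(present):
        return 1
    if any(present):
        return 2
    return 3


# Hypothetical pages, for illustration only.
pages = {
    "garden": "Garden furniture on sale",
    "rental": "DVD rental prices compared",
    "formats": "Players that handle every DVD format",
    "reviews": "Cheap DVD players reviewed",
}
ranked = sorted(pages, key=lambda name: match_tier("DVD players", pages[name]))
print(ranked)  # ['reviews', 'formats', 'rental', 'garden']
```

A real engine would then order pages within each tier using the other factors covered in this section.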

Root Page Boost

AltaVista tends to place a stronger emphasis on a web site’s root page (see Encouraging Crawlers), especially in response to single word and popular searches.

Link Analysis

AltaVista makes use of link analysis to boost page rankings. The More About Link Analysis page explains this concept in more depth, plus it has tips on gaining important links to build your reputation in link analysis systems.

Spamming

AltaVista may impose relevancy penalties or remove pages from its web index altogether, if the pages are found to be spamming the service. These are things AltaVista considers spamming:

Using invisible text or text too small to read.

Repeating keywords over and over, for no good reason.

Being misleading about a page’s content in the meta description tag.

Stuffing pages with keywords unrelated to the page’s actual content.

Submitting identical or near-identical pages, either from the same site or from mirror sites.

Historically, using a meta refresh tag with a setting of less than 30 seconds has also caused pages to be dropped for spamming.
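If you want to check your own pages for fast meta refresh tags before submitting, a short script can do it. The sketch below uses Python’s standard html.parser module. The 30-second threshold comes from the text above, while the class and function names are my own, and the check says nothing about how AltaVista itself detects these tags.

```python
from html.parser import HTMLParser
from typing import List


class RefreshFinder(HTMLParser):
    """Collects the delay, in seconds, of every meta refresh tag."""

    def __init__(self) -> None:
        super().__init__()
        self.delays: List[float] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content value looks like "5; url=..." or just "5".
        delay_text = (attributes.get("content") or "").split(";", 1)[0].strip()
        try:
            self.delays.append(float(delay_text))
        except ValueError:
            pass  # malformed delay, ignore it


def has_fast_refresh(html: str, threshold: float = 30) -> bool:
    """True if any meta refresh fires in under `threshold` seconds."""
    finder = RefreshFinder()
    finder.feed(html)
    finder.close()
    return any(delay < threshold for delay in finder.delays)


sample = '<meta http-equiv="refresh" content="5; url=http://www.example.com/">'
print(has_fast_refresh(sample))
```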

Spam penalties include:

Identical or near-identical pages, pages using meta refresh and those with excessive keyword repetition are automatically excluded from the index, if detected.

Suspicious pages are placed on a report, then reviewed by a human being. If your page was listed in AltaVista, then disappeared, it may be that it was considered spam upon review and removed.

All pages from a site may be removed and further submissions blocked.

To report spam, you can use the page below (choose Spam Reporting from the drop down box):

Only the first 100K of text on a page is indexed. After that, only links are indexed, up to a maximum of 4MB. Since most web pages are under 100K, these limitations should not be a problem for most webmasters.

Pages heavy with text in a small font size may not get listed. Avoid using font size 2 or lower as the dominant size for your body copy.

AltaVista considers words in the meta description and keywords tags to be additional words on the page, just as if they appeared on the page in ordinary text.
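To see roughly which words a page offers for indexing once the meta tags are counted as ordinary text, you can pull them out with a few lines of code. This is a minimal sketch using Python’s standard html.parser module. The class name and sample page are made up, and the script treats all text between tags alike (including any script or style content), which a real crawler would likely handle differently.

```python
import re
from html.parser import HTMLParser
from typing import List


class IndexableText(HTMLParser):
    """Gathers visible text plus meta description and keywords content."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() in ("description", "keywords"):
            # Treat the meta content as if it were ordinary page text.
            self.chunks.append(attributes.get("content") or "")

    def handle_data(self, data):
        self.chunks.append(data)


def indexable_words(html: str) -> List[str]:
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


# Hypothetical page, for illustration only.
page = (
    "<html><head><title>Eagle Photos</title>"
    '<meta name="description" content="Pictures of bald eagles">'
    '<meta name="keywords" content="eagle, raptor">'
    "</head><body><p>Welcome.</p></body></html>"
)
print(indexable_words(page))
```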

Web Page Index: Listings Format

Text from the description meta tag is used for page descriptions. If no meta description tag exists, then AltaVista will create an “abstract” based on text from the body copy of a web page. It may also use only a portion of a meta description tag as well as body copy, to form a description. The article below describes this in more detail:

AltaVista also allows for “Listing Enhancements,” which are logos, icons, taglines or text links that you choose. You can only have these enhancements for pages that are submitted through the AltaVista paid inclusion program (described above), and you must pay an extra fee for them. You can learn more about the program directly from AltaVista, via the URL below:

AltaVista automatically identifies the language of a web page and also tries to recognize those with pornographic content, as explained below:

Language Detection

AltaVista automatically categorizes web pages by language. Its spider tries to determine the language of a web page at the time it is spidered.

The technology is dictionary-based. AltaVista looks at a page to see if the bulk of the words match those of a particular language.
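To give a feel for how dictionary-based detection can work, here is a toy version that counts how many words on a page appear in each language’s word list and picks the language with the most hits. The tiny word lists and the function name are mine, chosen purely for illustration. AltaVista’s actual dictionaries and scoring are not public.

```python
import re
from typing import Optional

# Deliberately tiny, hypothetical word lists. A real system would use
# far larger dictionaries for each language.
WORD_LISTS = {
    "english": {"the", "and", "of", "is", "to", "in"},
    "french": {"le", "la", "et", "les", "des", "est"},
    "german": {"der", "die", "und", "das", "ist", "nicht"},
}


def guess_language(text: str) -> Optional[str]:
    """Return the language whose word list matches the most words."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    scores = {
        language: sum(word in vocabulary for word in words)
        for language, vocabulary in WORD_LISTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("The eagle is the king of the sky"))
print(guess_language("Le chat et la souris"))
```

A page that mixes languages simply goes to whichever list wins the count, which is one reason this kind of detection can misjudge short or multilingual pages.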

There is no way for a webmaster to specify which language a page should be assigned to, not even using the Content-Type meta tag, as explained on the More About Countries And Languages page.

AltaVista also translates the text it finds into Unicode, which can store characters for all languages, not just Western European ones. This allows a single index to serve users all over the world. A user can perform a search in English, then one in Chinese, without having to leave the service and go to a Chinese-only edition.

Porn Detection / Filter Mode

AltaVista provides a filtering mode for its users. Potentially objectionable pages are filtered in three ways. First, AltaVista’s spider tags pages as objectionable, if it finds certain words and phrases used in particular ways. Second, the search retrieval software uses a filtering process developed in partnership with SurfWatch to catch anything that makes it past the spider-based filter. Finally, AltaVista allows users to report on any pages that may have slipped through the first two barriers, via the page below:

If AltaVista has sold no paid listings for a particular word through its own in-house program, then up to the top six listings from Overture for that word will appear in the paid listings areas, as explained above.

If AltaVista has sold its own listings, then Overture listings may still appear, but there will be fewer of these (such as positions 1-4 or 1-5).

In addition, if AltaVista has sold a term via an exclusive deal, then NO Overture listings will appear. In other words, if you bought “cars” with AltaVista through an exclusive deal, then only your paid listing for “cars” would appear.
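The three cases above can be summed up in a few lines of logic. Treat this strictly as a sketch: it assumes six paid slots in total and that each listing AltaVista sells in-house displaces exactly one Overture listing. That assumption fits the “positions 1-4 or 1-5” examples but has not been confirmed by AltaVista, and the function name is my own.

```python
from typing import List

TOTAL_SLOTS = 6  # up to three above the results and three below


def overture_positions(in_house_sold: int, exclusive: bool = False) -> List[int]:
    """Overture bid positions that may be shown for a search term.

    Assumes each in-house listing takes one slot away from Overture.
    """
    if exclusive:
        return []  # an exclusive deal shuts out Overture entirely
    available = max(TOTAL_SLOTS - in_house_sold, 0)
    return list(range(1, available + 1))


print(overture_positions(0))                  # no in-house sales
print(overture_positions(2))                  # two in-house listings sold
print(overture_positions(1, exclusive=True))  # exclusive deal
```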

To appear in any location where Overture links are shown, you need to be listed with Overture and among its top bidders. The How Overture (GoTo) Works page explains more about being listed with Overture.

AltaVista Directory

AltaVista provides access to its own version of the LookSmart directory to those few who browse categories from the AltaVista home page.

To appear in AltaVista’s directory information, you need to be listed with LookSmart. See the How LookSmart Works page for general tips on doing this.

AltaVista says it refreshes its listings from LookSmart every day, so once you are added to LookSmart, you should appear in AltaVista’s version within a day or two.

You may find that the order of sites in a category at AltaVista is different from that at the corresponding category at LookSmart. AltaVista says this situation is supposed to be rectified in the near future, when it begins using LookSmart ranking criteria.

AltaVista Shopping

AltaVista maintains a shopping search engine that brings back product pricing from online web merchants. However, this service is being discontinued in Fall 2001. Instead, shopping search results will be provided by another company. When more details about how the integration is to occur become available, they will be added to this page.

AltaVista News

AltaVista maintains a news search engine that presents articles from sites across the web. AltaVista users access this information primarily through Blended Search Links from the main AltaVista results page or by going directly to the AltaVista News web site, which can be found here:

AltaVista has dropped Moreover in favor of its own system to gather content (that system, by the way, is largely using the AllTheWeb news crawler). Prisma refinement links are also offered with news searching (and have been moved to the right-hand side of the screen for ordinary web searching).

AltaVista Multimedia Index

AltaVista makes multimedia content (images, audio files and video files) available to searchers from its multimedia indexes, as explained below. Users access the multimedia index either through the tabs on AltaVista’s results page (as explained above) or by visiting the various multimedia services directly.

Some webmasters do not want their images, sounds or video files to be listed with AltaVista’s multimedia indexes. If you want your information excluded, follow the instructions below:

AltaVista provides crawler-based image listings, such as .gif and .jpeg files, that come from across the web. The AltaVista spider cannot actually “see” what’s inside of these images. Instead, it makes educated guesses.

For example, if you look for “eagles” using AltaVista image search, you’ll get pictures of eagles or graphics with the word “eagles” in them. However, AltaVista didn’t retrieve these pictures because it could recognize what an eagle looks like or because it could read the text inside an image.

Instead, AltaVista (like most crawler-based image search engines) remains mostly blind to what the actual image shows. It relies on the words that appear around the image or in the file name of the image to understand its content. So, pictures with the word “eagles” in the file name or pictures that appear on web pages that make use of the word “eagles” in the HTML text give AltaVista the clues it needs to display results.

In addition to crawling the web, AltaVista also presents images that come from partner sites such as Getty and Corbis.

AltaVista also uses its image search technology to add images to listings at its non-US editions. See the Image Enhanced Results section, below, for more about this.

MP3/Audio Search

AltaVista’s MP3/Audio Search service provides sound listings that come from crawling both web and FTP sites, plus those available from partner sites such as CDNow and MP3.com. If you have a music site you would like to add, you can use the form below:

AltaVista’s Video Search features content that comes from crawling the web as well as video files provided by news, entertainment and financial broadcast companies. Video Search can be reached directly via the URL below:

AltaVista operates a variety of non-US editions targeting countries worldwide. These editions offer their users access to both “worldwide” and country-specific search results. To locate AltaVista’s various country-specific editions, see the links listed at the bottom of the AltaVista.com home page.

Worldwide results come from AltaVista’s global search index. This is the same index that AltaVista.com uses. So, if you are listed with AltaVista.com (as explained in the Getting Listed section), you will be listed with the worldwide results offered by any AltaVista edition.

Country-Specific Results

Country-specific results used to come from special country-specific indexes maintained by AltaVista’s various editions. These are being phased out, as explained briefly in the article below:

When AltaVista’s plans stabilize, any new specific tips about being listed in regional editions will be added to this page. Until then, follow these two key rules:

Write pages in the language of the country you are targeting.

Have pages hosted under a domain name that matches the country you are targeting.

The More About Countries And Languages page provides many additional tips on how to prepare content for country-specific search engines like those operated by AltaVista.

You can also use the paid inclusion program described above to specify that your country-specific URLs should be included in the country-specific search results for a particular AltaVista edition.

Finally, AltaVista UK does have a deal with Overture to carry paid listings. As of Sept. 2001, the two links displayed under the “Featured Sites” heading at AltaVista UK correspond to the top two listings at Overture UK for the term you searched on. In addition, the three links displayed under the “Try these resources” heading at the bottom of the results page correspond to listings three through five from Overture UK.

Image Enhanced Results

Virtually all of AltaVista’s non-US editions began including images as part of their regular search results in April 2001. Known internally as Image Enhanced Results, or IER, this is where some listings in search results have images associated with them. Try a search for “london” at AltaVista UK or “eiffel tower” at AltaVista France to see examples of this.

AltaVista automatically chooses which listings should get images and exactly what images it will use. Specifically, here’s what happens behind the scenes:

AltaVista displays textual results, as usual. So, in a search for “buckingham palace,” it will display pages it considers tops for those words, using all its usual criteria.

Next, AltaVista will see whether any of the top results have pictures that qualify as a match for the search terms. First of all, that means the page needs an image on it. No images, no chance of having an image displayed. Next, the image needs to be in .jpeg or .gif format.

Assuming you make it past those barriers, you then need an image on your page that’s associated with the search terms. This could mean that the search terms are in the image’s file name — “buckingham-palace.jpg” would be an example of this. Next, it can also mean that the search terms appear in the HTML copy near the image. So, perhaps you have the words “buckingham palace” as part of a paragraph describing a picture on your page.

Finally, AltaVista says that images should be in full color and not too large in file size, though how large wasn’t specified. I’ve seen images from 3K up to 40K, but staying near the lower end of that scale seems better.
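The file format and keyword checks described above can be approximated in a short script, which may be handy for testing your own image names. This is a rough sketch based only on the steps outlined here. It treats .jpg as JPEG (as the “buckingham-palace.jpg” example suggests), requires every search term to appear in either the file name or the nearby text, and does not look at color or file size. The function name and sample values are hypothetical.

```python
import os
import re
from typing import List

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".gif"}


def words_in(text: str) -> List[str]:
    """Lowercase words and numbers, split on anything else."""
    return re.findall(r"[a-z0-9]+", text.lower())


def image_may_qualify(query: str, file_name: str, nearby_text: str) -> bool:
    """True if the image passes the format and keyword checks."""
    stem, extension = os.path.splitext(file_name.lower())
    if extension not in ALLOWED_EXTENSIONS:
        return False
    terms = words_in(query)
    if not terms:
        return False
    name_words = words_in(stem)          # "buckingham-palace" gives two words
    text_words = words_in(nearby_text)   # HTML copy near the image
    in_name = all(term in name_words for term in terms)
    in_text = all(term in text_words for term in terms)
    return in_name or in_text


print(image_may_qualify("buckingham palace", "buckingham-palace.jpg", ""))
print(image_may_qualify("buckingham palace", "img001.gif",
                        "A view of Buckingham Palace at dusk"))
```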