How the search engines work

Search engines are the main way that users find the information they need among the seemingly infinite pages available on the Internet. They need to produce results quickly and accurately, so they tend to sit on the cutting edge of technology. The premise behind search engines is fairly simple: they scan the Internet in various ways and then index what they find, based on keywords and tags that the engine assigns to each page. As pages are scanned and scrutinized for data, their position in the index can change. Whenever a page is updated or changed, its index entry is altered accordingly.

Three basic tasks are fundamental to how search engines work:

The engines search the Internet for keywords or phrases in the written content of each webpage on every site they find.

An index of the keywords, and where to find them, is maintained in a central database.

Users of the engine can enter a keyword or a combination of keywords to find specific pages in the index. All the indexed pages in the central database are searched for the best keyword matches, and the results are displayed to the user in order of how well they match.
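The three tasks above can be sketched in a few lines of Python. This is only a toy illustration, not how any real engine is built: the page addresses and text are made up, the "central database" is a plain dictionary, and the function names build_index and search are assumptions chosen for this example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each keyword to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return pages ordered by how many of the query keywords they contain."""
    counts = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            counts[url] += 1
    # Most matching keywords first; ties are broken alphabetically.
    return sorted(counts, key=lambda url: (-counts[url], url))

# Hypothetical pages standing in for content found on the Internet.
pages = {
    "example.com/cats": "cats are small pets",
    "example.com/dogs": "dogs are loyal pets",
    "example.com/fish": "fish live in water",
}

index = build_index(pages)
results = search(index, "loyal pets")
print(results)
```

Here the page about dogs is listed first because it contains both query keywords, while the page about cats contains only one.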

There are three basic types of search engine model: engines powered by robots (also called spiders or crawlers), engines powered by human submissions, and engines that combine the two methods. Most modern engines use the third type, so as to keep the index as efficient and open-ended as possible.

Crawlers, or spiders, are programs that read through an entire website, including its meta tags (HTML tags that the owner of a webpage can use to provide keywords and concepts), and follow all the links on the site. Crawlers usually start gathering information from a popular website and gradually spread out by following the links it provides. All the information collected by the crawler is submitted to a central system, where it is indexed. Crawlers periodically revisit websites to update the information they previously collected, so that website changes are reflected in the index.
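The way a crawler spreads out from a starting site can be sketched as a breadth-first walk over links. The example below is a simplified sketch under stated assumptions: it does not touch the network, the tiny WEB dictionary is a made-up stand-in for real pages and their links, and crawl is a hypothetical function name. A real crawler would also download each page, read its content and meta tags, and schedule repeat visits.

```python
from collections import deque

# Hypothetical miniature "web": each address maps to the links found on that page.
WEB = {
    "popular.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["popular.example"],
    "c.example": [],
}

def crawl(start, fetch_links):
    """Visit every page reachable from start, nearest pages first."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # a real crawler would hand the page to the indexer here
        for link in fetch_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order

visited = crawl("popular.example", lambda url: WEB.get(url, []))
print(visited)
```

The seen set is what stops the crawler from looping forever when pages link back to each other, as b.example does here.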

In human-submitted search engines, people submit information about a webpage, which is then indexed and cataloged based on their input and suggestions. Without automation, this type of index is harder to maintain, so it is becoming less widely used and less reliable. It is quickly becoming obsolete and is being phased out.

The marriage of the previous two methods is much more efficient and helpful, but it is dictated mainly by the robot automation. Humans can submit a page for indexing, but the robots have the final say on where it is placed in the index. Depending on the ranking factors and how the algorithm works, the site may end up on page 1 or page 1000.

Each engine on the market will produce different results for any given keyword search. The results vary because the engines use different algorithms. A search engine's algorithm determines matches between the information held in its indices and the words or phrases searched by the user. Some engines, such as Google, leave out articles such as ‘a’, ‘an’ and ‘the’, while others, such as AltaVista, index every single word found on a webpage, including irrelevant ones.
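The difference between dropping articles and indexing every word can be shown with a small sketch. This is an illustration only: the STOP_WORDS set contains just the three articles mentioned above, and the tokenize function is a hypothetical helper, not the actual behavior of any named engine.

```python
# Assumption for this sketch: only the three articles are treated as stop words.
STOP_WORDS = {"a", "an", "the"}

def tokenize(text, drop_stop_words=True):
    """Split text into lowercase words, optionally leaving out articles."""
    words = [w.strip(".,!?;:'\"").lower() for w in text.split()]
    words = [w for w in words if w]  # discard tokens that were only punctuation
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words

sentence = "The cat chased a mouse."
print(tokenize(sentence))                         # articles left out
print(tokenize(sentence, drop_stop_words=False))  # every word kept
```

Two engines that differ only in this one setting would already build different indexes from the same page.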

Words are assigned different weights depending on whether they appear in the title, subheadings or body of the page, and the number of times each word appears on the page is also stored. Cataloging this extra information improves the efficiency of the search engines. Each search engine on the Internet has its own logic for assigning these ‘weights’, which also contributes to the differences in the order and content of the results that various search engines produce. Most search engines these days are designed to discourage the repeated use of a keyword on a webpage, a practice known as keyword stuffing.
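A rough sketch of this kind of weighting follows. The numbers are invented for illustration: real engines keep their weights private, so FIELD_WEIGHTS, MAX_COUNT and the score function are all assumptions. Capping the count per section is just one simple way to blunt keyword stuffing, not a description of what any particular engine does.

```python
# Hypothetical weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "subheadings": 2.0, "body": 1.0}

# Assumed cap: repeats beyond this number in one section earn nothing extra.
MAX_COUNT = 3

def score(page, keyword):
    """Add up weighted, capped keyword counts across the sections of a page."""
    keyword = keyword.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        count = page.get(field, "").lower().split().count(keyword)
        total += weight * min(count, MAX_COUNT)
    return total

honest_page = {
    "title": "Garden tips",
    "subheadings": "Spring garden care",
    "body": "A garden needs water and a garden needs sun",
}
stuffed_page = {"title": "garden", "body": "garden " * 50}

print(score(honest_page, "garden"))
print(score(stuffed_page, "garden"))
```

With these made-up weights, the page that repeats the keyword fifty times in its body scores no higher than the ordinary page, because the cap ignores every repeat after the third.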

When a user types a keyword or phrase into the search engine, the catalog that stores the indices is searched, and the speed at which the results are displayed depends largely on the algorithms used by the search engine provider. Search engines play an important role because many pages on the Internet do not have titles that are relevant to their subject, and it is rarely possible to find a website just by guessing at an address in the address bar. Understanding how search engines work is important to both searchers and website owners. Hopefully this article has helped a few people to understand them better.