Anatomy of a Search Engine

At the most basic level, search engines share these three logical components:

Spider and/or Indexer (AKA "data prep")

A binary Fulltext Index (AKA "the index")

The engine that runs the searches and gives back results (AKA "the engine")

Each of these components depends on the one before it. The engine can't run searches without a fulltext index, and no fulltext index exists until the documents have been fetched and indexed.
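The dependency chain above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular engine is built: the names tokenize, build_index and search are hypothetical, the "index" is an in-memory dictionary rather than a binary on-disk structure, and the three sample documents are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Data prep + index: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Engine: return the sorted IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the first term's postings, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)

# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "The quick brown fox",
    2: "A quick search engine",
    3: "Search engines use a fulltext index",
}
index = build_index(documents)
print(search(index, "quick"))          # [1, 2]
print(search(index, "search engine"))  # [2]
```

Note that search only ever touches the index, never the documents themselves, which is why it cannot run until build_index has finished.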

Modern search engines have further subdivided the data prep, index and search functions into additional subsystems, in order to achieve better modularity and extreme scalability.

Traditional Monolithic Search

What's NOT a Search Engine

Note that it is technically possible to search in a single step by scanning the source material line by line every time a search term is entered. Because every query has to read the entire source material again, this approach is slow and inefficient, and we do not consider such systems to be true search engines.

Examples of these linear-scan-based "pseudo-search-engines" include:

The Unix find and grep utilities

The SQL "LIKE" operator

The "Search" menu option in applications like Microsoft Word

In addition to being very slow (relative to designs based on a fulltext index), these simpler pseudo-engines typically lack advanced capabilities such as stemming or thesaurus support.
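For contrast, a linear scan might look like the following minimal sketch. The function name linear_scan and the sample lines are hypothetical, and the matching is plain case-insensitive substring comparison, roughly in the spirit of grep or SQL LIKE rather than a copy of either.

```python
def linear_scan(lines, term):
    """Return the 1-based numbers of lines containing the term, reading every line."""
    needle = term.lower()
    return [n for n, line in enumerate(lines, start=1) if needle in line.lower()]

# Hypothetical source material.
lines = [
    "Search engines index documents",
    "Indexing takes time up front",
    "A linear scan reads every line",
]
print(linear_scan(lines, "index"))    # [1, 2]
print(linear_scan(lines, "indexes"))  # []
```

Every call reads all of the lines again, so query time grows with the size of the source material. The second query also shows the missing stemming: "indexes" finds nothing, even though two lines are clearly about indexing.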