A "file system" realm is used to search a web site that is hosted on the same physical server as the search engine. This type of realm discovers and indexes web pages by directly opening files and folders on the web server, instead of requesting pages over the Internet using the web crawler.

Features and Benefits

A "file system" realm has these features and benefits:

Can be used on web hosts which do not allow socket privileges. The web crawler requires socket privileges and therefore will not work on such hosts, whereas the file system crawler only needs to read files from disk.

Can be used for sites in which not all web pages are linked to one another. The web crawler can only discover web pages by following links, and on such sites it will miss any pages that aren't linked directly or indirectly from the main page. The file system crawler walks the file system directly and discovers each and every file.

File system realms are more efficient, because files are opened directly instead of being requested over the web. For large, text-intensive sites whose index files must be refreshed often, a file system realm is usually the best choice.
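
The discovery behavior described above can be illustrated with a short Python sketch. It is an illustration only, not FDSE's implementation; the function name `discover_files` and the default extension list are assumptions made for the example. It walks a folder tree and collects every file with a matching extension, whether or not any page links to it.

```python
import os

def discover_files(root, extensions=(".html", ".htm", ".txt")):
    """Return a sorted list of every file under root with a matching extension.

    Hypothetical helper for illustration; the extension list is an assumption.
    """
    found = []
    # os.walk visits every folder below root, so unlinked pages are found too.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively (page.HTM matches .htm).
            if os.path.splitext(name)[1].lower() in extensions:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

A link-following crawler would need some page to point at a file before it could find it; the walk above has no such requirement, which is also why it picks up test folders and old content.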

Disadvantages

A "file system" realm has the following limitations, which is why it is not the default realm type:

Can only be used for web sites on the same physical server as the search engine.

Will index *all* files which match the given file extensions. On many sites this will include test directories and old content that the webmaster does not wish to be searched. Careful review and pruning of the index files is required in these cases.

Active content, such as Perl, ASP, and PHP, involves files which appear differently when accessed via the file system versus via the web. The file system stores the source code of a program, while the web transmits the output of the program. The content gathered by indexing the file system may be very different from the content seen by a visitor who requests the file over the web. Thus, these types of realms may return inaccurate search results when used to search active content.

Furthermore, active content may display different data based on the query string, for example page.pl?article=1 and page.pl?article=2. The web crawler will correctly request both pages and store their contents as separate web pages, while the file system crawler will only be able to index the source code of the single page.pl Perl script.
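
The query string limitation can be shown with a small Python sketch. The host name www.example.com and the helper name `file_key` are assumptions for illustration, and the code does not reflect how FDSE stores pages internally. It shows that two distinct URLs reduce to one file once the query string is dropped.

```python
from urllib.parse import urlsplit

def file_key(url):
    """Return only the path part of a URL (hypothetical helper).

    A file system walk sees files, not query strings, so this is all
    that distinguishes one page from another on disk.
    """
    return urlsplit(url).path

# Example URLs; the host name is a placeholder.
urls = [
    "http://www.example.com/page.pl?article=1",
    "http://www.example.com/page.pl?article=2",
]

# A web crawler treats each full URL as its own page.
web_crawler_pages = set(urls)

# A file system crawler can only see the one script behind both URLs.
file_system_pages = {file_key(u) for u in urls}
```

Here `web_crawler_pages` holds two entries while `file_system_pages` holds only `/page.pl`.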

Creating a Realm

Follow these steps to create a "file system" realm:

Go to the FDSE admin page and log in.

Choose the Manage Realms link from the navigational menu.

On the Manage Realms page, choose the first link labeled Create New Realm.

Select the fourth realm type, labeled Website Realms - File System Discovery. Enter the appropriate URL and file system path for the web site to be indexed. FDSE will attempt to fill in default values suitable for indexing the local site, but it cannot always auto-detect the folder path accurately. In those cases, correct the path by hand before continuing.

Click the Create New Realm button to save the realm.

Return to the Manage Realms page. The new realm should be listed with zero pages. Choose the Rebuild link to build the index.
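
The URL and file system path entered when selecting the realm type work as a pair: each file found under the path has to be reported in search results as an address under the URL. The Python sketch below shows one way such a mapping could work. It assumes POSIX-style paths, and the function name `path_to_url`, the folder /var/www/html, and the host www.example.com are all hypothetical; FDSE's actual logic may differ.

```python
from pathlib import PurePosixPath

def path_to_url(file_path, base_path, base_url):
    """Translate a file path under base_path into a URL under base_url.

    Hypothetical helper for illustration. Raises ValueError if the file
    does not live under base_path, which is what a wrong path would cause.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(base_path))
    return base_url.rstrip("/") + "/" + relative.as_posix()

# Example values; both the folder and the host name are placeholders.
url = path_to_url(
    "/var/www/html/docs/a.html",
    "/var/www/html",
    "http://www.example.com/",
)
```

With these example values, `url` is `http://www.example.com/docs/a.html`. If the path and URL do not describe the same folder, every result link will point to the wrong address, which is why an inaccurate auto-detected path needs to be corrected.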