Notes

HtDig has a known security hole in its latest beta version, 3.2.0b3, which is currently downloadable from
http://htdig.org. The hole is fixed in the latest stable version, 3.1.16, and in the code snapshot. The previous stable
version, 3.1.15, also had the security hole. The vulnerable beta has apparently been available for
download since 2001-10-15. According to these notes, "This hole can allow remote users to read any file on your system that
the UID running your webserver can read."

HtDig selects the files to index by gathering links from one or more starting URLs. It gathers only links that are on the
same site as the starting URLs, which it identifies by matching a simple set of string patterns.
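
As a rough illustration of same-site link gathering, the sketch below collects the links on one page and keeps those that
begin with a known site prefix. It is not HtDig's code: the names LinkCollector, same_site_links and site_prefixes are
hypothetical, and treating "a simple set of string patterns" as plain prefix matching is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def same_site_links(html, page_url, site_prefixes):
    """Return the links in html that start with one of site_prefixes.

    Assumption: "same site" is decided by simple string prefix matching.
    """
    collector = LinkCollector(page_url)
    collector.feed(html)
    return [link for link in collector.links
            if any(link.startswith(prefix) for prefix in site_prefixes)]
```

Called with a page from http://www.example.com/ and the prefix list ["http://www.example.com/"], the function keeps the
links that stay on that host and drops links to other hosts.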

Webglimpse can index files by Site, essentially the same as HtDig; by Directory (all files within a specified directory
on the server, whether or not they are linked); and by Tree (all files within a certain number of 'mouse clicks' or 'hops'
of one or more starting points). Webglimpse can also include or exclude files by regexp patterns and can accept
information about synonymous virtual domains and alias directories in order not to gather duplicate links.
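
The Tree mode and the regexp filters can be pictured as a breadth-first walk with a hop limit. The sketch below is only an
illustration under stated assumptions, not Webglimpse's implementation: gather_tree and link_graph are hypothetical names,
the links are supplied as an in-memory dictionary rather than fetched, and a file rejected by the filters is still followed
for its links (the real tool may behave differently).

```python
import re
from collections import deque


def gather_tree(link_graph, start_urls, max_hops, include=None, exclude=None):
    """Gather URLs within max_hops of any starting URL, breadth first.

    link_graph is a hypothetical dict mapping each URL to the URLs it links to.
    include and exclude are optional regexp strings applied to each URL.
    """
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None

    def allowed(url):
        if exclude_re and exclude_re.search(url):
            return False
        if include_re and not include_re.search(url):
            return False
        return True

    seen = set(start_urls)          # prevents gathering duplicate links
    queue = deque((url, 0) for url in start_urls)
    gathered = []
    while queue:
        url, hops = queue.popleft()
        if allowed(url):
            gathered.append(url)
        if hops == max_hops:
            continue                # hop limit reached: do not follow further
        # Assumption: filtered-out files are still followed for their links.
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return gathered
```

With max_hops set to 0 only the starting points themselves are gathered; each increment admits files one more 'hop' away.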

According to the 'Features and Requirements' page on the http://htdig.org website, "Both HTML documents and plain text
files can be searched. Searching of other file types will be supported in future versions." However, the FAQ area
refers to searching PDF files; this may apply only to the beta version, which is currently released
with a security hole. With the new beta code snapshot, it may be possible to index PDF files using the xpdf
add-on.

Webglimpse supports indexing any file that can be filtered to text by an external program. Free and reliable external
programs are known for PDF, MSWord, and all compressed file formats. By pre-filtering files before indexing (and
filtering on download) searches are quite fast even on these filetypes. Pre-filtering also saves a great deal of space
when indexing remote files. Several scripts to filter HTML tags are provided, including ones that convert HTML
character codes such as &aacute; (á), for effective searching in non-English languages.
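
To show what such an HTML filter does, here is a minimal sketch using Python's standard html.parser module. It is not one
of the scripts shipped with Webglimpse: TextExtractor and html_to_text are hypothetical names, and converting character
codes to the Unicode characters they stand for is an assumption about the intended output. The sketch also keeps the text
inside script and style elements, which a production filter would probably drop.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keep only text content; tags are discarded by not handling them."""

    def __init__(self):
        # convert_charrefs=True makes the parser turn codes such as
        # &aacute; into the character itself before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Strip tags, decode character codes, and collapse whitespace."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()               # flush any text still buffered
    return " ".join("".join(extractor.parts).split())
```

Running the filter once before indexing means the index stores "árbol" rather than "&aacute;rbol", so a search for the
accented word can match.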