Using SWISH-E To Index Your Site

SWISH-Enhanced (SWISH-E) is a fast, powerful, flexible, free, and easy-to-use system for indexing collections of Web pages or other text files. Once your pages are indexed, you can run quick searches against the index file. SWISH-E is already installed on CGI101, so if you're a customer, you don't need to install it. If you're not a customer, check with your ISP to see if SWISH-E is already installed; if not, you can download it from SunSITE.

There are three parts to making your web site searchable with SWISH-E. First, you have to create a configuration file that SWISH-E will read to index your site. Then you have to actually index the site. And lastly, you have to have a CGI that will perform the search and return results.

Step 1. Create the Config File

To create an index of your pages, SWISH-E reads a configuration file to determine which pages should (or should not) be indexed. You should download (or copy) the following file to your own account:

You'll need to correct the paths to your web directory in this file: replace /home/yourusername/public_html with the actual (full) path to your web files.

Nothing else should need changing unless you want to fine-tune your search engine (such as omitting files with certain names, etc.). If you read through the config file you'll see the different options, plus help for each one. Any line that starts with a "#" is a comment, and many options are commented out by default.

The sample config file is also set up so that it only indexes .html files. If you want to index other files, for example .txt or .shtml files, you'll need to change the following line near the bottom of the config file:

IndexOnly .html

And add the suffixes you want, for example:

IndexOnly .html .txt .shtml
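
To show how these directives fit together, here is a minimal sketch of a config file, written out by a small shell helper. It is not the full sample config, only an illustration: the helper name write_sample_conf and the index file name index.swish are assumptions made for this example, and the paths are the same placeholders used above.

```shell
# Write a minimal swish.conf sketch to the path given as $1.
# (write_sample_conf is a hypothetical helper, not part of SWISH-E.)
write_sample_conf() {
    cat > "$1" <<'EOF'
# Directory tree to index (replace with the full path to your web files)
IndexDir /home/yourusername/public_html
# Where the finished index is written (this file name is an assumption)
IndexFile /home/yourusername/public_html/index.swish
# Only index files with these suffixes
IndexOnly .html .txt .shtml
EOF
}

# Write the sketch to a scratch location so nothing real is overwritten.
write_sample_conf "${TMPDIR:-/tmp}/swish.conf.sample"
```

Writing to a scratch file first lets you compare the sketch against the real sample config before replacing anything in your web directory.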

Step 2. Index The Site

Once your config file is saved, you'll have to run swish-e to create the index file. This can be done from the Unix command line like so:

/usr/local/bin/swish-e -c /home/yourusername/public_html/swish.conf

If all goes well, the index file will be created at the location specified by the IndexFile directive in the config file. The first time you run this, you'll also want to chmod 644 swish.conf (owner can read and write, everyone else can only read) to make it readable by your CGIs.
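
If you expect to re-run the indexer often, it may help to wrap the command in a small shell function that complains clearly when the config file can't be read. The following is only a sketch: the function name reindex is made up for this example, and the default binary path is simply the one used in the command above.

```shell
# Re-run the indexer, with a clear message if the config file is missing.
# $1 = path to swish.conf
# $2 = swish-e binary (optional; defaults to the path used in this article)
reindex() {
    conf="$1"
    swish="${2:-/usr/local/bin/swish-e}"
    if [ ! -r "$conf" ]; then
        echo "reindex: cannot read config file: $conf" >&2
        return 1
    fi
    # Same command as above; whatever exit status swish-e reports is passed through.
    "$swish" -c "$conf"
}

# Example call (commented out so the snippet has no side effects when sourced):
# reindex /home/yourusername/public_html/swish.conf
```

The optional second argument is there so you can point the function at a different swish-e binary if your ISP installed it somewhere else.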

You'll need to re-index your pages whenever you make changes to them. You can either do this manually every few weeks or so (depending on the frequency of the changes), or you may want to create a cron job to re-index your site nightly. (I recommend this, because it lets you change your pages without worrying about the index.) To set it up in cron, type

crontab -e

to edit your cron file. You'll be put into an editor (which will be whatever your default editor is - possibly pico or vi). You'll then add the following line:
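
As a rough sketch of what such a crontab entry could look like, the helper below prints one that runs the indexing command from Step 2 once a night. The helper name nightly_reindex_entry and the 3:15 a.m. run time are assumptions chosen for illustration; pick whatever time suits your site. The redirect at the end discards the command's output so cron does not mail it to you every night.

```shell
# Print a crontab entry that runs swish-e once a night.
# $1 = minute (0-59), $2 = hour (0-23), $3 = full path to swish.conf
# (nightly_reindex_entry is a hypothetical helper for this example.)
nightly_reindex_entry() {
    # Crontab format: minute hour day-of-month month day-of-week command
    printf '%s %s * * * /usr/local/bin/swish-e -c %s >/dev/null 2>&1\n' "$1" "$2" "$3"
}

nightly_reindex_entry 15 3 /home/yourusername/public_html/swish.conf
# prints: 15 3 * * * /usr/local/bin/swish-e -c /home/yourusername/public_html/swish.conf >/dev/null 2>&1
```

The printed line can be pasted into the editor that crontab -e opens; after you save and exit, crontab -l should list it.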

I don't recommend having SWISH-E spider your own site to index it, especially if you have bandwidth limitations on your account, because the spider traffic will eat up some (or a lot, depending on the size of your site) of your web traffic quota.