Welcome to Web Indexer. This program indexes your website, or a group of
web pages, and produces an HTML document that lists the HTML files with a
description of each one. You can tell the program to ignore a single file
by setting the description of that HTML page to "IGNOREINDEX", or to ignore
ALL files in a directory by putting a file called ".ignore_index" in that
directory.
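
The directory rule above can be sketched roughly as follows. This is a
minimal Python illustration, not the code in web_index.pl; the function name
html_files, the ".html"/".htm" extension check, and the choice to also skip
subdirectories of a marked directory are all assumptions.

```python
import os

IGNORE_MARKER = ".ignore_index"  # marker file name taken from this README


def html_files(root):
    """Yield HTML file paths under root, skipping marked directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        if IGNORE_MARKER in filenames:
            # Assumption: subdirectories of a marked directory are skipped too.
            dirnames[:] = []
            continue
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            # Assumption: HTML files are recognised by their extension.
            if name.lower().endswith((".html", ".htm")):
                yield os.path.join(dirpath, name)
```

Calling list(html_files("/usr/home/dion/www")) would then give the files
that are candidates for the index.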

README : file explaining the usage of the web indexer
web_index.pl : The Indexer program
web_index.conf : A Sample default configuration for the program
Recurse.pm : Module to process files recursively through dirs
images/ : a directory holding graphics needed for the graphics version
web_index2.0.zip or
web_index2.0.tar.gz : The full package
(Save this link to get the file... there is no ftp available)

How does it get the description?

You tell the program which way to look for the description in each
HTML document. The three current methods require you to add one of the
following pieces of HTML code to your HTML page (preferably in the <HEAD>
area), where [description] is a description of the page (of course).

1. <WINDEX "[description]">

2. <META NAME="description" CONTENT="[description]">

3. <TITLE>[description]</TITLE>

Smart checking: Look for 2; if it isn't found, look for 3.

1 = just used for this web indexer (#2 is preferable)

2 = HTML3.0 compliant tag which is used not only by this program but also by other search engines/web spiders.

3 = The standard HTML tag
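
The three lookups and the smart-checking fallback can be illustrated with a
short Python sketch. It is not the code used by web_index.pl: the regular
expressions, the function name get_description, and the use of the letters
w, m and t as method names are assumptions made for the example.

```python
import re

# One pattern per method; matching is case-insensitive, and the TITLE
# pattern may span several lines.
PATTERNS = {
    "w": re.compile(r'<WINDEX\s+"([^"]*)"\s*>', re.I),
    "m": re.compile(r'<META\s+NAME="description"\s+CONTENT="([^"]*)"\s*>', re.I),
    "t": re.compile(r"<TITLE>(.*?)</TITLE>", re.I | re.S),
}


def get_description(html, method=""):
    """Return the description found by 'w', 'm' or 't', or None.

    An empty method means smart checking: try META first, then TITLE.
    """
    order = [method] if method else ["m", "t"]
    for key in order:
        match = PATTERNS[key].search(html)
        if match:
            return match.group(1).strip()
    return None
```

A caller could then compare the returned text with "IGNOREINDEX" to decide
whether the file should be left out of the index.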

Configuration of the program

There are two ways to run this program, using command line arguments
(e.g. web_index.pl -d /usr/home/dion ... ) or via the configuration
file (e.g. see web_index.conf).

COMMAND LINE ARGS:
-----------------
h = Show the Usage Help
w = Get the description via <WINDEX "description here">
m = Get the description via
<META NAME="description" CONTENT="description here">
t = Get the description via <TITLE>Grab this part</TITLE>
i = If the description = IGNOREINDEX then ignore the file
I = If a ".ignore_index" file is in a directory ignore
all files in the directory and move to the next
T = Text output ONLY (Not using the graphics)
c = Read configuration from web_index.conf
d [dir] = Start indexing from [dir]
u [url] = Set the base URL to http://[url]
C [file] = Read configuration from [file]
o [file] = HTML filename to output to

EXAMPLE: To set up an index of my web pages (starting at /dion) I would run
% web_index.pl -iI -d /usr/home/dion/www -u /dion
which would produce HTML: /dion/web_index.html
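
For readers who want to see how single-letter options like these combine,
here is a small Python sketch using the standard getopt module. It only
mirrors the option letters listed above; how web_index.pl itself parses its
arguments is not shown in this README, and the function name parse_args is
hypothetical.

```python
import getopt


def parse_args(argv):
    """Parse options in the style listed above into a dict."""
    # Letters followed by ':' take an argument (d, u, C, o); the rest are flags.
    opts, _rest = getopt.getopt(argv, "hwmtiITcd:u:C:o:")
    # Flags map to True, options with an argument map to that argument.
    return {flag.lstrip("-"): (value if value else True) for flag, value in opts}


# The example invocation from this README, without the program name.
options = parse_args(["-iI", "-d", "/usr/home/dion/www", "-u", "/dion"])
```

Note that "-iI" is read as the two separate flags i and I.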
USING THE CONFIGURATION FILE (Default: web_index.conf)
------------------------------------------------------
If you wish to use the default web_index.conf then you would call
% web_index.pl -c
If you want to specify a different filename then use
% web_index.pl -C /path/to/file.conf
The config file... and the different variables you can set
1. -> ROOT_INDEX_DIRECTORY: /usr/home/dion/www
Set the directory for the indexer to start looking through to
compile its Site Index
2. -> ROOT_URL: /dion
Set the base URL for the index (basically the URL which points to
the ROOT_INDEX_DIRECTORY)
3. -> IMAGES_URL: /images
Set the relative URL where the images are stored (e.g. if you have
your images at /images you would have the above setting)
4. -> OUTPUT_FILE: /usr/home/dion/www/windex.html
Set the HTML doc which will have the Index in it
5. -> GET_DESCRIPTION_FROM: w or m or t
Here you select the method for the program to get the [description]
w = description via <WINDEX "[description]">
m = description via <META NAME="description" CONTENT="[description]">
t = description via <TITLE>[description]</TITLE>
If you leave it blank it will try to use "m" and, if it doesn't get
a match, it will try "t"
6. -> IGNORE_DIRECTORIES: yes
If you have "yes" there then if you make a file ".ignore_index" in
a directory the program will ignore all files in it and move to the
next.
7. -> IGNORE_FILES: yes
If you have "yes" there then any file whose description is "IGNOREINDEX"
(e.g. in its META tag if you are using the "m" method) will be ignored
8. -> TEXT_OUTPUT_ONLY: yes
If you have "yes" there then it will not print out any of the nice images
that are in the "images/" directory. Personally I like the images :)
9. -> HTML_HEADER
[put html here]
END_HTML_HEADER
All the HTML in between the two tags HTML_HEADER and END_HTML_HEADER
will be printed at the top of the HTML output file (OUTPUT_FILE)
10. -> HTML_FOOTER
[put html here]
END_HTML_FOOTER
All the HTML in between the two tags HTML_FOOTER and END_HTML_FOOTER
will be printed at the bottom of the HTML output file (OUTPUT_FILE)
Configuration of web_search.cgi
Now to set up the Web Search part of the package. It is also simple.
1. Place the program in a place where http:// can get to it.
E.g. in /cgi-bin or in your web directory.
2. Now make sure the web_index.pl has its FORM ACTION pointing to the cgi

3. Edit the web_search.cgi itself and change the following:
$image_dir = "images";
so that it points to the location of the "images/" directory
4. Change the PrintHeader and PrintFooter functions to customise the HTML
as you want, and change the following hidden fields, which hold the
default search settings:
<INPUT TYPE="hidden" NAME="IGNORE" VALUE="yes">
<INPUT TYPE="hidden" NAME="boolean" VALUE="OR">
<INPUT TYPE="hidden" NAME="case" VALUE="Insensitive">
5. Edit the web_search.html file
Change all the <A HREF> and <IMG> tags to point to your images etc.
Change the <FORM ACTION> so that it again points to the web_search.cgi.
Now set the following variables:
<INPUT TYPE="hidden" NAME="DOC_ROOT" VALUE="/usr/home/dion/www">
<INPUT TYPE="hidden" NAME="URL_ROOT" VALUE="/dion">
<INPUT TYPE="hidden" NAME="IGNORE" VALUE="yes">
These are the same as for the web_index.pl
6. Celebrate, you are done :)
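
As an illustration of what the default search settings in step 4 mean
("boolean" set to OR, "case" set to Insensitive), here is a small Python
sketch. It is not the code in web_search.cgi: the function name matches is
hypothetical, and treating any other "boolean" value as AND and any other
"case" value as case-sensitive is an assumption.

```python
def matches(text, terms, boolean="OR", case="Insensitive"):
    """Return True if text satisfies the search terms under the given settings."""
    if case == "Insensitive":
        text = text.lower()
        terms = [term.lower() for term in terms]
    hits = [term in text for term in terms]
    # OR: at least one term must appear; otherwise (assumed AND): all must appear.
    return any(hits) if boolean == "OR" else all(hits)
```

With the defaults, a page containing "Home Page" matches a search for
"home" or "missing", because one matching term is enough.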
* ------------------------------------------------------------------------ *
* If you have any questions or comments contact Dion Almaer *
* ------------------------------------------------------------------------ *
* Email Address | [email protected] *
* WWW Page | /dion *
* ------------------------------------------------------------------------ *
* -=< M E M B E R S E R V I C E S I N T E R N A T I O N A L >=- *
* ------------------------------------------------------------------------ *