Hi,
My result page looks like this, with every hit showing the same title and URL:
1. Welcome To TNB (highlight matches)
URL: http://demo2/tnb/ Score: 100% Date: - Size: 22 kB
2. Welcome To TNB (highlight matches)
URL: http://demo2/tnb/ Score: 53% Date: - Size: 21 kB
3. Welcome To TNB (highlight matches)
URL: http://demo2/tnb/ Score: 33% Date: - Size: 24 kB
....
I know a solution for this was given before in one of the postings
(my only clue is that it is somewhere in the mailing list), but I can't
find it after hours of searching, so I hope that someone can help me out.
The changes I made in conf.pl are as follows:
$DOCUMENT_ROOT = 'd:/iPlanet/Servers/https-demo2/Project/tnb/';
$BASE_URL = 'http://demo2/tnb/';
$CGIBIN = 'http://demo2/cgi-bin/';
$INSTALL_DIR = 'd:/iPlanet/Servers/https-demo2/Project/cgi-bin/';
@EXT = ("jsp","html","pdf");
$HTTP_START_URL = 'http://demo2/tnb/';
$HTTP_MAX_PAGES = 2000;
$HTTP_SERVER_ROOT = 'd:/iPlanet/Servers/https-demo2/Project';
http://demo2 points to http://demo2/Project (this is done by the web
server), so the base URL is actually http://demo2/Project/tnb/.
I need to do this because the web server I am using keeps some admin
files inside its root and I don't want to mix them with the web files.
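
To make the intended mapping concrete, here is a minimal sketch of how a
file under $DOCUMENT_ROOT would be expected to correspond to a URL under
$BASE_URL with the settings above. The path_to_url helper and the
news/index.html file name are made up for illustration; they are not part
of the search scripts, and the indexer may well do this differently.

```perl
use strict;
use warnings;

# Values copied from the conf.pl settings above.
my $DOCUMENT_ROOT = 'd:/iPlanet/Servers/https-demo2/Project/tnb/';
my $BASE_URL      = 'http://demo2/tnb/';

# Hypothetical helper: translate a file path under $DOCUMENT_ROOT
# into the URL the web server is expected to serve it from.
sub path_to_url {
    my ($path) = @_;
    # Only paths inside the document root can be mapped.
    return unless index($path, $DOCUMENT_ROOT) == 0;
    return $BASE_URL . substr($path, length $DOCUMENT_ROOT);
}

# news/index.html is a made-up example file.
print path_to_url($DOCUMENT_ROOT . 'news/index.html'), "\n";
```

If the indexed URLs do not look like the output of such a mapping, the
settings above would be the first thing to recheck.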
Using some printouts, I found out that the URLs are actually correct but
somehow get redirected in the get_url function (tools.pl), which is
called by crawl_http (indexer_web.pl).
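
A printout of this kind can be as simple as the sketch below, which
compares the URL that was requested with the URL that came back from the
fetch. The describe_fetch helper and the news.jsp page name are
hypothetical, and the sketch assumes the fetch reports the final URL it
ended up at; that assumption is not confirmed from the get_url code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of tools.pl: report whether the URL
# that came back from a fetch differs from the one that was requested.
sub describe_fetch {
    my ($requested, $returned) = @_;
    return "unchanged: $requested" if $requested eq $returned;
    return "redirected: $requested -> $returned";
}

# Example pair; news.jsp is a made-up page name.
print describe_fetch('http://demo2/tnb/news.jsp', 'http://demo2/tnb/'), "\n";
```

Logging one such line per crawled page shows whether every page ends up
at the same address or only some of them do.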
By the way, why do we need get_url? It's only a few lines, but I don't
think I understand the code.
Any clues about my problem?
Thank you very much.