I am doing this on the web, so do I need my indexing to store the descriptions?
Users will just be searching for words on the website, and I want a document
summary or excerpt to appear below the links to the documents that contain
the words they are looking for. Does this make sense? They will not be
entering any switches when they search on the web. I put in the HTML2 lines
and still get a bad directive error for those 3 lines. I have been going at this for
4 days now :(
Thanks a bunch for your email!
-----Original Message-----
From: Jeffrey.Grunstein@ny.frb.org [mailto:Jeffrey.Grunstein@ny.frb.org]
Sent: Tuesday, November 19, 2002 2:17 PM
To: dena.wolf@orcinc.com; swish-e@sunsite.berkeley.edu
Subject: Re: [SWISH-E] how to get a description
Try this in your config file:
IndexContents HTML2 .html
IndexContents HTML2 .htm
StoreDescription HTML2 <BODY> 100000
# To index PDF files as well, try something like this...
FilterDir /opt/sfw/bin
FileFilter .pdf pdftotext "'%p' -"
IndexContents TXT .pdf
StoreDescription TXT 250000
This will store the BODY tag text of all files that end in .htm and .html,
using the HTML2 parser.
If you're running a slower machine and performance is an issue, lower the
100,000 number to something smaller. If you have mostly smaller HTML files,
this number can be lower and you won't lose any content when the
descriptions are stored.
The command you listed looks like something you'd use to create the index.
As long as your config file is right, you don't need to do anything else to
store your descriptions. You just need the right switches when doing your
search.
Try doing a search like this once you've created the new index file:
cgi-bin/swish-e -w <your search string> -f index.swish -x '%t - %p\n%d\nlast updated %D\trank %r\tsize %l bytes\n\n'
This will actually return a lot more info than just the description. The
%d part shows the description.
Take a look at
http://www.swish-e.org/current/docs/SWISH-RUN.html#Searching_Command_Line_Arguments
and scroll down to the section titled "-x formatstring (extended output format)".
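Since the searches will come from a web page rather than a shell, the same -w, -f and -x switches can be passed from a script. Here is a minimal Python sketch of that idea, not a tested setup: the function names, the tab-separated format string and the default paths are my own choices for illustration, and I am assuming that swish-e header lines start with "#" and that a lone "." ends the output.

```python
import subprocess

# Format codes taken from the search example above:
# %p = path, %t = title, %d = stored description.
# Tabs separate the fields and a newline ends each record.
OUTPUT_FORMAT = r"%p\t%t\t%d\n"


def build_search_command(words, index_file="index.swish", binary="cgi-bin/swish-e"):
    """Return the argv list for a search; a list avoids shell quoting problems."""
    return [binary, "-w", words, "-f", index_file, "-x", OUTPUT_FORMAT]


def parse_results(output):
    """Turn swish-e output into a list of dicts, one per hit."""
    hits = []
    for line in output.splitlines():
        # Assumption: header lines start with "#" and a lone "." ends the output.
        if line.startswith("#") or line.strip() in ("", "."):
            continue
        fields = line.split("\t", 2)
        if len(fields) != 3:
            continue  # skip anything that is not a three-field record
        path, title, description = fields
        hits.append({"path": path, "title": title, "description": description})
    return hits


def search(words):
    """Run the search and return the parsed hits (needs swish-e and an index)."""
    completed = subprocess.run(
        build_search_command(words), capture_output=True, text=True, check=False
    )
    return parse_results(completed.stdout)
```

The web page would call search() with the words the user typed and print each hit's title as a link to its path, with the description underneath.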
"Wolf, Dena"
<dena.wolf@orcinc.com>
Sent by: swish-e@sunsite.berkeley.edu
11/19/2002 01:33 PM
Please respond to dena.wolf
To: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
cc:
Subject: [SWISH-E] how to get a description
Two questions. I've been reading the past archives that deal with this and am
understanding a little, but I don't know if I am doing this at all right.
My indexing is working and I am getting results now. What I am trying to do
now is get a chunk of the document body into the results page, say 40 words
of the body, whether or not it includes the search word.
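One possible way to get a 40-word chunk is to store a longer description at index time and trim it to 40 words when the results page is built. A minimal Python sketch, assuming the stored description text is already available as a string and that the size given to StoreDescription is a character count rather than a word count; excerpt is a hypothetical helper, not part of swish-e.

```python
def excerpt(description, max_words=40):
    """Return the first max_words words of description, adding " ..." if trimmed."""
    words = description.split()  # also collapses runs of whitespace
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."
```

This only cuts from the start of the body; it does not look for the search word.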
In my config file:
IndexFile index.swish
#MetaNames keywords description
IndexReport 3
FollowSymLinks no
IgnoreTotalWordCountWhenRanking yes
ReplaceRules replace "/export/home/orcsolar/html/" "http://www.orcinc.com/"
ReplaceRules remove "html/"
IgnoreLimit 50 1000
FileRules pathname contains members
IndexComments 0
IndexOnly .html .doc .xls .htm .ppt .txt .pdf
IndexContents HTML* .html .htm
StoreDescription HTML <body> 40
NoContents .gif .xbm .au .mov .mpg .ps
I added the IndexContents line and the StoreDescription line. I get a bad
directive error for both of those 2 new lines. Why? I checked that there is
no space.
Also, in my index command line, how do I add something to make the
description run (assuming I get the indexing to work)?
Right now my line says: cgi-bin/swish-e -c cgi-bin/orcsolar/config -i html -v -f index.swish
Can I put -p swishdescription somewhere in that line? If so, where?
I'm sorry I am having so much trouble trying to get all this to work.
Thanks for your help.
Dena Wolf
Web Developer
Organization Resources Counselors, Inc.
212-852-0387
E-mail: dena.wolf@orcinc.com
URL: http://www.orcinc.com