exclusion list

How should I manage it?

specter

2:07 pm on Mar 18, 2006 (gmt 0)

Hi everyone,

I have a script that creates a small search engine, and I would like to add a filter that prevents certain terms from being used in the search, so that no results are returned for "stop words" such as "any", "and", "for" and so on. The script provides a length filter that drops all typed words shorter than a certain length, but that does not work in my case, because some short terms, such as "kit" or "cap", are valid search terms. So I have to give up on that idea.

Alternatively, the script also provides a smut filter that compares the input words with those in an exclusion list (an external file) and, if a match is found, censors the results. Obviously I would need to change this behavior: if a match is found, I simply want the matching input word(s) to be ignored for the search. That way, if someone types a stop word, it is not used in the search, but a record that contains it is still shown (otherwise every record containing a stop word such as "and" or "for" would be censored!). Could someone help me edit this filter properly, please? Any helpful reply will be much appreciated.

perl_diver

Personally, I would not allow search words less than three characters in length (at, to, it, so, of, a, I, no, etc.); words that are three bytes (or letters) in length are acceptable.

I am not trying to sound like a know-it-all, and I am far from the best-educated Perl coder, but your code is very poorly written and has a number of security problems.

Moby_Dim

7:26 pm on Mar 18, 2006 (gmt 0)

Sorry, I do not understand: if you do not search for the word "and", and a record contains "blood, sweat and tears", why would this record be "censored" if you search for "blood", for example?

specter

11:35 pm on Mar 18, 2006 (gmt 0)

Sorry, I do not understand: if you do not search for the word "and", and a record contains "blood, sweat and tears", why would this record be "censored" if you search for "blood", for example?

You've touched on the central point of the question.

That is exactly what I don't want to happen, but with the current filter settings it does. This is because the smut filter both ignores any smut word typed in by the user and prevents any record containing smut words from being shown. What I need instead is for the filter to stop only the typed words, without having any effect on the records returned.

To clarify my need further, and in answer to perl_diver: I cannot rely on the length filter because, in my case, there are three-letter words that are valid search terms...

Next, suppose you have a search phrase entered by a user, e.g. "with the beatles" (you may use a case-sensitive search or ignore case, in which case the regex needs more time to run; we will not go into that question now).

if ($sw) {
    # ... no need to search for this one
    next;
} else {
    # ...do your search
}
}   # closes the enclosing loop, whose opening is not shown here

If you have "with", "the", etc. in your stopwords file, the $sw flag will be raised twice, and you'll search your db (or file) for "beatles" only. So you'll still find "Beatles for sale", even though "for" is certainly a stopword.

If you need to search for an exact phrase, you need not check for stop words at all, imho.
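To make the idea concrete, here is a minimal sketch of that flag-and-skip loop. It is only an illustration under assumptions: the stop-word list is hard-coded here rather than read from a file, and the names @stopwords, %is_stopword, filter_terms and @records are hypothetical, not taken from the original script. Only the typed words are filtered; the records themselves are never checked against the list.

```perl
use strict;
use warnings;

# Hypothetical stop-word list; in practice it would be read from an
# external file, one word per line.
my @stopwords = qw(with the and for any);

# Lookup hash so each check is a single hash access.
my %is_stopword = map { ( lc($_) => 1 ) } @stopwords;

# Return only the typed words that are worth searching for.
sub filter_terms {
    my ($phrase) = @_;
    my @kept;
    foreach my $word ( split ' ', $phrase ) {
        my $sw = $is_stopword{ lc $word };    # the $sw flag from the post
        next if $sw;                          # stop word: skip it
        push @kept, $word;                    # otherwise keep it for the search
    }
    return @kept;
}

my @terms = filter_terms('with the beatles');

# Hypothetical records: they are matched against the kept terms only,
# so a record containing a stop word is not censored.
my @records = ( 'Beatles for Sale', 'Blood, Sweat and Tears' );
my @hits = grep {
    my $record = lc $_;
    scalar grep { index( $record, lc $_ ) >= 0 } @terms;
} @records;

print "terms: @terms\n";
print "hit: $_\n" for @hits;
```

With this arrangement "with" and "the" are dropped from the query, and "Beatles for Sale" is still returned even though it contains "for".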

perl_diver

8:49 pm on Mar 19, 2006 (gmt 0)

To clarify my need further, and in answer to perl_diver: I cannot rely on the length filter because, in my case, there are three-letter words that are valid search terms...

What length filter? If you don't want words shorter than three letters, use grep() on your original array of search terms:
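For illustration, a grep() call of that kind might look like the sketch below. The array names @words and @long_enough and the sample terms are assumptions; $minword borrows the name of the script's own setting.

```perl
use strict;
use warnings;

my $minword = 3;                            # minimum accepted word length
my @words   = qw(a kit of cap to beatles);  # hypothetical search terms

# Keep only the words that are at least $minword characters long.
my @long_enough = grep { length($_) >= $minword } @words;

print "@long_enough\n";   # prints "kit cap beatles"
```

Three-letter terms such as "kit" and "cap" pass this test; only one- and two-letter words are dropped.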

if ($sw) {
    # ... no need to search for this one
    next;
} else {
    # ...do your search
}
}   # closes the enclosing loop, whose opening is not shown here

Right? Where precisely should I put it in the above search array to avoid syntax errors?

Moby_Dim

8:59 pm on Mar 21, 2006 (gmt 0)

Specter, I do not know the meaning of some variables in your snippet (note that the use strict pragma is very important for code maintainability); there are some mistakes too (e.g. in the foreach $word (@words) loop). It would be easier to write a new search procedure to replace the "...do your search" words.
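As a small illustration of the strict point (a sketch only; the contents of @words and the @seen array are made up): under use strict every variable has to be declared, including the loop variable, so the loop would be written with my.

```perl
use strict;
use warnings;

my @words = qw(kit cap);    # hypothetical search terms
my @seen;

# Under strict, the loop variable is declared with "my".
foreach my $word (@words) {
    push @seen, "searching for: $word";
}

print "$_\n" for @seen;
```

With strict enabled, a misspelled or undeclared variable name is reported at compile time instead of silently becoming a new empty global.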

specter

8:22 pm on Mar 22, 2006 (gmt 0)

Well,

I'm a bit confused... Perl is quite new to me, so I need some further help. Here is the whole script:

###############################

#!/usr/local/bin/perl

$base = '/home/bla/public_html/search/base.txt';

# Change this to the PATH (not the URL) of the head.txt file
# (include the filename)
$headfile = '/home/bla/public_html/search/head.txt';

# Change this to the PATH (not the URL) of the foot.txt file
# (include the filename)
$footfile = '/home/bla/public_html/search/foot.txt';

# Change this to the PATH (not the URL) of the respond.txt file
# (include the filename)
$respondfile = '/home/bla/public_html/search/respond.txt';

# Change this to the PATH (not the URL) of the smut.txt file
# Any word found in smut.txt is assumed to be adult material
# therefore you can control what is censored and what isn't
# (include the filename)
$smutfile = '/home/bla/public_html/search/smut.txt';

# Change this to the URL of this script
# (include the filename)
$scripturl = 'http://www.blabla.com/cgi-bin/search.cgi';

# Edit this one to choose the font for the search results
# DO NOT use " or any special characters
# Use below for an example of what is allowed
# Also do not set a font size as the script does this automatically
$font = 'FACE=arial,helvetica COLOR=000000';

# Change this to the minimum search word length
# This is to exclude searches for "the", "and", "a", etc.
$minword = '3';

# Enter the maximum number of characters you want to allow for
# the 'title' field for new site submissions
$maxtitle = '50';

# Enter the maximum number of characters you want to allow for
# the 'description' field for new site submissions
$maxdescription = '150';

# Enter the maximum number of characters you want to allow for
# the 'keywords' field for new site submissions
$maxkeywords = '50';

# How many URLs do you want displayed on the New URLs page
$numnew = '3';

# If you want to use flock to avoid corrupt files by double access
# leave this line as is...if you don't then change the 1 to a 0
$uselock = '1';

# If you want to automatically send an autorespond e-mail to visitors
# who submit their URL to the database then leave this line as is
# If you don't, then change the 1 to a 0