stopwords in exact title searches

We’ve been struggling with configuring title searches that contain stopwords (e.g. “The Help”). I found an old post in which Demian indicated that you could get some weird results when mixing stopworded and non-stopworded fields in searchspecs.yaml (https://sourceforge.net/p/vufind/mailman/message/34143745/), but I’m not sure how to find out which fields are stopworded and which aren’t. Is this configured somewhere? Are there other settings that affect how stopwords are handled? Right now it seems to make no difference whether the search terms are in quotes or not: the stopwords are ignored.

I thought that perhaps uncommenting the “ExactSettings” section of the searchspecs.yaml file would be helpful for this issue, but this causes the search to return zero results.

Thanks in advance,

Katie McGrath

eiNetwork

Pittsburgh, PA

The information contained in this e-mail, and any attachment, is confidential and is intended solely for the use of the intended recipient. Access, copying or re-use of the e-mail or any attachment, or any information contained therein, by any other person
is not authorized. If you are not the intended recipient please return the e-mail to the sender and delete it from your computer. Although we attempt to sweep e-mail and attachments for viruses, we do not guarantee that either are virus-free and accept no
liability for any damage sustained as a result of viruses.
Vufind-tech mailing list
https://lists.sourceforge.net/lists/listinfo/vufind-tech

Re: stopwords in exact title searches

Note that in the fieldType definitions at the top of your Solr schema (schema.xml), some contain solr.StopFilterFactory and some do not. Logically enough, all the field types containing solr.StopFilterFactory will delete stopwords. You can then look at the type attribute of each field definition to figure out whether or not a given field deletes stopwords.
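If you’d rather not cross-reference the schema by hand, a short script can do it. The following is a minimal Python sketch, assuming the usual schema.xml layout of <fieldType> and <field> elements. The sample schema, its type names (text_stop, text_nostop) and its field names (title, author) are hypothetical and much simpler than VuFind’s real schema. It flags a type if a StopFilterFactory appears anywhere in its analyzers, so it does not distinguish index-time from query-time filtering.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified schema; NOT VuFind's actual schema.xml.
SAMPLE_SCHEMA = """<schema name="sample" version="1.5">
  <types>
    <fieldType name="text_stop" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_nostop" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="title" type="text_stop" indexed="true" stored="true"/>
    <field name="author" type="text_nostop" indexed="true" stored="true"/>
  </fields>
</schema>"""


def stopworded_fields(root):
    """Map each <field> name to True if its fieldType contains a StopFilterFactory."""
    stop_types = set()
    # iter() walks the whole tree, so fieldTypes are found with or without a <types> wrapper.
    for field_type in root.iter("fieldType"):
        for flt in field_type.iter("filter"):
            # Accept both "solr.StopFilterFactory" and a fully qualified class name.
            if flt.get("class", "").split(".")[-1] == "StopFilterFactory":
                stop_types.add(field_type.get("name"))
                break
    # Only plain <field> elements are checked; <dynamicField> and <copyField> are ignored.
    return {f.get("name"): f.get("type") in stop_types for f in root.iter("field")}


print(stopworded_fields(ET.fromstring(SAMPLE_SCHEMA)))
# {'title': True, 'author': False}
```

Against a real installation you would pass ET.parse(path).getroot() instead, with path pointing at your biblio core’s schema.xml.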

When you use solr.StopFilterFactory, the only thing you can really configure is which words are considered to be stopwords. There is no way to temporarily turn it off, because the filter literally deletes the stopwords before they can be stored in your index. The data can’t be found because it is simply not there. All that remains is a placeholder indicating that “some deleted word was here,” which is used to determine word positions in phrases.
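To make the “placeholder” idea concrete, here is a toy Python model of what a stop filter does to a title. It is a simplification, not Lucene’s code: it splits on whitespace, lowercases, and uses a small hypothetical stopword list. It does keep the position gap, as the real filter does.

```python
# Hypothetical stopword list for illustration; not VuFind's default stopwords.txt.
STOPWORDS = {"a", "an", "and", "of", "the"}


def analyze(text, stopwords=STOPWORDS):
    """Return (position, term) pairs, deleting stopwords but keeping their positions."""
    tokens = []
    for position, word in enumerate(text.lower().split()):
        if word in stopwords:
            continue  # nothing is stored for this word; only its position is skipped
        tokens.append((position, word))
    return tokens


print(analyze("The Help"))  # [(1, 'help')]
print(analyze("A Help"))    # [(1, 'help')]
print(analyze("Help"))      # [(0, 'help')]
```

Because “The Help” and “A Help” produce identical tokens, no query against a field analyzed this way can tell them apart.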

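The reason that mixing stopworded and non-stopworded fields in a Dismax search causes problems is that the query parser has to reconcile words that were deleted for some fields but kept for others, and it doesn’t always bring back complete or appropriate result sets. For example, a stopword can still count as a required term under Dismax’s minimum-match (mm) setting because a non-stopworded field still searches for it. Records that match perfectly in the stopworded fields may then be left out.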
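Here is a toy Python model of one commonly described way this goes wrong. It assumes a minimum-match of 100% (every query word must match somewhere), two hypothetical fields (title with a stop filter, author without), and a hypothetical stopword list. Real Dismax also involves boosts, tie-breaking and more elaborate mm rules, so treat this only as an illustration of the clause counting.

```python
# Hypothetical stopword list for illustration.
STOPWORDS = {"a", "an", "the"}


def analyze(text, uses_stop_filter):
    """Lowercase and split; drop stopwords only if the field's type uses a stop filter."""
    words = text.lower().split()
    return [w for w in words if not (uses_stop_filter and w in STOPWORDS)]


def dismax_match(query, doc, fields):
    """Toy Dismax with mm=100%: each query word is a required clause that must
    match in at least one field whose analyzer kept that word.
    `fields` maps field name -> True if that field's type uses a stop filter."""
    for word in query.lower().split():
        # Fields that still search for this word after their own analysis.
        searching = [f for f, stop in fields.items() if analyze(word, stop)]
        if not searching:
            continue  # every field deleted the word, so the clause disappears
        if not any(word in analyze(doc[f], fields[f]) for f in searching):
            return False  # a required clause matched nothing
    return True


doc = {"title": "The Help", "author": "Kathryn Stockett"}
print(dismax_match("the help", doc, {"title": True, "author": True}))   # True
print(dismax_match("the help", doc, {"title": True, "author": False}))  # False
```

In the second call, “the” survives only in the author field’s analysis, so it becomes a required clause that the author field cannot satisfy, and the record is lost even though its title matches.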

As far as a solution is concerned, you might want to try reducing or completely removing the list of stopwords in your biblio/conf/stopwords.txt and then reindexing your records. Ideally, you would set up a second test instance of VuFind so that you can compare it side by side with the current instance. You might find that the search results are just as good or better if you stop deleting stopwords, and if so, that may be an easier solution. Here at Villanova we ended up greatly reducing our stopword list from the defaults shipped with VuFind, and it solved some problems (though we still consider “the” to be a stopword). The only cost, apart from the inevitable changes in relevance ranking, is that your index will be a little bit larger since it will be storing more words.
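To get a rough sense of the size trade-off before reindexing, you could count how many tokens a set of titles produces under the full and reduced lists. This Python sketch assumes the usual stopwords.txt layout: one word per line, with lines starting with # treated as comments. The word list and titles are made up. Keeping only “the” loosely mirrors the Villanova approach described above.

```python
def load_stopwords(text):
    """Parse stopwords.txt-style text: one word per line, '#' lines and blank lines ignored."""
    words = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line.lower())  # lowercased for this sketch
    return words


def indexed_token_count(titles, stopwords):
    """Count the tokens that would be stored for these titles (whitespace tokenization)."""
    return sum(1 for title in titles for word in title.lower().split() if word not in stopwords)


# Hypothetical stopword file contents and sample titles.
FULL_LIST = """# a small made-up stopword list
a
an
and
of
the
"""
titles = ["The Help", "Of Mice and Men", "A Tale of Two Cities"]

full = load_stopwords(FULL_LIST)
reduced = full & {"the"}  # keep only "the" as a stopword

print(indexed_token_count(titles, full))     # 6
print(indexed_token_count(titles, reduced))  # 10
```

In this made-up sample the reduced list stores 10 tokens instead of 6. On a real collection, the relative growth depends on how often the removed words occur.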

I hope this is helpful, but please let me know if I can do anything more to help! You might also find it useful to check out the archives of the solr-user list (or put out a question there) to see whether there have been any new approaches to dealing with stopwords. With VuFind 4.0, we will be upgrading to Solr 6, and I haven’t yet had time to look at all of the new features of the last couple of Solr releases, so there may be some new options I’m not yet aware of.
