This is a Solr-specific question, but let me answer it. You can click the Analysis tab in your Solr dashboard and check whether the Index and Query analyses are the same. That panel gives you analyzer-by-analyzer debug output.
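If you would rather script that comparison than click through the dashboard, here is a minimal sketch, assuming your Solr exposes the standard `/analysis/field` handler. The host, the core name `mycore`, the field type `text_es` and the sample words are all hypothetical placeholders, so swap in your own.

```python
import json
import urllib.parse
import urllib.request


def build_analysis_url(base_url, core, field_type, index_text, query_text):
    """Build a field-analysis URL that runs index and query analysis together."""
    params = {
        "analysis.fieldtype": field_type,   # field type to analyze with
        "analysis.fieldvalue": index_text,  # text run through the index analyzer
        "analysis.query": query_text,       # text run through the query analyzer
        "wt": "json",
    }
    return f"{base_url}/solr/{core}/analysis/field?" + urllib.parse.urlencode(params)


def fetch_analysis(url):
    """Fetch the analysis output as a dict; needs a reachable Solr instance."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Hypothetical host, core and field type names, for illustration only.
url = build_analysis_url("http://localhost:8983", "mycore", "text_es", "canción", "cancion")
print(url)
```

Calling `fetch_analysis(url)` against a running server should return the same stage-by-stage token output the Analysis tab shows, so you can compare the final index and query tokens.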

As stated, it's a Solr question... But I'll give you a hint (I don't have access to the server right now)... Stemming is different for Spanish than for English... If I remember correctly, I had to use the Hunspell tokenizer set for Spanish... Or something similar to that.

Sorry I can't be more precise... But you now have a better starting point than I had :)

Greetz

Also, in order for Spanish accents to be properly stemmed... something had to be set to ISO Latin... and a proper file had to be supplied to Solr...

I'm on a tablet and can't access the server to look....

On Feb 13, 2018 10:03 PM, "BlackIce" <[EMAIL PROTECTED]> wrote:

Hi,

As stated, it's a Solr question... But I'll give you a hint (I don't have access to the server right now)... Stemming is different for Spanish than for English... If I remember correctly, I had to use the Hunspell tokenizer set for Spanish... Or something similar to that.

Sorry I can't be more precise... But you now have a better starting point than I had :)

Greetz

My guess is you haven't reindexed after changing filter configuration, which is required for index-time filters.

Regarding your fieldType, you can drop the lowercase and ASCII folding filters and keep just the ICU folding filter; it will work for pretty much any character set. It will normalize case and Scandinavian digraphs (AE), and probably Dutch digraphs (IJ) as well. It also deals with German ö and ü, the ringel-s (ß) and all regular Latin accents, including the Spanish tilde (ñ), circumflex, etc.

If there is a language-specific normalizer/folder, use that instead of ICU, because there can be differences in how accents should be normalized across languages.

And do not forget to reindex, and use the same normalizers at index time and at query time.
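To see why the reindex matters, here is a toy sketch in plain Python. It is not Solr code: the tiny inverted index, the `fold` function (a rough stand-in for a folding filter) and the sample documents are all made up for illustration.

```python
import unicodedata
from collections import defaultdict


def lowercase_only(token):
    """Old analysis chain: lowercase, but keep accents."""
    return token.lower()


def fold(token):
    """New analysis chain: lowercase, then strip combining accent marks."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(docs, normalize):
    """Map each normalized token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, query, normalize):
    """Look up a single query term after normalizing it."""
    return set(index.get(normalize(query), set()))


docs = {1: "Canción española", 2: "English song"}  # hypothetical sample data

# Index built before the config change, queried with the new chain: no match.
stale_index = build_index(docs, lowercase_only)
print(search(stale_index, "cancion", fold))   # set()

# After reindexing with the new chain, both spellings find document 1.
fresh_index = build_index(docs, fold)
print(search(fresh_index, "cancion", fold))   # {1}
print(search(fresh_index, "CANCIÓN", fold))   # {1}
```

The stale index still holds the accented token, so the folded query term never meets it. Once both sides go through the same chain, the terms line up.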

Checked and confirmed, even Dutch digraph Ĳ is folded properly, as well as the upper case dotless Turkish i and the Spanish example you provided is folded properly.

Correction for German (before Nagel corrects me): ö and ü are not normalized by the ICU folder according to German rules. Their accents are stripped instead of being transformed into oe and ue respectively. This makes the case for language-specific folders, especially when dealing with Scandinavian or German. Dutch and Latin can be folded just by removing their accents.
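The difference is easy to show with a small standard-library sketch. This is not the ICU folder itself, just an approximation of plain accent stripping, and the `GERMAN_TRANSLITERATION` table is a hypothetical mapping written for this example.

```python
import unicodedata

# Hypothetical mapping for illustration; not taken from Solr or ICU.
GERMAN_TRANSLITERATION = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def strip_accents(text):
    """Generic folding: decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_german(text):
    """German-style folding: transliterate umlauts first, then strip the rest."""
    composed = unicodedata.normalize("NFC", text)  # so the table keys can match
    transliterated = "".join(GERMAN_TRANSLITERATION.get(ch, ch) for ch in composed)
    return strip_accents(transliterated)


print(strip_accents("Müller"))   # Muller
print(fold_german("Müller"))     # Mueller
print(strip_accents("canción"))  # cancion
```

With generic stripping, "Müller" collapses to "Muller" and so will not match a query for "Mueller". For Spanish, dropping the accent is all that is needed.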
