Doug Cutting wrote:
> David Spencer wrote:
>
>> Doug Cutting wrote:
>>
>>> And one should not try correction at all for terms which occur in a
>>> large proportion of the collection.
>>
>>
>>
>> I keep thinking over this one and I don't understand it. If a user
>> misspells a word and the "did you mean" spelling correction algorithm
>> determines that a frequent term is a good suggestion, why not suggest
>> it? The very fact that it's common could mean that it's more likely
>> that the user wanted this word (well, the heuristic here is that users
>> frequently search for frequent terms, which is probably wrong, but
>> anyway..).
>
>
> I think you misunderstood me. What I meant to say was that if the term
> the user enters is very common then spell correction may be skipped.
> Very common words which are similar to the term the user entered should
> of course be shown. But if the user's term is very common one need not
> even attempt to find similarly-spelled words. Is that any better?
Yes, sure, thx, I understand now - but maybe not - the context I was
thinking of was something like this:
[1] The user enters a query like:
recursize descent parser
[2] The search code parses this and sees that the 1st word is not a term
in the index, but the next 2 are. So it ignores the last 2 terms
("descent" and "parser") and suggests alternatives to
"recursize"... thus if any term is in the index, regardless of frequency,
it is left as-is.
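
To make step [2] concrete, here is a rough sketch in Python (not the
Lucene API). The "index" is just a hypothetical dict of term -> document
frequency, and the names TOY_INDEX and terms_missing_from_index are made
up for illustration:

```python
# Hypothetical stand-in for an index: term -> number of docs containing it.
TOY_INDEX = {"recursive": 40, "descent": 25, "parser": 120}


def terms_missing_from_index(query, doc_freq):
    """Return the query terms that do not occur in the index at all.

    Only these terms would be handed to the spelling corrector; any term
    found in the index is left as-is, regardless of its frequency.
    """
    return [term for term in query.lower().split() if term not in doc_freq]


print(terms_missing_from_index("recursize descent parser", TOY_INDEX))
```

With the toy data above only "recursize" comes back, since "descent" and
"parser" are both in the index.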
I guess you're saying that, if the user enters a term that appears in
the index and thus is sort of spelled correctly (as it exists in some
doc), then we use the heuristic that any sufficiently large doc
collection will have tons of misspellings, so we assume that rare terms
in the query might be misspelled (i.e. not what the user intended) and
we suggest alternatives to these words too (in addition to the words in
the query that are not in the index at all).
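
If I've read the heuristic right, it might look roughly like the sketch
below (Python, not the Lucene API). Everything here is an assumption for
illustration: the dict of term -> document frequency, the function name
plan_corrections, and especially the rare_ratio cutoff, which would need
tuning on a real collection:

```python
def plan_corrections(query, doc_freq, num_docs, rare_ratio=0.001):
    """Decide, per query term, whether to look for spelling alternatives.

    "missing": the term is not in the index, so suggest alternatives.
    "rare":    the term is in the index but in a tiny fraction of docs,
               so it may itself be a misspelling; suggest alternatives too.
    "keep":    the term is common enough that correction is skipped.
    """
    plan = {}
    for term in query.lower().split():
        df = doc_freq.get(term, 0)
        if df == 0:
            plan[term] = "missing"
        elif df / num_docs < rare_ratio:
            plan[term] = "rare"
        else:
            plan[term] = "keep"
    return plan


# Hypothetical collection of 100,000 docs in which the misspelling
# "recursize" happens to occur in 3 of them.
sample_freq = {"recursive": 4000, "recursize": 3, "descent": 2500, "parser": 12000}
print(plan_corrections("recursize descent parser", sample_freq, 100_000))
```

Here "recursize" is in the index but still gets flagged as "rare", while
"descent" and "parser" are left alone.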
>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>