lucene-java-user mailing list archives

Re: Performance of hit highlighting and finding term positions for a specific document

Date: Wed, 31 Mar 2004 02:36:54 GMT

Kevin A. Burton wrote:
> I'm playing with this package:
>
> http://home.clara.net/markharwood/lucene/highlight.htm
>
> Trying to do hit highlighting. This implementation uses another
> Analyzer to find the positions for the result terms.
> This seems very inefficient, since Lucene already knows the
> frequency and position of given terms in the index.
>
> My question is whether it's hard to find a TermPosition for a given term
> in a given document rather than the whole index.
>
> IndexReader.termPositions( Term term ) is term-specific, not term- and
> document-specific.
As far as I know, it's not currently possible to get this information from a standard
Lucene index.
> Also, it seems that after all this time Lucene should have efficient
> hit highlighting as a standard package. Is there any interest in seeing
> a contribution in the sandbox for this if it uses the index positions?
I've been meaning to look into good ways to store token offset information to allow for
very efficient highlighting, and I believe Mark may also be looking into improving the
highlighter via other means, such as temporary RAM indexes. Search the archives for
background on some of the ideas we've tossed around (the threads 'Dmitry's Term Vector
stuff, plus some' and 'Demoting results' come to mind as touching on this topic).
Regards,
Bruce Ritchie
http://www.jivesoftware.com/
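
To illustrate the idea discussed above of storing token offsets so that highlighting
needs no second analysis pass, here is a minimal sketch in plain Java. It is not Lucene
code and not anything from this thread: the class OffsetIndexSketch, its methods, the
letter-or-digit tokenizer, and the <b> markup are all hypothetical, chosen only to show
a lookup keyed on both term and document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy index (hypothetical): term -> docId -> character offsets of each occurrence. */
final class OffsetIndexSketch {
    // postings.get(term).get(docId) holds [start, end) offsets in document order.
    private final Map<String, Map<Integer, List<int[]>>> postings = new HashMap<>();
    // Original text per document, needed to rebuild the highlighted string.
    private final Map<Integer, String> stored = new HashMap<>();

    /** Tokenizes on runs of letters/digits and records each token's offsets. */
    void add(int docId, String text) {
        stored.put(docId, text);
        int i = 0;
        int n = text.length();
        while (i < n) {
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                postings.computeIfAbsent(term, t -> new HashMap<>())
                        .computeIfAbsent(docId, d -> new ArrayList<>())
                        .add(new int[] {start, i});
            }
        }
    }

    /** Offsets of one term in one document; empty if the term does not occur there. */
    List<int[]> offsets(String term, int docId) {
        Map<Integer, List<int[]>> byDoc = postings.get(term.toLowerCase(Locale.ROOT));
        if (byDoc == null) return new ArrayList<>();
        return byDoc.getOrDefault(docId, new ArrayList<>());
    }

    /** Wraps each occurrence of the term in <b>...</b> using the stored offsets only. */
    String highlight(String term, int docId) {
        String text = stored.get(docId);
        if (text == null) return null;
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] span : offsets(term, docId)) {
            out.append(text, last, span[0])
               .append("<b>").append(text, span[0], span[1]).append("</b>");
            last = span[1];
        }
        return out.append(text.substring(last)).toString();
    }
}
```

The trade-off in this sketch is index size: every occurrence costs two extra integers,
in exchange for highlighting that never re-tokenizes the document text.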