Michael McCandless
added a comment - 06/Aug/12 10:43
Wow! This looks very nice!
Should we move EMPTY into DocsAndPositionsEnum?
This isn't just a cutover from term vectors to postings, right? It actually scores each passage as if it were its own hit/document matching a search? I.e., the passage ranking/selection differs from the two existing highlighters.
I like the EMPTY_INDEXREADER (so MTQs do no rewrite work).

Robert Muir
added a comment - 06/Aug/12 12:32
> Should we move EMPTY into DocsAndPositionsEnum?
Maybe it can be either moved or removed if the code is fixed.
In this first patch it's used both as a sentinel for a stopping condition and as a placeholder for "term doesn't exist in this segment". The former, I think, is no longer necessary, and the latter is probably overkill.
> This isn't just a cutover from term vectors to postings, right? It actually scores each passage as if it were its own hit/document matching a search? I.e., the passage ranking/selection differs from the two existing highlighters.
Right: I think it's different in a number of ways. I hope it will be really fast, but again, I didn't even bother benchmarking yet.
It's also limited in some ways since it's just a prototype.
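
For readers following along, the "score each passage as if it were its own document" idea can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene or any code from the patch, and the class name `PassageScoringSketch`, the sentence-based passage splitting, the per-term weights, and the saturating `tf / (tf + 1)` formula are all hypothetical choices, not the prototype's actual scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: not the patch's classes or its scoring formula.
class PassageScoringSketch {

  /** Splits content into passages. Splitting after sentence punctuation is an assumption. */
  static List<String> splitPassages(String content) {
    List<String> passages = new ArrayList<>();
    for (String s : content.split("(?<=[.!?])\\s+")) {
      if (!s.isEmpty()) {
        passages.add(s);
      }
    }
    return passages;
  }

  /**
   * Scores one passage as if it were a tiny document: for each query term,
   * add weight * tf / (tf + 1), so repeated matches help but saturate.
   */
  static double score(String passage, Map<String, Double> termWeights) {
    Map<String, Integer> tf = new HashMap<>();
    for (String token : passage.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (termWeights.containsKey(token)) {
        tf.merge(token, 1, Integer::sum);
      }
    }
    double score = 0;
    for (Map.Entry<String, Integer> e : tf.entrySet()) {
      int freq = e.getValue();
      score += termWeights.get(e.getKey()) * freq / (freq + 1.0);
    }
    return score;
  }

  /** Returns the highest-scoring passage, or null if no passage contains a query term. */
  static String bestPassage(String content, Map<String, Double> termWeights) {
    String best = null;
    double bestScore = 0;
    for (String passage : splitPassages(content)) {
      double s = score(passage, termWeights);
      if (s > bestScore) { // ties keep the earlier passage
        best = passage;
        bestScore = s;
      }
    }
    return best;
  }
}
```

Calling `PassageScoringSketch.bestPassage(text, weights)` with a map of lowercased query terms to weights returns the single passage that would rank first. The point of the sketch is only that ranking happens per passage rather than per fragment of a term vector, which is one way the selection could differ from the existing highlighters.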

Robert Muir
added a comment - 07/Aug/12 05:20
I get some performance improvements here (for non-prox queries) by hacking up luceneutil to test queries with postingshighlighter+offsets vs. fastvectorhighlighter+vectors.
However, I don't think this will be realistically useful until we have the new block layout from the pfor branch: prox queries are hurt by the interleaving in the stream (just like if you use payloads), unrelated to highlighting.
I tried to do more experiments like 'wikibig' in luceneutil, but I ran out of disk space.
Once the block layout has landed, let's revisit this: it gives a much smaller index and faster indexing, and I think it will work well when that's sorted out.