[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430322#comment-13430322 ]
Sivan Yogev commented on LUCENE-4258:
-------------------------------------
As I work through the details, it seems that we need to add a new layer of information for stacked
segments. For each field that was added with REPLACE_FIELDS, we need to record the documents
in which a replacement took place, together with the latest generation that contained the replacement.
We call this list the "generation vector". With it, the TermDocs that StackedSegmentReader provides
for a given term is a special merge of that term's TermDocs across all stacked segments. The
"special" part is that we ignore occurrences from documents in which the term's field
was replaced in a later generation.
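The filtering rule can be sketched as a k-way merge over one term's per-generation doc lists. This is a minimal, hypothetical Java sketch, not StackedSegmentReader's code: the class StackedTermDocsMerge and its merge method are invented for illustration, postings are plain sorted int[] arrays of doc IDs (frequencies and positions are ignored), and the generation vector is assumed to be a Map from doc ID to the latest generation in which the term's field was replaced. An occurrence from generation g is kept only if the doc's field was not replaced after g.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: merges one term's per-generation doc lists into a single
// ascending doc list, dropping docs whose field was replaced in a later generation.
public class StackedTermDocsMerge {

    // A cursor over one generation's sorted doc IDs for the term.
    private static final class Cursor {
        final int generation;
        final int[] docs;
        int pos = 0;

        Cursor(int generation, int[] docs) {
            this.generation = generation;
            this.docs = docs;
        }

        int doc() {
            return docs[pos];
        }
    }

    /**
     * @param postingsByGeneration entry g holds the sorted doc IDs of the term in generation g
     * @param generationVector     doc ID -> latest generation in which the term's field was replaced
     */
    public static List<Integer> merge(List<int[]> postingsByGeneration,
                                      Map<Integer, Integer> generationVector) {
        // Always pop the cursor positioned on the smallest doc ID.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int g = 0; g < postingsByGeneration.size(); g++) {
            int[] docs = postingsByGeneration.get(g);
            if (docs.length > 0) {
                queue.add(new Cursor(g, docs));
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            int doc = c.doc();
            // Docs that were never replaced have no entry; -1 keeps them in every generation.
            int replacedIn = generationVector.getOrDefault(doc, -1);
            boolean live = replacedIn <= c.generation;
            // A doc may occur in several generations (e.g. an added field); emit it once.
            if (live && (result.isEmpty() || result.get(result.size() - 1) != doc)) {
                result.add(doc);
            }
            c.pos++;
            if (c.pos < c.docs.length) {
                queue.add(c);
            }
        }
        return result;
    }
}
```

For title:love, the base segment lists docs {1, 2}, generation 1 lists {2}, and the vector maps doc 2 to generation 1, so merge returns [1, 2]: doc 2's generation-0 occurrence is dropped, its generation-1 occurrence is kept. A priority queue keeps the output in doc order without materializing the union, which is how a streaming reader would need to behave.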
An example. Assume we have doc 1 with title "I love bananas" and doc 2 with title "I love
oranges", and the segment is flushed. We will have the following base segment (ignoring positions):
bananas: doc 1
I: doc 1, doc 2
love: doc 1, doc 2
oranges: doc 2
Now we add to doc 1 an additional title field "I hate apples", replace the title of doc 2
with "I love lemons", and flush. We get the following segment for generation 1:
apples: doc 1
hate: doc 1
I: doc 1, doc 2
lemons: doc 2
love: doc 2
generation vector for field "title": (doc 2, generation 1)
TermDocs for a few terms:
* title:bananas : {1}. Uses the TermDocs of the base segment; doc 1 is not in the title
generation vector, so it is unaffected.
* title:oranges : {}. Uses the TermDocs of the base segment; doc 2's title is overridden for
generations < 1, and this occurrence is in generation 0.
* title:lemons : {2}. Uses the TermDocs of generation 1; doc 2's title is overridden for generations
< 1, but the term appears in generation 1.
* title:love : {1,2}. Uses the TermDocs of both segments; doc 1 comes from the base segment, and
although doc 2's title is overridden for generations < 1, the term appears for doc 2 in generation 1.
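The four results can be checked mechanically. The following hypothetical Java model hard-codes the two segments for field title (doc IDs only, as in the example) together with the title generation vector, then applies the rule that an occurrence from generation g counts only if the doc's title was not replaced after g. The names TitleExample and termDocs are illustrative and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the example: field "title", base segment (generation 0)
// plus one stacked segment (generation 1). Positions and freqs are ignored.
public class TitleExample {

    static final List<Map<String, int[]>> GENERATIONS = new ArrayList<>();
    // Generation vector for field "title": doc 2 was replaced in generation 1.
    static final Map<Integer, Integer> TITLE_GENERATION_VECTOR = new HashMap<>();

    static {
        Map<String, int[]> gen0 = new HashMap<>();   // base segment
        gen0.put("bananas", new int[] {1});
        gen0.put("I", new int[] {1, 2});
        gen0.put("love", new int[] {1, 2});
        gen0.put("oranges", new int[] {2});

        Map<String, int[]> gen1 = new HashMap<>();   // generation 1
        gen1.put("apples", new int[] {1});
        gen1.put("hate", new int[] {1});
        gen1.put("I", new int[] {1, 2});
        gen1.put("lemons", new int[] {2});
        gen1.put("love", new int[] {2});

        GENERATIONS.add(gen0);
        GENERATIONS.add(gen1);
        TITLE_GENERATION_VECTOR.put(2, 1);
    }

    /** Docs matching title:term across all generations, honoring the generation vector. */
    static Set<Integer> termDocs(String term) {
        Set<Integer> docs = new TreeSet<>();
        for (int g = 0; g < GENERATIONS.size(); g++) {
            int[] postings = GENERATIONS.get(g).getOrDefault(term, new int[0]);
            for (int doc : postings) {
                // Skip the occurrence if this doc's title was replaced after generation g.
                if (TITLE_GENERATION_VECTOR.getOrDefault(doc, -1) <= g) {
                    docs.add(doc);
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        for (String term : new String[] {"bananas", "oranges", "lemons", "love"}) {
            System.out.println("title:" + term + " -> " + termDocs(term));
        }
    }
}
```

Running main prints title:bananas -> [1], title:oranges -> [], title:lemons -> [2] and title:love -> [1, 2]. The TreeSet keeps the union sorted and deduplicated, which is convenient for a check like this but not how a streaming reader would merge.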
I propose to initially use PackedInts for the generation vector, since we know how many generations
the current segment has upon flushing, and therefore how many bits each entry needs. Later we might
consider special treatment for sparse vectors.
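A dense generation vector in the spirit of the PackedInts proposal might look like the sketch below. It is a hand-rolled Java stand-in, not Lucene's PackedInts API, and the class PackedGenerationVector is hypothetical. It assumes one entry per document, with 0 meaning the field was never replaced (a replacement in the base generation 0 would have no effect), and it sizes each entry from the highest generation known at flush time.

```java
// Hypothetical dense generation vector packed into a long[]; a hand-rolled
// stand-in for Lucene's PackedInts, not its API.
public class PackedGenerationVector {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    /**
     * @param maxDoc        number of documents in the segment
     * @param maxGeneration highest generation of the segment, known at flush time
     */
    public PackedGenerationVector(int maxDoc, int maxGeneration) {
        if (maxDoc < 0 || maxGeneration < 0) {
            throw new IllegalArgumentException("negative size or generation");
        }
        this.bitsPerValue = bitsRequired(maxGeneration);
        this.mask = (1L << bitsPerValue) - 1;          // at most 31 bits for an int generation
        long totalBits = (long) maxDoc * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    /** Bits needed to store values in [0, maxValue]; at least 1. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    public int bitsPerValue() {
        return bitsPerValue;
    }

    /** Latest generation in which doc's field was replaced; 0 means never replaced. */
    public long get(int doc) {
        long bitPos = (long) doc * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {              // value spills into the next block
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    public void set(int doc, long generation) {
        if (generation < 0 || generation > mask) {
            throw new IllegalArgumentException("generation out of range: " + generation);
        }
        long bitPos = (long) doc * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        // Low bits go into the current block; bits shifted past bit 63 are dropped here.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (generation << shift);
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {              // write the remaining high bits
            long spillMask = (1L << (bitsPerValue - bitsInFirst)) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~spillMask) | (generation >>> bitsInFirst);
        }
    }
}
```

With two generations (maxGeneration = 1) each document costs a single bit; with six generations it costs three bits, and some entries straddle two longs, which get and set handle. The per-document cost is what a sparse representation would later try to avoid when few documents are replaced.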
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
> Key: LUCENE-4258
> URL: https://issues.apache.org/jira/browse/LUCENE-4258
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Sivan Yogev
> Original Estimate: 2,520h
> Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field Updates outlined
> here (http://markmail.org/message/zhrdxxpfk6qvdaex).