Details

Description

Enable near-realtime search in Lucene without external
dependencies. When RAM NRT is enabled, the implementation adds a
RAMDirectory to IndexWriter. Flushes go to the RAM directory unless
there is no space available, and merges are performed in the RAM
directory until RAM is exhausted.

IW.optimize and IW.commit flush the RAM directory to the primary
directory; all other operations try to keep segments in RAM
until there is no more space.
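The routing rule above can be sketched as a small simulation (the class and method names here are illustrative, not the patch's actual API):

```java
// Sketch of the flush-routing rule: new segments go to the RAM
// directory while budget remains, otherwise to the primary directory.
// All names here are illustrative, not the patch's actual API.
public class FlushRouter {
    private final long maxRamBytes;
    private long ramBytesUsed;

    public FlushRouter(long maxRamBytes) {
        this.maxRamBytes = maxRamBytes;
    }

    /** Returns "ram" if the segment fits in the RAM budget, else "primary". */
    public String route(long segmentBytes) {
        if (ramBytesUsed + segmentBytes <= maxRamBytes) {
            ramBytesUsed += segmentBytes;
            return "ram";
        }
        return "primary";
    }

    /** IW.optimize/IW.commit flush the ramdir to the primary directory. */
    public long commit() {
        long flushed = ramBytesUsed;
        ramBytesUsed = 0;  // everything now lives in the primary directory
        return flushed;
    }
}
```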

Jason Rutherglen
added a comment - 24/Jun/08 00:21 lucene-1313.patch
Removed http://reader.imagero.com/uio/ code as it routinely corrupted the log. Its replacement is RandomAccessFile. Added MultiThreadSearcherPolicy, which is used to create a multithreaded Searcher. Transaction multithreading has been removed because it makes the code hard to debug; it will be optional in the future. Many bugs have been fixed. TestSearch tests for deletes. Index directories are now of the form "2_index": the index id and the suffix "_index".

Jason Rutherglen
added a comment - 24/Jun/08 21:31 lucene-1313.patch
Depends on LUCENE-1312 and LUCENE-1314. More bugs fixed. Deletes are committed to indexes only intermittently, which improves the update speed dramatically. MaybeMergeIndexes now runs via a background timer.
Will remove writing a snapshot.xml file per transaction in favor of a human readable log. Creating and deleting these small files is a bottleneck for update speed. This way a transaction writes to 2 files only. The merges happen in the background and so never affect the transaction update speed. I am not sure how useful it would be, but it is possible to have a priority based IO system that favors transactions over merges. If a transaction is coming in and a merge is happening to disk, the merge is stopped and the transaction IO runs, then the merge IO continues.
I am not sure how to handle Documents with Fields that have a TokenStream as the value as I believe these cannot be serialized. For now I assume it will be unsupported.
Also not sure how to handle analyzers, are these generally serializable? It would be useful to serialize them for a more automated log recovery process.

Jason Rutherglen
added a comment - 17/Jul/08 14:35 lucene-1313.patch
Depends on LUCENE-1314
OceanSegmentReader implements reuse of deletedDocs bytes in conjunction with LUCENE-1314
Snapshot logging happens to a rolling log file
CRC32 checking added to transaction log
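The CRC32 framing could look roughly like this, using only the JDK's java.util.zip.CRC32; the record layout shown ([length][payload][crc]) is an assumption for illustration, not the patch's actual log format:

```java
import java.io.*;
import java.util.zip.CRC32;

// Sketch of CRC32-checked transaction log records: each record is
// [length][payload][crc32], so corrupt or truncated tails are detected
// on replay. Framing details are illustrative, not the patch's format.
public class CrcLog {
    public static byte[] encode(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        CRC32 crc = new CRC32();
        crc.update(payload);
        out.writeInt(payload.length);
        out.write(payload);
        out.writeLong(crc.getValue());
        return bos.toByteArray();
    }

    /** Returns the payload, or null if the checksum does not match. */
    public static byte[] decode(byte[] record) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        long stored = in.readLong();
        CRC32 crc = new CRC32();
        crc.update(payload);
        return stored == crc.getValue() ? payload : null;
    }
}
```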
Added TestSystem test case that performs adds, updates and deletes. TestSystem uses arbitrarily small settings to force the various background merges to happen within a minimal number of transactions
Transactions with over N documents are encoded into a segment (via RAMDirectory) in the transaction log rather than serialized as Documents
Started wiki page http://wiki.apache.org/lucene-java/OceanRealtimeSearch linked from http://wiki.apache.org/lucene-java/LuceneResources. Will place documentation there.
Document fields with Reader or TokenStream values supported
Began work on LargeBatch functionality, needs test case. Large batches allow adding documents in bulk (also performing deletes) in a transaction that goes straight to an index bypassing the transaction log. This provides the same speed as using IndexWriter to perform bulk Document processing in Ocean.
Started OceanDatabase which will offer a Java API inspired by GData. Will offer optimistic concurrency (something required in a realtime search system) and dynamic object mapping (meaning types such as long, date, double will be mapped to a string term using some Solr code). A file sync is performed after each transaction, will add an option to allow syncing after N transactions like mysql. This will improve realtime update speeds.
Future:
Support for multiple servers by implementing distributed API and replication using LUCENE-1336
Test case that is akin to TestStressIndexing2 mainly to test threading
Add LargeBatch test to TestSystem
Facets
Looking at adding GData compatible XML over HTTP API. Possibly can reuse the old Lucene GData code.
Integrate tag index when it's completed
Add LRU record cache to transaction log which will be useful for faster replication
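The planned LRU record cache could be sketched with LinkedHashMap's access-order eviction (a plain-JDK sketch; the class name and API are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU record cache for the transaction log, so a replica
// catching up can fetch recent records from memory instead of
// re-reading the log file. Built on LinkedHashMap's access-order mode.
public class RecordCache {
    private final Map<Long, byte[]> cache;

    public RecordCache(final int maxRecords) {
        // accessOrder=true makes iteration order least-recently-used first
        cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> e) {
                return size() > maxRecords;
            }
        };
    }

    public void put(long recordId, byte[] record) { cache.put(recordId, record); }

    /** Returns the record, or null if it was evicted (read the log instead). */
    public byte[] get(long recordId) { return cache.get(recordId); }
}
```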

Karl Wettin
added a comment - 02/Sep/08 09:05 Hi Jason,
I took an initial look at your code last night. I didn't actually execute anything, just followed method calls around to see what it was up to.
My first comment is sort of boring, but there are virtually no javadocs for the core classes such as TransactionSystem, Batch and Index. It would be great if there was a bit at the class level explaining what classes they interact with and how. It would also be very helpful if there were method-level javadocs for at least the top-level commit-related logic.
One thing that caught my attention early is this method in TransactionSystem:
public OceanSearcher getSearcher() throws IOException {
  Snapshot snapshot = snapshots.getLatestSnapshot();
  if (searcherPolicy instanceof SingleThreadSearcherPolicy) {
    return new OceanSearcher(snapshot);
  } else {
    return new OceanMultiThreadSearcher(snapshot, searchThreadPool);
  }
}
Am I supposed to call this method for each query (as suggested by the method name) or is this a factory method used to update my own Searcher instance after committing documents to the index (as suggested by the code)?
It's not such a big deal, but I personally think you should refactor the instanceof to a Policy.searcherFactory method, or perhaps even a SearcherPolicyVisitor. Actually, this goes for a few other places in the module too: you have used instanceof and unchecked casting a bit more extensively to solve problems than I would have. But as it does not seem to be used in places where it would be a costly thing to do, these comments are merely about code readability and gut feelings about future problems.
I'm a bit concerned about the potential loss of data while documents reside only in InstantiatedIndex or RAMDirectory. I think I'd like an option for some sort of transaction log that could be replayed in case of a crash. I think the easiest way would be to convert all documents to be pre-analyzed (field.tokenStream) before passing them on to the instantiated writer. I don't know how much resources that might consume, but it would make me feel safer.
karl
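The refactor Karl suggests, moving searcher construction onto the policy so the instanceof branch disappears, could be sketched like this (all names are illustrative stand-ins for the patch's classes):

```java
// Sketch of the suggested refactor: each policy builds its own searcher,
// so TransactionSystem.getSearcher needs no instanceof or casting.
// All names are illustrative stand-ins; newSearcher returns a label
// here for brevity instead of a real searcher object.
public class PolicyFactoryExample {
    interface Snapshot {}

    interface SearcherPolicy {
        String newSearcher(Snapshot snapshot);
    }

    static class SingleThreadSearcherPolicy implements SearcherPolicy {
        public String newSearcher(Snapshot s) { return "OceanSearcher"; }
    }

    static class MultiThreadSearcherPolicy implements SearcherPolicy {
        public String newSearcher(Snapshot s) { return "OceanMultiThreadSearcher"; }
    }

    // The former instanceof branch collapses to a single delegation.
    static String getSearcher(SearcherPolicy policy, Snapshot snapshot) {
        return policy.newSearcher(snapshot);
    }
}
```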

Jason Rutherglen
added a comment - 02/Sep/08 12:05 Hi Karl,
Thanks for taking a look at the code! Yes, the methods need javadoc; I was waiting to see if they had settled, and because I started building new code on top I guess they have settled, so I need to add javadoc to them.
If you are using TransactionSystem then the getSearcher method would be called for each query. I have developed OceanDatabase which makes the searching transparent and implements optimistic concurrency (version number stored in the document). I believe most systems will want to use OceanDatabase, however the raw TransactionSystem which is more like IndexWriter will be left as well. I have been working on OceanDatabase and have neglected the javadocs of TransactionSystem.
I modeled the searcherPolicy instanceof code on the MergeScheduler type of system where there is a marker interface that the subclasses implement. I don't mind changing it, or if you want to you can as well. I considered it a minor detail though and admittedly did not spend much time on it. You are welcome to change it.
The transaction log is replayed on a restart of the system. It repopulates a RamIndex (uses RAMDirectory) on startup based on the max snapshot id of the existing indexes, and replays the transaction log from there.
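The recovery rule Jason describes, replaying only log entries past the max on-disk snapshot id, can be sketched as follows (entry layout and names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the recovery rule: on restart, find the max snapshot id
// among the on-disk indexes, then replay only log entries after it into
// a fresh RAM index. Entry layout and names are illustrative.
public class LogReplay {
    static class Entry {
        final long snapshotId;
        final String op;
        Entry(long snapshotId, String op) { this.snapshotId = snapshotId; this.op = op; }
    }

    /** Returns the ops that must be re-applied to the RAM index. */
    public static List<String> toReplay(List<Entry> log, long maxDiskSnapshotId) {
        List<String> ops = new ArrayList<String>();
        for (Entry e : log) {
            if (e.snapshotId > maxDiskSnapshotId) {
                ops.add(e.op);  // entries at or below the max are already durable
            }
        }
        return ops;
    }
}
```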
I looked at converting documents to a token stream; the problem is, if the field is stored, it creates redundant storage of the data in the transaction log. Ultimately I could not find anything to be gained from storing a token stream. Also, if it was converted, what would happen with stored fields? The issue with replaying the document later, though, is not having the Analyzer. In the distributed object code patch LUCENE-1336 I made Analyzer Serializable. I think it's best to serialize the Analyzer, or create a small database of serialized analyzers that can be called upon during the transaction log recovery process. However, I am not entirely sure about the ramifications of serializing the Analyzer, for example how much data a serialized Analyzer may have. Perhaps others have some ideas or feedback about serializing analyzers.
In conclusion, I'll add more javadocs. Please feel free to ask more questions!
Jason

Jason Rutherglen
added a comment - 01/Apr/09 22:48 - edited The patch includes RealtimeIndex, a basic class for performing atomic
transactional realtime indexing and search. A single thread
periodically flushes the RAM index to disk. It relies on
LUCENE-1516.
We need to benchmark this, specifically 1) realtime w/ramdir
transaction 2) realtime w/queued documents transaction 3) normal
indexing. Realtime w/ramdir encodes the transaction to a
RAMDirectory which is added to the RAM writer using
IW.addIndexesNoOptimize. Option 1 may be slower than option 2,
however if the system is replicating it may be the only option?
Long term I believe we need to implement searching over the
IndexWriter ram buffer (if possible). However I am not sure how
option 1 and replication would work with it?

Michael McCandless
added a comment - 02/Apr/09 08:53 Jason, your last patch looks like it's taking the "flush first to RAM Dir" approach I just described as the next step (on the java-dev thread), right? Which is great!
So this has no external dependencies, right? And it simply layers on top of LUCENE-1516 .
I'd be very interested to compare (benchmark) this approach vs solely LUCENE-1516 .
Could we change this class so that instead of taking a Transaction object, holding adds & deletes, it simply mirrors IndexWriter's API? Ie, I'd like to decouple the performance optimization of "let's flush small segments through a RAMDir first" from the transactional semantics of "I process a transaction atomically, and lock out other threads' transactions". Ie, the transactional restriction could/should layer on top of this performance optimization for near-realtime search?

Jason Rutherglen
added a comment - 06/Apr/09 17:25 So this has no external dependencies, right?
Yes.
I'd be very interested to compare (benchmark) this approach
vs solely LUCENE-1516.
Is the .alg using the NearRealtimeReader from LUCENE-1516 our
best measure of realtime performance?
the transactional restriction could/should layer on
top of this performance optimization for near-realtime search?
The transactional system should be able to support both methods.
Perhaps a non-locking setting would allow the same RealtimeIndex
class to support both modes of operation?

Jason Rutherglen
added a comment - 06/Apr/09 22:11 We'll need to integrate the RAM based indexer into IndexWriter
to carry over the deletes to the ram index while it's copied to
disk. This is similar to IndexWriter.commitMergedDeletes
carrying deletes over at the segment reader level based on a
comparison of the current reader and the cloned reader.
Otherwise there's redundant deletions to the disk index using
IW.deleteDocuments which can be unnecessarily expensive. To make
external we would need to do the delete by doc id genealogy.

Michael McCandless
added a comment - 07/Apr/09 08:57
> I'd be very interested to compare (benchmark) this approach
> vs solely LUCENE-1516 .
Is the .alg using the NearRealtimeReader from LUCENE-1516 our
best measure of realtime performance?
So far, I think so? You get to set an update rate (delete + add) docs, eg 50 docs/sec, and a pause time between NRT reopens.
Still, it's synthetic. If you guys (LinkedIn) have a way to fold in some realism into the test, that'd be great, even if only "our app ingests at X docs(MB)/sec and reopens the NRT reader X times per second" to set our ballpark.
> the transactional restriction could/should layer on
> top of this performance optimization for near-realtime search?
The transactional system should be able to support both methods.
Perhaps a non-locking setting would allow the same RealtimeIndex
class to support both modes of operation?
Sorry, what are both "modes" of operation?
I think there are two different "layers" here – first layer optimizes NRT by flushing small segments to RAMDir first. This seems generally useful and in theory has no impact to the API IndexWriter exposes (it's "merely" an internal optimization). The second layer adds this new Transaction object, such that N adds/deletes/commit/re-open NRT reader can be done atomically wrt other pending Transaction objects.
We'll need to integrate the RAM based indexer into IndexWriter
to carry over the deletes to the ram index while it's copied to
disk. This is similar to IndexWriter.commitMergedDeletes
carrying deletes over at the segment reader level based on a
comparison of the current reader and the cloned reader.
Otherwise there's redundant deletions to the disk index using
IW.deleteDocuments which can be unnecessarily expensive. To make
external we would need to do the delete by doc id genealogy.
Right, I think the RAMDir optimization would go directly into IW, if we can separate it out from Transaction. It could naturally derive from the existing RAMBufferSizeMB, ie if NRT forces a flush, so long as it's tiny, put it into the local RAMDir instead of the actual Dir, then "deduct" that size from the allowed budget of DW's ram usage. When RAMDir + DW exceeds RAMBufferSizeMB, we then merge all of RAMDir's segments into a "real" segment in the directory.
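The budget accounting Michael describes could be simulated roughly as follows (a sketch under assumed names; the byte-level details are invented for illustration):

```java
// Sketch of the proposed budget rule: NRT flushes go to a RAM directory
// and are "deducted" from the RAMBufferSizeMB budget; once the RAM
// segments plus the DocumentsWriter buffer exceed the budget, the RAM
// segments are merged into a real on-disk segment. Names illustrative.
public class RamBudget {
    private final long budgetBytes;
    private long ramDirBytes;     // small NRT-flushed segments
    private long docWriterBytes;  // DW's in-flight buffer

    public RamBudget(long budgetBytes) { this.budgetBytes = budgetBytes; }

    public void buffer(long bytes) { docWriterBytes += bytes; }

    /** NRT flush: move DW's buffer into the RAM dir; report whether the
     *  RAM segments must now be merged into a real on-disk segment. */
    public boolean nrtFlush() {
        ramDirBytes += docWriterBytes;
        docWriterBytes = 0;
        return ramDirBytes + docWriterBytes > budgetBytes;
    }

    /** Merge all RAM-dir segments into a real segment in the directory. */
    public long mergeToDisk() {
        long merged = ramDirBytes;
        ramDirBytes = 0;
        return merged;
    }
}
```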

Jason Rutherglen
added a comment - 08/Apr/09 00:22 Latest realtime code, transactions are removed.
Needs to be benchmarked
There could be concurrency issues around deletes that occur
while directories are being flushed to disk.
It's Java JARed to include the files and directory structure.
The patch relies on LUCENE-1516, which if included would make the
changes incomprehensible.

Jason Rutherglen
added a comment - 08/Apr/09 21:43 Still, it's synthetic. If you guys (LinkedIn) have a way
to fold in some realism into the test, that'd be great, if only
"our app ingests at X docs(MB)/sec and reopens the NRT reader X
times per second" to set our ballpark.
The test we need to progress to is running the indexing side
endlessly while also reopening every X seconds, then
concurrently running searches. This way we can play with a bunch
of settings (mergescheduler threads, merge factors, max merge
docs, etc), use the python code to generate a dozen cases,
execute them and find out what seems optimal for our corpus.
It's a bit of work but probably the only way each Lucene user
can conclusively say they have the optimal settings needed for
their system. Usually there is a baseline QPS that is desired,
where the reopen delay may be increased to accommodate a lack of
QPS.
The ram dir portion of the NRT indexing increases in speed when
more threads are allocated but those compete with search
threads, another issue to keep in mind.
It might be good to add some default charting to
contrib/benchmark?

Michael McCandless
added a comment - 09/Apr/09 09:19
The test we need to progress to is running the indexing side
endlessly while also reopening every X seconds, then
concurrently running searches
Do you have a sense of what we'd need to add to contrib/benchmark to make this test possible? LUCENE-1516 takes the first baby step (adds a "NearRealtimeReaderTask").
Usually there is a baseline QPS that is desired,
where the reopen delay may be increased to accommodate a lack of
QPS.
Right – that's the point I made on java-dev about the "freedom" we have wrt NRT's performance.
The ram dir portion of the NRT indexing increases in speed when
more threads are allocated but those compete with search
threads, another issue to keep in mind.
Well, single threaded indexing speed is also improved by using RAM dir. Ie the use of RAM dir is orthogonal to the app's use of threads for indexing?
It might be good to add some default charting to
contrib/benchmark?
I've switched to Google's visualization API (http://code.google.com/apis/visualization/) which is a delight (they have a simple-to-use Python wrapper). It'd be awesome to somehow get simple charting folded into benchmark... maybe start w/ sheer data export (as tab/comma delimited line file), and then have a separate step that slurps that data in and makes a [Google vis] chart.

Jason Rutherglen
added a comment - 17/Apr/09 20:27

The RAMIndex deletes approach changed to be like IndexWriter.
The deletes are queued in lists, then applied on RI.flush.

There is redundancy between IW.delete* and RI.delete*, perhaps
we don't need RI.delete*?

We need more multithreaded tests, probably based on
TestIndexWriter to see if we can trigger issues in regards to
deletes that occur while RI is calling IW.addIndexesNoOptimize.

If RI.delete* is removed, do we need a separate RAMIndex class
to add documents to or is there a more transparent way for NRT
ramdir to work? Perhaps we can add an IW.flushToRamDir (whereas
IW.flush writes to the IW directory) method that flushes the
rambuffer to the RAMIndex? Some of the issues are around
swapping out the RAMDir once its segments are flushed to IW. If
we took this approach would we need a IW.getReaderRAM method
that instead of flushing to disk flushes to the ramdir? The
other problem with the IW.flushToRamDir system is the loss of
concurrency where a large rambuffer may be flushing to disk
while the user really wants small incremental NRT RI based
updates at the same time.

Jason Rutherglen
added a comment - 20/Apr/09 21:09
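The queue-then-apply deletes scheme described above can be sketched in plain Java. This is a toy model, not Lucene code; the class and method names are invented, and the "index" is just a map from delete term to matching doc ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "queue deletes in lists, apply on flush".
public class QueuedDeletes {
    // Delete terms buffered until flush; deleting costs no index access.
    private final List<String> pendingTerms = new ArrayList<>();

    public void deleteDocuments(String term) {
        pendingTerms.add(term); // just queue it
    }

    // On flush, resolve every queued term against the (toy) inverted index
    // in one pass, marking matching docs deleted. Returns newly deleted count.
    public int applyDeletes(Map<String, List<Integer>> invertedIndex,
                            java.util.Set<Integer> deletedDocs) {
        int count = 0;
        for (String term : pendingTerms) {
            for (int doc : invertedIndex.getOrDefault(term, List.of())) {
                if (deletedDocs.add(doc)) count++;
            }
        }
        pendingTerms.clear(); // queue is consumed by the flush
        return count;
    }
}
```

Batching like this is what makes updates fast in the discussion above: individual deletes never touch the index, only the flush does.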

For this patch I'm debating whether to add a package protected
IndexWriter.addIndexWriter method. The problem is, the RAMIndex
blocks on the write to disk during IW.addIndexesNoOptimize which
if we're using ConcurrentMergeScheduler shouldn't happen?
Meaning in this proposed solution, if segments keep on piling up
in RAMIndex, we simply move them over to the disk IW which will
in the background take care of merging them away and to disk.

I don't think it's necessary to immediately write ram segments
to disk (like the current patch does), instead it's possible to
simply copy segments over from the incoming IW, leave them in
RAM and they can be merged to disk as necessary? Then on
IW.flush any segmentinfo(s) that are not from the current
directory can be flushed to disk?

Jason Rutherglen
added a comment - 24/Apr/09 19:31
Just thinking out loud about this.
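The "leave segments in RAM, migrate only the foreign ones at flush" idea above can be stated as a tiny selection rule. A toy sketch, not Lucene code: each segment records the directory it lives in, and flush/commit picks out every segment that is not in the primary directory.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: segment infos tagged with their directory, and a
// selector for the ones that must be migrated to the primary (disk) dir.
public class SegmentMigration {
    public record SegInfo(String name, String dir) {}

    public static List<SegInfo> segmentsToMigrate(List<SegInfo> infos, String primaryDir) {
        List<SegInfo> out = new ArrayList<>();
        for (SegInfo si : infos) {
            if (!si.dir().equals(primaryDir)) out.add(si); // RAM-resident segment
        }
        return out;
    }
}
```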

I agree: it should be fine from IndexWriter's standpoint if some
segments live in a private RAMDir and others live in the "real" dir.

In fact, early versions of LUCENE-843 did exactly this: IW's RAM
buffer is not as efficient as a written segment, and so you can gain
some RAM efficiency by flushing first to RAM and then merging to disk.

I think we could adopt a simple criteria: you flush the new segment to
the RAM Dir if net RAM used is < maxRamBufferSizeMB. This way no
further configuration is needed. On auto-flush triggering you then
must take into account the RAM usage by this RAM Dir. On commit,
these RAM segments must be migrated to the real dir (preferably by
forcing a merge, somehow).

A near realtime reader would also happily mix "real" Dir and RAMDir
SegmentReaders.

This should work well I think, and should not require a separate
RAMIndex class, and won't block things when the RAM segments are
migrated to disk by CMS.

Michael McCandless
added a comment - 24/Apr/09 20:03
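The flush-target criterion above could look roughly like this. A minimal sketch with invented names, assuming (per the comment) that the RAM accounting covers both the RAM Dir and DocumentsWriter's in-memory buffer:

```java
// Not Lucene code: new segments go to the RAM Dir only while net RAM used
// stays under maxRamBufferSizeMB; otherwise flush to the real directory.
public class FlushTarget {
    public enum Target { RAM_DIR, REAL_DIR }

    public static Target choose(long ramDirBytes, long dwBufferBytes, long maxRamBufferBytes) {
        // Auto-flush accounting must include the RAM Dir's usage too.
        long netRamUsed = ramDirBytes + dwBufferBytes;
        return netRamUsed < maxRamBufferBytes ? Target.RAM_DIR : Target.REAL_DIR;
    }
}
```

The appeal, as noted above, is that no new configuration knob is needed: the existing maxRamBufferSizeMB bounds total RAM use.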

> I think we could adopt a simple criteria: you flush the
> new segment to the RAM Dir if net RAM used is <
> maxRamBufferSizeMB. This way no further configuration is needed.
> On auto-flush triggering you then must take into account the RAM
> usage by this RAM Dir.

So we're ok with the blocking that occurs when the ram buffer is
flushed to the ramdir?

> On commit, these RAM segments must be migrated to the
> real dir (preferably by forcing a merge, somehow).

This is pretty much like resolveExternalSegments which would be
called in prepareCommit? This could make calls to commit much
more time consuming. It may be confusing to the user why
IW.flush doesn't copy the ram segments to disk.

> A near realtime reader would also happily mix "real" Dir
> and RAMDir SegmentReaders.

Agreed, however the IW.getReader MultiSegmentReader removes
readers from another directory so we'd need to add a new
attribute to segmentinfo that marks it as ok for inclusion in
the MSR?

Jason Rutherglen
added a comment - 24/Apr/09 21:52

> So we're ok with the blocking that occurs when the ram buffer is
> flushed to the ramdir?

Well... we don't have a choice (unless/until we implement IndexReader impl to directly search the RAM buffer). Still, this should be a good improvement over the blocking when flushing to a real dir.

> This is pretty much like resolveExternalSegments which would be
> called in prepareCommit? This could make calls to commit much
> more time consuming. It may be confusing to the user why
> IW.flush doesn't copy the ram segments to disk.

Similar... the difference is I'd prefer to do a merge of the RAM segments vs the straight one-for-one copy that resolveExternalSegments does.

commit would only become more time consuming in the NRT case? IE we'd only flush-to-RAMdir if it's getReader that's forcing the flush? In which case, I think it's fine that commit gets more costly. Also, I wouldn't expect it to be much more costly: we are doing an in-memory merge of N segments, writing one segment to the "real" directory. Vs writing each tiny segment as a real one. In fact, commit could get cheaper (when compared to not making this change) since there are fewer new files to fsync.

> Agreed, however the IW.getReader MultiSegmentReader removes
> readers from another directory so we'd need to add a new
> attribute to segmentinfo that marks it as ok for inclusion in
> the MSR?

Michael McCandless
added a comment - 24/Apr/09 22:13
Or, fix that filtering to also accept IndexWriter's RAMDir.

I'm confused as to how we make DocumentsWriter switch from
writing to disk vs the ramdir? It seems like a fairly major
change to the system? One that's hard to switch later on after
IW is instantiated? Perhaps the IW.addWriter method is easier in
this regard?

> the difference is I'd prefer to do a merge of the RAM
> segments vs the straight one-for-one copy that
> resolveExternalSegments does.

Yeah I implemented it this way in the IW.addWriter code. I agree
it's better for IW.commit to copy all the ramdir segments to one
disk segment.

I started working on the IW.addWriter(IndexWriter, boolean
removeFrom) where removeFrom removes the segments that have been
copied to the destination writer from the source writer. This
method gets around the issue of blocking because potentially
several writers could concurrently be copied to the destination
writer. The only issue at this point is how the destination
writer obtains segmentreaders from source readers when they're
in the other writers' pool? Maybe the SegmentInfo can have a
reference to the writer it originated in? That way we can easily
access the right reader pool when we need it?

Jason Rutherglen
added a comment - 24/Apr/09 22:48

> I'm confused as to how we make DocumentsWriter switch from
> writing to disk vs the ramdir? It seems like a fairly major
> change to the system? One that's hard to switch later on after
> IW is instantiated? Perhaps the IW.addWriter method is easier in
> this regard?

When we create SegmentWriteState (which is supposed to contain all
details needed to tell DW how/where to write the segment), we'd set
its directory to the RAMDir? That ought to be all that's needed
(though, it's possible some places use a private copy of the original
directory, which we should fix). DW should care less which Directory
the segment is written to...

> the difference is I'd prefer to do a merge of the RAM
> segments vs the straight one-for-one copy that
> resolveExternalSegments does.

> Yeah I implemented it this way in the IW.addWriter code. I agree
> it's better for IW.commit to copy all the ramdir segments to one
> disk segment.

OK. Maybe we modify resolveExternalSegments to accept a "doMerge"?

> I started working on the IW.addWriter(IndexWriter, boolean
> removeFrom) where removeFrom removes the segments that have been
> copied to the destination writer from the source writer. This
> method gets around the issue of blocking because potentially
> several writers could concurrently be copied to the destination
> writer. The only issue at this point is how the destination
> writer obtains segmentreaders from source readers when they're
> in the other writers' pool? Maybe the SegmentInfo can have a
> reference to the writer it originated in? That way we can easily
> access the right reader pool when we need it?

I don't think we need two writers? I think one writer, sometimes
flushing to RAMDir, is a clean solution?

Michael McCandless
added a comment - 25/Apr/09 10:02

One separate optimization we should make with NRT is to not close the doc store (stored fields, term vector) files when flushing for an NRT reader.

We do close them now, which then makes merging quite a bit more costly.

The trickiness with this optimization is we'd need to be able to somehow share an IndexInput & IndexOutput; or, perhaps we can open an IndexInput even though an IndexOutput has the same file open (Windows may prevent this, though I think I've seen that it will in fact allow it).

Once we do that optimization, then with this RAMDir optimization we should try to have the doc store files punch straight through to the real directory, ie bypass the RAMDir. The doc stores are space consuming, and since with autoCommit=false we can bypass merging them, it makes no sense to store them in the RAMDir.

We should probably do this optimization as a "phase 2", after this one.

Michael McCandless
added a comment - 25/Apr/09 11:39
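The punch-through idea above can be reduced to a routing rule on file extension. A sketch with invented names; the extension set (fdt/fdx stored fields, tvx/tvd/tvf term vectors) matches Lucene's doc-store file naming of that era:

```java
import java.util.Set;

// Illustrative only, not Lucene's API: route doc-store files straight to the
// real directory and every other segment file to the RAM directory, so the
// space-hungry doc stores never occupy RAM.
public class DocStoreRouter {
    private static final Set<String> DOC_STORE_EXTS = Set.of("fdt", "fdx", "tvx", "tvd", "tvf");

    /** Returns "real" for doc-store files, "ram" for all other segment files. */
    public static String targetDir(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        return DOC_STORE_EXTS.contains(ext) ? "real" : "ram";
    }
}
```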

> When we create SegmentWriteState (which is supposed to
> contain all details needed to tell DW how/where to write the
> segment), we'd set its directory to the RAMDir? That ought to be
> all that's needed (though, it's possible some places use a
> private copy of the original directory, which we should fix). DW
> should care less which Directory the segment is written to...

Agreed that DW can write the segment to the RAMDir. I started
coding along these lines however what do we do about the RAMDir
merging? This is why I was thinking we'll need a separate IW?
Otherwise the ram segments (if they are treated the same as disk
segments) would quickly be merged to disk? Or we have two
separate merging paths?

If we have a disk IW and ram IW, I'm not sure how the docstores
to disk part would work though I'm sure there's some way to do
it.

Jason Rutherglen
added a comment - 27/Apr/09 19:37

> modify resolveExternalSegments to accept a "doMerge"?

Sounds good.

> we should make with NRT is to not close the doc store
> (stored fields, term vector) files when flushing for an NRT
> reader.

Agreed, I think this feature is a must otherwise we're doing
unnecessary in ram merging.

> we'd need to be able to somehow share an IndexInput &
> IndexOutput; or, perhaps we can open an IndexInput even though
> an IndexOutput

I ran into problems with this before, I was trying to reuse
Directory to write a transaction log. It seemed theoretically
doable however it didn't work in practice. It could have been
the seeking and replacing but I don't remember. FSIndexOutput
uses a writeable RAF and FSIndexInput is read only, so why would
there be an issue?

Jason Rutherglen
added a comment - 27/Apr/09 19:50

> doc store files punch straight through to the real
> directory

To implement this functionality in parallel (and perhaps make
the overall patch cleaner), writing doc stores directly to a
separate directory can be a different patch? There can be an
option IW.setDocStoresDirectory(Directory) that the patch
implements? Then some unit tests that are separate from the near
realtime portion.

Jason Rutherglen
added a comment - 27/Apr/09 19:59

> Agreed that DW can write the segment to the RAMDir. I started
> coding along these lines however what do we do about the RAMDir
> merging? This is why I was thinking we'll need a separate IW?
> Otherwise the ram segments (if they are treated the same as disk
> segments) would quickly be merged to disk? Or we have two
> separate merging paths?

Hmm, right. We could exclude RAMDir segments from consideration by
MergePolicy? Alternatively, we could "expect" the MergePolicy to
recognize this and be smart about choosing merges (ie don't mix
merges)?

EG we do in fact want some merging of the RAM segments if they get too
numerous (since that will impact search performance).
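A directory-aware merge selection like the one discussed above can be sketched by grouping candidates per Directory, so a single merge never mixes RAMDir and FSDir segments. A toy model with invented names, not a real MergePolicy:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: candidate merges are chosen within each directory's
// segments, never across directories.
public class DirAwareMergeSelect {
    public record Seg(String name, String dir) {}

    // Merge whenever one directory accumulates at least mergeFactor segments.
    public static List<List<Seg>> selectMerges(List<Seg> segs, int mergeFactor) {
        Map<String, List<Seg>> byDir = new HashMap<>();
        for (Seg s : segs) byDir.computeIfAbsent(s.dir(), d -> new ArrayList<>()).add(s);
        List<List<Seg>> merges = new ArrayList<>();
        for (List<Seg> group : byDir.values()) {
            if (group.size() >= mergeFactor) merges.add(List.copyOf(group));
        }
        return merges;
    }
}
```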

> we should make with NRT is to not close the doc store
> (stored fields, term vector) files when flushing for an NRT
> reader.

> Agreed, I think this feature is a must otherwise we're doing
> unnecessary in ram merging.

OK, let's do this as a separate issue/optimization for NRT. There are
two separate parts to it:

1. Ability to store doc stores in "real" directory (looks like you
opened LUCENE-1618 for this part).

2. Ability to "share" IndexOutput & IndexInput

> I ran into problems with this before, I was trying to reuse
> Directory to write a transaction log. It seemed theoretically
> doable however it didn't work in practice. It could have been
> the seeking and replacing but I don't remember. FSIndexOutput
> uses a writeable RAF and FSIndexInput is read only why would
> there be an issue?

Hmm... seems like we need to investigate further. We could either
"ask" an IndexOutput for its IndexInput (sharing the underlying RAF),
or try to separately open an IndexInput (which may not work on
Windows).
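The read-while-write question can be probed directly with java.io.RandomAccessFile, which FSDirectory used underneath. A standalone experiment (this demonstrates POSIX behavior; as noted above, Windows file sharing may behave differently):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Open a file for writing, and while the writer is still open, read the
// already-written bytes through a second, read-only RandomAccessFile.
public class SharedReadWrite {
    public static int readFirstByteWhileWriting() {
        try {
            File f = File.createTempFile("shared", ".bin");
            f.deleteOnExit();
            try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
                out.write(42); // RAF is unbuffered: the byte hits the file now
                // Writer is still open; open an independent read-only handle.
                try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
                    return in.read();
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```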

> To implement this functionality in parallel (and perhaps make
> the overall patch cleaner), writing doc stores directly to a
> separate directory can be a different patch? There can be an
> option IW.setDocStoresDirectory(Directory) that the patch
> implements? Then some unit tests that are separate from the near
> realtime portion.

Michael McCandless
added a comment - 27/Apr/09 20:56
Yes, separate issue (LUCENE-1618).

> We could exclude RAMDir segments from consideration by
> MergePolicy? Alternatively, we could "expect" the MergePolicy to
> recognize this and be smart about choosing merges (ie don't mix
> merges)?

Is this over complicating things? Sometimes we want a mixture of
RAMDir segments and FSDir segments to merge (when we've decided
we have too much in ram), sometimes we don't (when we just want
the ram segments to merge). I'm still a little confused as to
why having a wrapper class that manages a disk writer and a ram
writer isn't cleaner?

Jason Rutherglen
added a comment - 27/Apr/09 21:20

Michael McCandless
added a comment - 27/Apr/09 22:16
> Sometimes we want a mixture of
> RAMDir segments and FSDir segments to merge (when we've decided
> we have too much in ram),

I don't think we want to mix RAM & disk merging?

EG when RAM is full, we want to quickly flush it to disk as a single
segment. Merging with disk segments only makes that flush slower?

> I'm still a little confused as to
> why having a wrapper class that manages a disk writer and a ram
> writer isn't cleaner?

This is functionally the same as not mixing RAM vs disk merging,
right (ie just as "clean")?

Yonik raised a good question on LUCENE-1618, which is what gains do we really expect to see by using RAMDir for the tiny recently flushed segments?

It would be nice if we could approximately measure this before putting more work into this issue – if the gains are not "decent" this optimization may not be worthwhile.

Of course, we are talking about 100s of milliseconds for the turnaround time to add docs & open an NRT reader, so if the time for writing/opening many tiny files in RAMDir vs FSDir differs by say 10s of msecs then we should pursue this. We should also consider that the IO system may very well be quite busy (doing merge(s), backups, etc.) and that'd make it slower to have to create tiny files.

A simpler optimization might be to allow using CFS for tiny files (even when CFS is turned off), but build the CFS in RAM (ie, write tiny files first to RAMFiles, then make the CFS file on disk). That might get most of the gains since the FSDir sees only one file created per tiny segment, not N.

Michael McCandless
added a comment - 28/Apr/09 15:46
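The build-the-CFS-in-RAM idea above amounts to buffering each tiny file in memory and emitting one concatenated image, so the file system sees a single file creation instead of N. A rough sketch; the framing below (count, then name/length/bytes per sub-file) is made up for illustration and is not Lucene's actual CFS format:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Buffer tiny segment files in RAM, then build one compound-file image.
public class RamCompoundFile {
    private final Map<String, byte[]> files = new LinkedHashMap<>();

    public void addFile(String name, byte[] contents) {
        files.put(name, contents.clone());
    }

    // Produce the single on-disk image: count, then [name, length, bytes].
    public byte[] build() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(out);
            data.writeInt(files.size());
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                data.writeUTF(e.getKey());
                data.writeInt(e.getValue().length);
                data.write(e.getValue());
            }
            data.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen on in-memory streams
        }
    }
}
```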

> Yonik raised a good question on LUCENE-1618, which is what gains do we really expect to see by using RAMDir for the tiny recently flushed segments?

I raised it more because of the direction the discussion was veering (write through caching to a RAMDirectory, and RAMDirectory being faster to search). I do believe that RAMDirectory can probably improve NRT, but it would be due to avoiding waiting for file open/write/close/open/read (as Mike also said)... and not any difference during IndexSearcher.search(), which should be irrelevant due to the relative size differences of the RAMDirectory and the FSDirectory. Small file creation speeds will also be heavily dependent on the exact file system used.

Yonik Seeley
added a comment - 28/Apr/09 16:02

> EG when RAM is full, we want to quickly flush it to disk
> as a single segment. Merging with disk segments only makes that
> flush slower?

I assume it's ok for the IW.mergescheduler to be used which may
not immediately perform the merge to disk (in the case of
ConcurrentMergeScheduler)? When implementing using
addIndexesNoOptimize (which blocks) I realized we probably don't
want blocking to occur because that means shutting down the
updates.

Also a random thought, it seems like ConcurrentMergeScheduler
works great for RAMDir merging, how does it compare with
SerialMS on an FSDirectory? It seems like it shouldn't be too much
faster given the IO sequential access bottleneck?

Jason Rutherglen
added a comment - 28/Apr/09 22:05

I assume it's ok for the IW.mergescheduler to be used which may
not immediately perform the merge to disk (in the case of
ConcurrentMergeScheduler)?

Only if we "accept" requiring MergePolicy to be aware that some
segments are in RAMDir and some are in the "real" Dir and to "act
accordingly", ie 1) don't mix the dirs when merging, 2) when RAM is
"full" merge every single RAM segment into a single "real Dir" segment
(requires IW to provide exposure on how much RAM DW's buffer is
currently consuming), 3) properly "maintain" the RAM segments (ie,
merge RAM -> RAM somehow) so that searchers don't search too many RAM
segments.
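
A minimal sketch of those three constraints (the Seg class and selector are invented for illustration; none of this is Lucene API): rule 1 falls out of only ever grouping same-directory segments, and rule 2 selects every RAM segment into a single flush once the budget is crossed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical segment descriptor: which directory it lives in, and its size.
class Seg {
    final boolean inRam;
    final long sizeBytes;
    Seg(boolean inRam, long sizeBytes) { this.inRam = inRam; this.sizeBytes = sizeBytes; }
}

class RamAwareMergeSelector {
    // Never mixes directories: only RAM segments are considered here.
    // When total RAM use exceeds the budget, every RAM segment is selected
    // for one merge into the real directory; otherwise nothing is selected.
    static List<Seg> selectRamFlush(List<Seg> segs, long ramBudgetBytes) {
        long ramUsed = 0;
        List<Seg> ramSegs = new ArrayList<>();
        for (Seg s : segs) {
            if (s.inRam) {
                ramUsed += s.sizeBytes;
                ramSegs.add(s);
            }
        }
        return ramUsed > ramBudgetBytes ? ramSegs : new ArrayList<>();
    }
}
```

Point 3 (RAM -> RAM maintenance merges) would be a separate selection pass with its own size limits.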

I think this approach is probably best: you're right that allowing CMS
to manage these RAM segments is nice since it'll happen in the BG and
will not block updates.

It does mean, though, that the RAM usage semantics of IW is no longer
so "crisp" as flushing today ("once RAM is full, stop world & flush it
to disk, then resume") but I think that's acceptable and perhaps
preferable since world is no longer stopped to flush RAM -> disk.

Though one trickiness is... if a large RAM -> RAM merge takes place,
we temporarily double the RAM consumption. I think MergePolicy simply
shouldn't do that. Ie at no point should it be merging a very large
percentage of the RAM segments. It should instead merge RAM -> disk.

This'd also mean advanced users who implement their own MergePolicy
must realize, when IW is used with an NRT reader, that additional
smarts are recommended wrt

When implementing using
addIndexesNoOptimize (which blocks) I realized we probably don't
want blocking to occur because that means shutting down the
updates.

Right, this is one of the strong reasons to do the "internal" approach
vs "external" one.

Also a random thought, it seems like ConcurrentMergeScheduler
works great for RAMDir merging, how does it compare with
SerialMS on an FSDirectory? It seems like it shouldn't be too much
faster given the IO sequential access bottleneck?

By far the biggest win of CMS over SMS is in the first merge, because
it does not block the further addition of docs. Thus an app can
continue indexing into RAM buffer (consuming CPU & RAM resources)
while a BG thread consumes RAM + IO resources. This is very much a
win.

Beyond the first merge...in theory, modern IO systems have concurrency
(eg the NCQ in a single SATA drive) so you should "gain" by having
several threads performing IO at once. The OS & hard drives attempt
to re-order the requests in a more optimal way (like an elevator,
sweeping floors). I haven't explicitly tested this with Lucene...

I believe SSDs handle concurrent requests very well since under the
hood most of them are multi-channel basically RAID0 devices (eg Intel
X25M has 10 channels).

Michael McCandless
added a comment - 29/Apr/09 10:03

A merge policy may be more optimal for ram segments vs disk
segments. Perhaps the best way to make this clean is to keep the
ram merge policy and real dir merge policies different? That way
merge policy implementations don't need to worry about
ram and non-ram dir cases.
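
Sketched, the separation might look like this; MergePolicyLike and DualPolicyWriter are invented names, not Lucene classes, and real policies would return merge specifications rather than a boolean:

```java
// A policy only ever sees segments from its own directory.
interface MergePolicyLike {
    boolean wantsMerge(int numSegments);
}

class DualPolicyWriter {
    final MergePolicyLike diskPolicy;
    final MergePolicyLike ramPolicy;

    DualPolicyWriter(MergePolicyLike disk, MergePolicyLike ram) {
        this.diskPolicy = disk;
        this.ramPolicy = ram;
    }

    // Dispatch to the policy matching the directory, so neither
    // implementation has to handle the mixed ram/non-ram case.
    boolean checkMerge(boolean ramDir, int numSegments) {
        return (ramDir ? ramPolicy : diskPolicy).wantsMerge(numSegments);
    }
}
```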

Perhaps an IW.updatePendingRamMerges method should be added that
handles this separately? Does the ram dir ever need to worry
about things like maxNumSegmentsOptimize and optimize?

Right, this is one of the strong reasons to do the
"internal" approach vs "external" one.

I think having the ram merge policy should cover the reasons I
had for having a separate ram writer. The IW.addWriter
method I implemented would not have blocked, but I don't think
it's necessary now if we have a separate ram merge policy.

Jason Rutherglen
added a comment - 29/Apr/09 21:02 - edited

Perhaps the best way to make this clean is to keep the
ram merge policy and real dir merge policies different? That way
merge policy implementations don't need to worry about
ram and non-ram dir cases.

OK tentatively this feels like a good approach. Would you re-use
MergePolicy, or make a new RAMMergePolicy?

Would we use the same MergeScheduler to then execute the selected
merges?

How would we handle the "it's time to flush some RAM to disk" case?
Would RAMMergePolicy make that decision?

Perhaps an IW.updatePendingRamMerges method should be added that handles this separately?

Yes?

Does the ram dir ever need to worry about things like maxNumSegmentsOptimize and optimize?

No?

I think having the ram merge policy should cover the reasons I
had for having a separate ram writer. Although the IW.addWriter
method I implemented would not have blocked, but I don't think
it's necessary now if we have a separate ram merge policy.

OK good.

Michael McCandless
added a comment - 30/Apr/09 13:17

Would you re-use MergePolicy, or make a new
RAMMergePolicy?

MergePolicy is used as is, with a special IW method that handles
merging ram segments to the real directory (which hits an issue
around merging contiguous segments; can that be relaxed in this
case, as I don't understand why it's required?)

The patch is not committable; however, I am posting it to show a
path that seems to work. It includes test cases for merging in
ram and merging to the real directory.

IW.getFlushDirectory is used by internal calls to obtain the
directory to flush segments to. This is used in DocumentsWriter
related calls.

DocumentsWriter.directory is removed so that methods requiring
the directory call IW.getFlushDirectory instead.

IW.setRAMDirectory sets the ram directory to be used.

IW.setRAMMergePolicy sets the merge policy to be used for
merging segments on the ram dir.

In IW.updatePendingMerges, totalRamUsed is the size of the ram
segments + the ram buffer used. If totalRamUsed exceeds the max
ram buffer size then IW.updatePendingRamMergesToRealDir is
called.
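
The check described above, as a sketch (method and parameter names are stand-ins for the patch's internals):

```java
class RamThreshold {
    // totalRamUsed = flushed ram segments + DocumentsWriter's live buffer;
    // crossing the max buffer size triggers the merge to the real directory.
    static boolean shouldMergeToRealDir(long ramSegmentBytes,
                                        long ramBufferBytes,
                                        long maxRamBufferBytes) {
        long totalRamUsed = ramSegmentBytes + ramBufferBytes;
        return totalRamUsed > maxRamBufferBytes;
    }
}
```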

IW.updatePendingRamMergesToRealDir registers a merge of the
ram segments to the real directory (currently causes a
non-contiguous segments exception).

MergePolicy.OneMerge has a directory attribute used when
building the merge.info in _mergeInit.

Jason Rutherglen
added a comment - 30/Apr/09 20:25

Test cases include testMergeInRam, testMergeToDisk, and
testMergeRamExceeded.

There is one error that occurs regularly in testMergeRamExceeded:

MergePolicy selected non-contiguous segments to merge
(_bo:cx83 _bm:cx4 _bn:cx2 _bl:cx1->_bj _bp:cx1->_bp _bq:cx1->_bp
_c2:cx1->_c2 _c3:cx1->_c2 _c4:cx1->_c2 vs _5x:c120 _6a:c8
_6t:c11 _bo:cx83** _bm:cx4** _bn:cx2** _bl:cx1->_bj**
_bp:cx1->_bp** _bq:cx1->_bp** _c1:c10 _c2:cx1->_c2**
_c3:cx1->_c2** _c4:cx1->_c2**), which IndexWriter (currently)
cannot handle

Ok, fixed the ensureContiguousMerge exception by asking the
mergePolicy (not ramMergePolicy) to evaluate the ram segment
infos as an optimize to directory. Now all the current tests
pass.

The patch is cleaned up a little, needs more, and further test
cases.

IndexWriter doesn't accept setRAMDirectory anymore; it needs to
be passed into the IndexWriter constructor. This is because the
ram dir can't be changed in the middle of an operation while the
system is running.

Jason Rutherglen
added a comment - 30/Apr/09 21:59

Jason Rutherglen
added a comment - 30/Apr/09 23:19 Fixed and cleaned up more.
All tests pass
Added entry in CHANGES.txt
I'm going to integrate LUCENE-1618 and test that out as a part of the next patch.

I don't think the caller should provide the RAMDir... and we
should have a getter to get it. I think this should be
under-the-hood. This is simply a nice way for IW to use RAM as
buffer in the presence of frequent NRT readers being opened.

If NRT is never used, the behavior of IW should be unchanged
(which is not the case w/ this patch I think). RAMDir should be
created the first time a flush is done due to NRT creation.

StoredFieldsWriter & TermVectorsTermsWriter now write to
IndexWriter.getFlushDirectory(), which is confusing because that
method returns the RAMDir if set? Shouldn't this be the opposite?
(Ie it should flush to IndexWriter.getDirectory()? Or we should
change getFlushDirectory to NOT return the ramdir?)

Why did you need to add synchronized to some of the SegmentInfo
files methods? (What breaks if you undo that?). The contract
here is IW protects access to SegmentInfo/s.

The MergePolicy needs some smarts when it's dealing w/ RAM. EG it
should not do a merge of more than XXX% of total RAM usage (should
flush to the real directory instead).

Nothing is calling the new ramOverLimit?

Still some noise (MockRAMDir, DocFieldProcessorPerThread, some
changes in LogMergePolicy)

Michael McCandless
added a comment - 01/May/09 13:57 Patch looks good! Some comments:

IndexFileDeleter takes into account the ram directory (which
when using NRT with the FSD caused files to not be found).

FSD is included and writes fdx, fdt, tvx, tvf, tvd extension
files to the primary directory (which is the same as
IW.directory). LUCENE-1618 needs to be updated with these
changes (or we simply include it in this patch, as the
LUCENE-1618 patch is only a couple of files).

Removed DocumentsWriter.ramOverLimit

I think we need to give the option of a ram mergescheduler
because the user may not want the ram merging and disk
merging to compete for threads. I'm thinking of the use case
where NRT is a priority, where one may allocate more threads to
the ram CMS and fewer to the disk CMS. This also gives us the
option of trying out more parameters when performing benchmarks
of NRT.

We may want to default the ram mergepolicy to not use compound
files as it's not useful when using a ram dir?

Because FSD uses IW.directory, FSD will list files that
originated from FSD and from IW.directory, we may want to keep
track of which files are supposed to be in FSD (from the
underlying primary dir) and which are not?

If NRT is never used, the behavior of IW should be
unchanged (which is not the case w/ this patch I think). RAMDir
should be created the first time a flush is done due to NRT
creation.

In the patch if ramdir is not passed in, the behavior of IW
remains the same as it is today. You're saying we should have IW
create the ramdir by default after getReader is called and
remove the IW ramdir constructor? What if the user has an
alternative ramdir implementation they want to use?

StoredFieldsWriter & TermVectorsTermsWriter now writes to
IndexWriter.getFlushDirectory(), which is confusing because that
method returns the RAMDir if set? Shouldn't this be the
opposite? (Ie it should flush to IndexWriter.getDirectory()? Or
we should change getFlushDiretory to NOT return the
ramdir?)

The attached patch uses FileSwitchDirectory, where these files
are written to the primary directory (IW.directory). So
getFlushDirectory is ok?
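
The routing amounts to a switch on file extension; this sketch illustrates the idea only and is not the actual FileSwitchDirectory API (the treatment of extensionless files like the segments file is an assumption here):

```java
import java.util.Set;

class ExtensionRouter {
    // The doc-store extensions named above go to the primary (disk) directory.
    static final Set<String> PRIMARY_EXTS = Set.of("fdx", "fdt", "tvx", "tvf", "tvd");

    /** True if the file should be written to the primary directory. */
    static boolean goesToPrimary(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return true; // assumption: extensionless files stay on disk
        return PRIMARY_EXTS.contains(fileName.substring(dot + 1));
    }
}
```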

Why did you need to add synchronized to some of the
SegmentInfo files methods? (What breaks if you undo that?). The
contract here is IW protects access to SegmentInfo/s

SegmentInfo.files was being cleared while sizeInBytes was called
which resulted in an NPE. The alternative is sync IW in
IW.size(SegmentInfos) which seems a bit extreme just to obtain
the size of a segment info?

The MergePolicy needs some smarts when it's dealing w/
RAM. EG it should not do a merge of more than XXX% of total RAM
usage (should flush to the real directory instead)

Isn't this handled well enough in updatePendingMerges or is
there more that needs to be done?

Jason Rutherglen
added a comment - 01/May/09 18:38

IndexFileDeleter takes into account the ram directory (which
when using NRT with the FSD caused files to not be found).

I don't like how "deep" the dichotomy of "RAMDir vs FSDir" is being
pushed. Why can't we push FSD down to all these places (IFD,
SegmentInfo/s, etc.)?

FSD is included and writes fdx, fdt, tvx, tvf, tvd extension
files to the primary directory (which is the same as
IW.directory). LUCENE-1618 needs to be updated with these
changes (or we simply include it in this patch, as the
LUCENE-1618 patch is only a couple of files).

Why did this require changes to FSD?

I think we need to give the option of a ram mergescheduler
because the user may not want the ram merging and disk
merging to compete for threads. I'm thinking of the use case
where NRT is a priority, where one may allocate more threads to
the ram CMS and fewer to the disk CMS. This also gives us the
option of trying out more parameters when performing benchmarks
of NRT.

I think we're unlikely to gain from more than 1 BG thread for RAM
merging? But I agree it'd be horrible if CMS blocked RAM merging
because its allotted threads were tied up merging disk segments.
Could we simply make the single CMS instance smart enough to realize
that a single RAM merge is allowed to proceed regardless of the thread
limit?
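
That exemption could be a one-line admission check in the scheduler (hypothetical logic, not CMS's actual code):

```java
class MergeAdmission {
    // A RAM merge may always start; disk merges respect the thread cap.
    static boolean mayStart(boolean isRamMerge, int activeThreads, int maxThreads) {
        return isRamMerge || activeThreads < maxThreads;
    }
}
```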

We may want to default the ram mergepolicy to not use compound
files as it's not useful when using a ram dir?

I think actually hardwire it, not just default. Building CFS in RAM
makes no sense. Worse, if we allow one to choose to do it we then
have to fix FSD to understand CFX must go to the dir too, and, we'd
have to fix IW to not merge in the doc store files when building a
private CFS. Net/net I think we should not allow CFS for the RAM
segments.

On merging to disk it can then respect the user's CFS setting.

Because FSD uses IW.directory, FSD will list files that
originated from FSD and from IW.directory, we may want to keep
track of which files are supposed to be in FSD (from the
underlying primary dir) and which are not?

I don't understand what's wrong here?

> If NRT is never used, the behavior of IW should be
> unchanged (which is not the case w/ this patch I think). RAMDir
> should be created the first time a flush is done due to NRT
> creation.

In the patch if ramdir is not passed in, the behavior of IW
remains the same as it is today. You're saying we should have IW
create the ramdir by default after getReader is called and
remove the IW ramdir constructor?

Right. This should be "under the hood".

What if the user has an alternative ramdir implementation they want to
use?

I think I'd rather not open up that option just yet. This really is a
private optimization to how IW uses RAM. We may want to further
change/improve how RAM is used.

Way back when, IW used a RAMDir internally for buffering; then, with LUCENE-843 we switched to a whole different format (DW's ram
buffering). Now we are adding back RAMDir for NRT; maybe we'll switch
its format at some point... or change NRT to directly search DW's
RAM... etc. How IW uses RAM is very much an internal detail so I'd
rather not expose it publically.

[BTW: once we have this machinery online, it's conceivable that we'd
want to flush to RAMDir even in the non-NRT case. EG, say DW's RAM
buffer is full and it's time to flush. If it flushes to RAM,
typically the RAMDir is far more compact than DW's RAM buffer and it
then still has some more space to work with, before having to flush to
disk. If we explore this it should be in a new issue (later)...]

> StoredFieldsWriter & TermVectorsTermsWriter now writes to
> IndexWriter.getFlushDirectory(), which is confusing because that
> method returns the RAMDir if set? Shouldn't this be the
> opposite? (Ie it should flush to IndexWriter.getDirectory()? Or
> we should change getFlushDiretory to NOT return the
> ramdir?)

The attached patch uses FileSwitchDirectory, where these files
are written to the primary directory (IW.directory). So
getFlushDirectory is ok?

OK, though I'd like to simply always use FSD, even if primary &
secondary are the same dir. All these if's checking for both dirs,
passing both dirs deep into Lucene's APIs, etc., are spooky.

> Why did you need to add synchronized to some of the
> SegmentInfo files methods? (What breaks if you undo that?). The
> contract here is IW protects access to SegmentInfo/s

SegmentInfo.files was being cleared while sizeInBytes was called
which resulted in an NPE. The alternative is sync IW in
IW.size(SegmentInfos) which seems a bit extreme just to obtain
the size of a segment info?

But... why did we have one thread asking for size while another was
tweaking the SegmentInfo? What leads to that? We need to better
understand the root cause here.

The size consumed by the RAM segments should be carefully computed
(called only in synchronized(iw) context) and then shared. This value
changes relatively rarely (on flushing a new segment to ram; on
applying deletes that include RAM segments; on doing a ram->ram
merge), but is read frequently (per doc added, to decide whether it's
time to flush). I think the value should be pushed to DW whenever it
changes, via synchronized method in DW; and then the existing
synchronized logic in DW that decides if it's time to "flush after"
should consult that value. No further synchronizing should be
necessary.
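
A sketch of that hand-off, with invented names standing in for the IndexWriter/DocumentsWriter internals:

```java
class DocWriterSketch {
    private long ramSegmentBytes;   // pushed by IW when it changes (rarely)
    private long bufferedBytes;     // grows per added document
    private final long maxBufferBytes;

    DocWriterSketch(long maxBufferBytes) { this.maxBufferBytes = maxBufferBytes; }

    // IW computes the RAM-segment size under its own lock, then shares it here.
    synchronized void setRamSegmentBytes(long bytes) { ramSegmentBytes = bytes; }

    // The per-document flush check reads only cached values; no extra
    // locking of SegmentInfo/s is needed.
    synchronized boolean addBytesAndCheckFlush(long docBytes) {
        bufferedBytes += docBytes;
        return ramSegmentBytes + bufferedBytes > maxBufferBytes;
    }
}
```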

Also, this ram size should be used not only for deciding when it's
time to merge to a disk segment, but also when it's time for DW to
flush a new segment (which I think your current patch is missing?).

> The MergePolicy needs some smarts when it's dealing w/ RAM. EG it
> should not do a merge of more than XXX% of total RAM usage (should
> flush to the real directory instead).

Isn't this handled well enough in updatePendingMerges or is
there more that needs to be done?

There is more that needs to be done, because MergePolicy must
conditionalize its logic based on RAM vs FS. Ie, if our RAM buffer is
32 MB, and there are say 31 MB of RAM segments that suddenly need
merging (because we just flushed the 10th RAM segment), we should not
do a RAM -> RAM merge at that point (because 31/32 is a very high
percentage of our net RAM buffer). Instead we should force RAM -> disk
at that point, even though technically RAM is not yet full.

Ooh: maybe a better approach is to disallow the merge if the expected
peak RAM usage will exceed our buffer. I like this better. So if
budget is 32 MB, and net RAM used (segments + DW) is say 22, we have a
10 MB "budget", so we are allowed to select merges that total to < 10
MB.
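
The arithmetic of that rule, as a sketch: with a 32 MB budget and 22 MB of net RAM in use, only merges totalling under the remaining 10 MB are admitted.

```java
class RamMergeBudget {
    // A RAM -> RAM merge roughly doubles the RAM held by its inputs while
    // running, so it is only admitted if it fits in the remaining headroom.
    static boolean allowRamToRamMerge(long budgetBytes, long netRamUsedBytes,
                                      long proposedMergeBytes) {
        long headroom = budgetBytes - netRamUsedBytes;
        return proposedMergeBytes < headroom;
    }
}
```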

Michael McCandless
added a comment - 02/May/09 10:51
IndexFileDeleter takes into account the ram directory (which
when using NRT with the FSD caused files to not be found).
I don't like how "deep" the dichotomy of "RAMDir vs FSDir" is being
pushed. Why can't we push FSD down to all these places (IFD,
SegmentInfo/s, etc.)?
FSD is included and writes fdx, fdt, tvx, tvf, tvd extension
files to the primary directory (which is the same as
IW.directory). LUCENE-1618 needs to be updated with these
changes (or we simply include it in this patch as the
LUCENE-1618 patch is only a couple of files).
Why did this require changes to FSD?
I think we need to give the option of a ram mergescheduler
because the user may want not want the ram merging and disk
merging to compete for threads. I'm thinking if of the use case
where NRT is a priority then one may allocate more threads to
the ram CMS and less to the disk CMS. This also gives us the
option of trying out more parameters when performing benchmarks
of NRT.
I think we're unlikely to gain from more than 1 BG thread for RAM
merging? But I agree it'd be horrible if CMS blocked RAM merging
because its allotted threads were tied up merging disk segments.
Could we simply make the single CMS instance smart enoguh to realize
that a single RAM merge is allowed to proceed regardless of the thread
limit?
We may want to default the ram mergepolicy to not use compound
files as it's not useful when using a ram dir?
I think actually hardwire it, not just default. Building CFS in RAM
makes no sense. Worse, if we allow one to choose to do it we then
have to fix FSD to understand CFX must go to the dir too, and, we'd
have to fix IW to not merge in the doc store files when building a
private CFS. Net/net I think we should not allow CFS for the RAM
segments.
On merging to disk it can then respect the user's CFS setting.
Because FSD uses IW.directory, FSD will list files that
originated from FSD and from IW.directory, we may want to keep
track of which files are supposed to be in FSD (from the
underlying primary dir) and which are not?
I don't understand what's wrong here?
> If NRT is never used, the behavior of IW should be
> unchanged (which is not the case w/ this patch I think). RAMDir
> should be created the first time a flush is done due to NRT
> creation.
In the patch if ramdir is not passed in, the behavior of IW
remains the same as it is today. You're saying we should have IW
create the ramdir by default after getReader is called and
remove the IW ramdir constructor?
Right. This should be "under the hood".
What if the user has an alternative ramdir implementation they want to
use?
I think I'd rather not open up that option just yet. This really is a
private optimization to how IW uses RAM. We may want to further
change/improve how RAM is used.
Way back when, IW used a RAMDir internally for buffering; then, with
LUCENE-843 we switched to whole different format (DW's ram
buffering). Now we are adding back RAMDir for NRT; maybe we'll switch
its format at some point... or change NRT to directly search DW's
RAM... etc. How IW uses RAM is very much an internal detail so I'd
rather not expose it publically.
[BTW: once we have this machinery online, it's conceivable that we'd
want to flush to RAMDir even in the non-NRT case. EG, say DW's RAM
buffer is full and it's time to flush. If it flushes to RAM,
typically the RAMDir is far more compact than DW's RAM buffer and it
then still has some more space to work with, before having to flush to
disk. If we explore this it should be in a new issue (later)...]
> StoredFieldsWriter & TermVectorsTermsWriter now writes to
> IndexWriter.getFlushDirectory(), which is confusing because that
> method returns the RAMDir if set? Shouldn't this be the
> opposite? (Ie it should flush to IndexWriter.getDirectory()? Or
> we should change getFlushDiretory to NOT return the
> ramdir?)
The attached patch uses FileSwitchDirectory, where these files
are written to the primary directory (IW.directory). So
getFlushDirectory is ok?
OK, though I'd like to simply always use FSD, even if primary &
secondary are the same dir. All these if's checking for both dirs,
passing both dirs deep into Lucene's APIs, etc., are spooky.
> Why did you need to add synchronized to some of the
> SegmentInfo files methods? (What breaks if you undo that?). The
> contract here is IW protects access to SegmentInfo/s
SegmentInfo.files was being cleared while sizeInBytes was called
which resulted in an NPE. The alternative is sync IW in
IW.size(SegmentInfos) which seems a bit extreme just to obtain
the size of a segment info?
But... why did we have one thread asking for size while another was
tweaking the SegmentInfo? What leads to that? We need to better
understand the root cause here.
The size consumed by the RAM segments should be carefully computed
(called only in synchronized(iw) context) and then shared. This value
changes relatively rarely (on flushing a new segment to ram; on
applying deletes that include RAM segments; on doing a ram->ram
merge), but is read frequently (per doc added, to decide whether it's
time to flush). I think the value should be pushed to DW whenever it
changes, via synchronized method in DW; and then the existing
synchronized logic in DW that decides if it's time to "flush after"
should consult that value. No further synchronizing should be
necessary.
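
A minimal sketch of this "push on change, poll on read" idea, with illustrative names (RamUsageTracker is a hypothetical stand-in for the DW-side field): the writer recomputes the RAM segment size only when it changes and pushes it down, so the hot per-document path just reads a cached value.

```java
// Sketch: the size is pushed rarely (flush, delete, ram->ram merge)
// and read very often (per added doc). Names are illustrative.
public class RamUsageTracker {
    private long ramSegmentBytes; // pushed by the writer when the RAMDir changes

    // called from synchronized(iw) context whenever the RAMDir changes
    public synchronized void setRamSegmentBytes(long bytes) {
        ramSegmentBytes = bytes;
    }

    // called per added document: cheap read, no recomputation
    public synchronized boolean timeToFlush(long dwBufferBytes, long budgetBytes) {
        return dwBufferBytes + ramSegmentBytes >= budgetBytes;
    }

    public static void main(String[] args) {
        RamUsageTracker t = new RamUsageTracker();
        t.setRamSegmentBytes(22L << 20);                         // 22 MB of RAM segments
        System.out.println(t.timeToFlush(9L << 20, 32L << 20));  // false: 31 MB < 32 MB
        System.out.println(t.timeToFlush(11L << 20, 32L << 20)); // true: 33 MB >= 32 MB
    }
}
```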
Also, this ram size should be used not only for deciding when it's
time to merge to a disk segment, but also when it's time for DW to
flush a new segment (which I think your current patch is missing?).
> The MergePolicy needs some smarts when it's dealing w/ RAM. EG it
> should not do a merge of more than XXX% of total RAM usage (should
> flush to the real directory instead).
Isn't this handled well enough in updatePendingMerges or is
there more that needs to be done?
There is more that needs to be done, because MergePolicy must
conditionalize its logic based on RAM vs FS. Ie, if our RAM buffer is
32 MB, and there are say 31 MB of RAM segments that suddenly need
merging (because we just flushed the 10th RAM segment), we should not
do a RAM -> RAM merge at that point (because 31./32. = very high pctg
of our net RAM buffer). Instead we should force RAM -> disk at that
point, even though technically RAM is not yet full.
Ooh: maybe a better approach is to disallow the merge if the expected
peak RAM usage will exceed our buffer. I like this better. So if
budget is 32 MB, and net RAM used (segments + DW) is say 22, we have a
10 MB "budget", so we are allowed to select merges that total to < 10
MB.
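
The budget rule can be sketched as a greedy selection pass: with a 32 MB limit and 22 MB already used, only merges totalling under the remaining 10 MB are allowed. This is an illustrative sketch, not Lucene's MergePolicy API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of budget-capped RAM merge selection. Not Lucene API.
public class RamMergeBudget {
    // returns the segment sizes chosen for a RAM->RAM merge, stopping
    // before the running total would reach the remaining budget
    static List<Long> selectWithinBudget(long[] segmentBytes, long limit, long used) {
        long remaining = limit - used;
        List<Long> picked = new ArrayList<>();
        long total = 0;
        for (long size : segmentBytes) {
            if (total + size >= remaining) break;
            total += size;
            picked.add(size);
        }
        return picked;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long[] segs = {4 * mb, 3 * mb, 2 * mb, 5 * mb};
        // 32 MB budget, 22 MB used -> 10 MB headroom: 4+3+2 = 9 MB fits
        System.out.println(selectWithinBudget(segs, 32 * mb, 22 * mb).size()); // 3
    }
}
```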

Jason Rutherglen
added a comment - 04/May/09 18:31 I don't like how "deep" the dichotomy of "RAMDir vs FSDir"
Agreed, it's a bit awkward but I don't see another way to do
this. The good thing is if IW has written some .fdt files to the
main dir (via FSD), IW crashes, then IW is created again, IFD
automatically deletes the extraneous .fdt (and other extension)
files.
Why can't we push FSD down to all these places (IFD,
SegmentInfo/s, etc.)?
Could we simply make the single CMS instance smart enough
to realize that a single RAM merge is allowed to proceed
regardless of the thread limit?
Hmm... I think for benchmarking it would be good to allow
options as we simply don't know. In the latest patch a ram
mergescheduler can be set to the IndexWriter.
have to fix FSD to understand CFX must go to the dir
too
I think this is fixed in the patch, where compound files are not
created in RAM.
You're saying we should have IW create the ramdir by default
after getReader is called and remove the IW ramdir constructor?
Right. This should be "under the hood".
Ok, this will require some reworking of the patch.
OK, though I'd like to simply always use FSD, even if
primary & secondary are the same dir.
How will always using FSD work? Doesn't it assume writing to two
different directories?
this ram size should be used not only for deciding when
it's time to merge to a disk segment, but also when it's time
for DW to flush a new segment
In the new patch this is fixed.
So if budget is 32 MB, and net RAM used (segments + DW)
is say 22, we have a 10 MB "budget", so we are allowed to select
merges that total to < 10 MB.
One issue is the ram buffer flush doubles the ram used (because
the segment is flushed as is to the RAM dir). You're saying
roughly estimate the ram size used on the result of a merge and
have the merge policy take this into account? This makes sense,
otherwise we will consistently (if temporarily) exceed the ram
buffer size. The algorithm is fairly simple? Find segments whose
total sizes are lower than whatever we have left of the max ram
buffer size? I have new code, but will rework it a bit to
include this discussion.

Michael McCandless
added a comment - 04/May/09 22:14
> OK, though I'd like to simply always use FSD, even if
> primary & secondary are the same dir.
How will always using FSD work? Doesn't it assume writing to two
different directories?
I think on creating IW the user should state (via new expert ctor)
that they intend to use it for NRT (say, a new boolean
"enableNearRealTime").
Then we could pass IFD either an FSD (when in NRT mode) or the normal
directory when not in NRT mode. IFD would no longer have to
duplicate FSD's logic (summing the two dir's listAlls, the
getDirectoryForFile).
SegmentInfos.hasExternalSegments, and MultiSegmentReader ctor, should
be "smart" when they're passed an FSD (probably we should add
Directory.contains(Directory) method, which by default returns true if
this.equals(dir), but FSD would override to return true if the
incoming dir .equals primary & secondary).
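
The proposed Directory.contains(Directory) can be sketched with minimal stand-in classes (Dir and SwitchDir below are illustrative, not Lucene's Directory/FileSwitchDirectory): the base class falls back to equals, while the switch-style wrapper also claims both of its underlying dirs.

```java
// Minimal sketch of the Directory.contains proposal. Stand-in classes only.
class Dir {
    boolean contains(Dir other) {
        return this.equals(other); // default: a dir only contains itself
    }
}

class SwitchDir extends Dir {
    final Dir primary, secondary;

    SwitchDir(Dir primary, Dir secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    @Override
    boolean contains(Dir other) {
        // an FSD-like wrapper also matches its primary and secondary dirs
        return this.equals(other) || primary.equals(other) || secondary.equals(other);
    }
}

public class ContainsSketch {
    public static void main(String[] args) {
        Dir ram = new Dir(), fs = new Dir();
        SwitchDir fsd = new SwitchDir(fs, ram);
        System.out.println(fsd.contains(ram)); // true: the FSD "owns" the ram dir
        System.out.println(ram.contains(fs));  // false: plain dirs only match themselves
    }
}
```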
Likewise all the switching in DW to handle two dirs should be rolled
back (eg you added DW.fileLength(name, dir1, dir2), which duplicates code in
FSD).
One issue is the ram buffer flush doubles the ram used (because
the segment is flushed as is to the RAM dir).
I think we must keep transient RAM usage below the specified limit, so
that limits our flushing freedom. Ie, in the NRT case, once DW's RAM
buffer exceeds half of the allowed remaining RAM budget (ie, the limit
minus total RAM segments) then we trigger a flush to RAM and then to
the "real" dir.
Or... we could flush the new segment directly to the real dir as one
segment, and merge all prior RAM segments as a separate new segment in
the main dir, if the free RAM is large enough.
> this ram size should be used not only for deciding when
> it's time to merge to a disk segment, but also when it's time
> for DW to flush a new segment
In the new patch this is fixed.
I don't see where this is taken into account? Did you mean to attach
a new patch?

Jason Rutherglen
added a comment - 05/May/09 00:32
In DocumentsWriter.balanceRAM if NRT is on the total ram
consumed is "(numBytesUsed * 2) + writer.getRamDirSize()".
numBytesUsed is the current consumption of the ram buffer.
Basically what we flush to ram, we'll consume that much of the
buffer. This is now taken into account in the bufferIsFull
calculation.
Double dir usage should be factored out.
TestIndexWriterRamDir.testFSDirectory fails. It tries to
simulate a crashing IW. When the IW is created again it should
delete the old files, for some reason it's not with FSDirectory
(open file handles on Windows perhaps)
we could flush the new segment directly to the real dir
as one segment, and merge all prior RAM segments as a separate
new segment in the main dir, if the free RAM is large enough.
Yeah it's unclear what the best policy is here. Do we want to
have some sort of custom merge policy method/class to take care
of this so the user can customize it?
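
The "(numBytesUsed * 2) + writer.getRamDirSize()" calculation above can be sketched in isolation; in NRT mode the projected usage doubles the DW buffer because flushing lands a roughly equal-sized segment in the RAM dir. Names below are illustrative stand-ins for the DocumentsWriter fields.

```java
// Sketch of the NRT-aware bufferIsFull check described above.
public class BufferIsFull {
    static boolean bufferIsFull(long numBytesUsed, long ramDirSize,
                                long ramBufferSize, boolean nrt) {
        // NRT: flushing the buffer adds ~numBytesUsed more to the RAM dir
        long projected = nrt ? (numBytesUsed * 2) + ramDirSize : numBytesUsed;
        return projected >= ramBufferSize;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // 10 MB buffered, 10 MB of RAM segments, 32 MB budget:
        System.out.println(bufferIsFull(10 * mb, 10 * mb, 32 * mb, true)); // false (30 < 32)
        System.out.println(bufferIsFull(12 * mb, 10 * mb, 32 * mb, true)); // true (34 >= 32)
    }
}
```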

Jason Rutherglen
added a comment - 05/May/09 00:52 Did we decide to simply add a boolean param in the ctor to turn
on NRT instead of relying on getReader? Using getReader could
cause problems with switching directories midstream.

Michael McCandless
added a comment - 05/May/09 08:26
Did we decide to simply add a boolean param in the ctor to turn
on NRT instead of relying on getReader? Using getReader could
cause problems with switching directories midstream.
Yes, let's switch to that.

Michael McCandless
added a comment - 05/May/09 09:00
We shouldn't add FSD.setPrimaryExtensions?
The dual directories is continuing to push deeper (when I'm
wanting to do the reverse). EG, MergeScheduler.getDestinationDirs
should not be needed?
We should no longer need IndexWriter.getFlushDirectory? IE, IW
once again has a single "Directory" as seen by IFD,
DocFieldProcessorPerThread, etc. In the NRT case, this is an FSD;
in the non-NRT case it's the Dir that was passed in (unless, in a
future issue, we explore using FSD, too, for better performance).
We can't up and change CMS.handleMergeException (breaks
back-compat); can you deprecate old & add a new one that calls old
one? Let's have the new one take a Throwable and
MergePolicy.OneMerge?
Instead of overriding "equals" (FSD.equals) can you change to
"Directory.contains"?
IW's RAMDir usage still isn't factored in properly. EG
DW.balanceRAM is not taking it into account.
Furthermore, we can't call writer.getRamDirSize() from
DW.balanceRAM – that's far too costly. Instead, whenever RAMDir
changes (deletes are applied, or a new RAM segment is created), we
must push down to DW that usage with a new synchronized method.
(I described this above). We should remove
IW.getRamDirSize()... ie, this size should always be "pushed on
change", not "polled on read". We change it rarely and read it
incredibly often.
We don't need IW.getRamLogMergePolicy()? Instead, let's ignore
"MergePolicy.useCompoundFile()" when we are writing the new
segment to RAMDir? Likewise we should not cast RAMMergePolicy to
LogMergePolicy in setRAMMergePolicy, nor turn off its CFS there.
I still don't think we need a separate RAMMergeScheduler; I think
CMS should simply always run such merges (ie not block on max
thread count). IW.getNextMerge can then revert to its former
self.
MergePolicy.OneMerge.segString no longer needs to take a
Directory (because it now stores a Directory).
The mergeRAMSegmentsToDisk shouldn't be fully synchronized, eg
when doWait is true it should release the lock while merges are
taking place.

Jason Rutherglen
added a comment - 05/May/09 19:53 RAMDir changes (deletes are applied, or a new RAM segment is
created), we must push down to DW that usage with a new synchronized
method.
Sounds like we create a subclass of RAMDirectory with this
functionality?
We don't need IW.getRamLogMergePolicy()?
Because we don't want the user customizing this?
We should no longer need IndexWriter.getFlushDirectory? IE, IW
once again has a single "Directory" as seen by IFD,
DocFieldProcessorPerThread, etc. In the NRT case, this is an FSD; in
the non-NRT case it's the Dir that was passed in (unless, in a future
issue, we explore using FSD, too, for better performance).
Pass in FSD in the constructor of DocumentsWriter (and others) as
before?
I still don't think we need a separate RAMMergeScheduler; I
think CMS should simply always run such merges (ie not block on max
thread count). IW.getNextMerge can then revert to its former
self.
Where does the thread come from for this if we're using max threads?
If we allocate one, we're over limit and keeping it around. We'd need
a more advanced threadpool that elastically grows the thread pool and
kills threads that are unused over time. With Java 1.5 we can use
ThreadPoolExecutor. Is a dedicated thread pool something we want to
go to? Even then we can potentially still max out a given thread pool
with requests to merge one directory or the other. We'd probably
still need two separate thread pools.
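
The elastic pool mentioned above is in fact what java.util.concurrent.ThreadPoolExecutor provides: a core size of 0 plus a keep-alive means threads are created on demand and die after sitting idle, so no merge thread is kept around over the limit. The pool sizing below is illustrative.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of an elastic merge-thread pool using Java 1.5's ThreadPoolExecutor.
public class ElasticMergePool {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            0, 3,                // no core threads, at most 3 merge threads
            5, TimeUnit.SECONDS, // idle threads die after 5s
            new SynchronousQueue<Runnable>());
        pool.execute(new Runnable() {
            public void run() { /* a merge would run here */ }
        });
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(pool.getCompletedTaskCount()); // 1
    }
}
```

Note that with a SynchronousQueue, a task submitted while all 3 threads are busy is rejected rather than queued, so a real scheduler would still need a rejection or fallback policy; that caveat applies to both of the separate pools discussed above.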
MergePolicy.OneMerge.segString no longer needs to take a
Directory (because it now stores a Directory).
Yeah, I noticed this, I'll change it. MergeSpecification.segString is
public and takes a directory that is not required. What to do?

Jason Rutherglen
added a comment - 05/May/09 21:37 The dual directories is continuing to push deeper (when I'm
wanting to do the reverse). EG, MergeScheduler.getDestinationDirs
should not be needed?
If we remove getFlushDirectory, are you saying getDirectory should
return the FSD if RAM NRT is turned on? This seems counter intuitive
in that we still need a clear separation of the two directories? The
user would expect the directory they passed into the ctor to be
returned?
getDestinationDirs is used by the ram merge scheduler, which if we
use a single CMS would go away.
I'm looking at how to get RAMLogMergePolicy to take into account the
size of the ram segments it's merging such that they do not total
beyond the remaining available ram. Looks like we could keep a
running byte total while it's building the merges and stop once we've
reached the limit, though I'm not sure how exact this is (will the
merges be balanced using this system?). It seems like a variation on
the LogByteSizeMergePolicy however it's unclear whether
LogDocMergePolicy or LogByteSizeMergePolicy ram merges will perform
better (does it matter since it's all in ram and we're capping the
total?)

Jason Rutherglen
added a comment - 06/May/09 05:21 I'm not sure we have the right model yet for deciding when to
flush the ram buffer and/or ram segments. Perhaps we can simply
divide the ram buffer size in half, allocating one part to the
ram buffer, the other to the ram segments. When one exceeds its
(rambuffersize/2) allotment, it's flushed to disk. This way if
the ram buffer size is 32MB, we will always safely flush 16MB to
disk. The more ram allotted, the greater the size of what's flushed
to disk. We may eventually want to offer an expert method to set
the ram buffer size and ram dir max size individually.
Put another way I think we need a balanced upper limit for the
ram buffer and the NRT ram dir, which seems (to me) to be hard
to achieve by allowing too much growth at the expense of the
other.
I'd like to stay away from flushing the ram buffer to disk when
it's below say 20% of the ram buffer size as it seems
inefficient to do this (because we'll have to do an expensive
disk merge on it later). On the other hand if the user is not
calling get reader very often and we're auto flushing at 1/2 the
ram buffer size, we're short changing ourselves and only
flushing a segment half the size of what it could be. I suppose
we could stick with the 1/2 model, only turning it on once ram
segments are being merged in ram?
If when merging ram segments (using the specialized
RAMMergePolicy) we only merge in ram the ones that fit, what do
we do with the ram segments remaining that need to be flushed to
disk? What if they only make up 20% of the total size of the
ram segments? If we merge the 20% to disk it seems inefficient?
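
The half-and-half model above can be sketched as a small decision rule: give the DW buffer and the RAM segments each ramBufferSize/2 and flush whichever side exceeds its allotment, so a 32 MB budget always flushes about 16 MB to disk. Names below are illustrative.

```java
// Sketch of the proposed 50/50 split between DW buffer and RAM segments.
public class HalfSplitPolicy {
    enum Action { NONE, FLUSH_BUFFER_TO_DISK, MERGE_RAM_SEGMENTS_TO_DISK }

    static Action decide(long dwBufferBytes, long ramSegmentBytes, long ramBufferSize) {
        long allotment = ramBufferSize / 2;
        if (dwBufferBytes >= allotment) return Action.FLUSH_BUFFER_TO_DISK;
        if (ramSegmentBytes >= allotment) return Action.MERGE_RAM_SEGMENTS_TO_DISK;
        return Action.NONE;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        System.out.println(decide(16 * mb, 4 * mb, 32 * mb)); // FLUSH_BUFFER_TO_DISK
        System.out.println(decide(4 * mb, 16 * mb, 32 * mb)); // MERGE_RAM_SEGMENTS_TO_DISK
        System.out.println(decide(8 * mb, 8 * mb, 32 * mb));  // NONE
    }
}
```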

Michael McCandless
added a comment - 06/May/09 11:16
> RAMDir changes (deletes are applied, or a new RAM segment is
> created), we must push down to DW that usage with a new synchronized
> method.
Sounds like we create a subclass of RAMDirectory with this
functionality?
I don't think that's needed. I think whenever IW makes a change to
the RAMDir, which is easily tracked, it pushes to DW the new RAMDir
size.
> We don't need IW.getRamLogMergePolicy()?
Because we don't want the user customizing this?
That, and because it's only used to determine CFS or not, which we've
turned off for RAMDir.
> We should no longer need IndexWriter.getFlushDirectory? IE, IW
> once again has a single "Directory" as seen by IFD,
> DocFieldProcessorPerThread, etc. In the NRT case, this is an FSD; in
> the non-NRT case it's the Dir that was passed in (unless, in a future
> issue, we explore using FSD, too, for better performance).
Pass in FSD in the constructor of DocumentsWriter (and others) as
before?
Right. All these places couldn't care less if they are dealing w/ FSD or
a "real" dir. They should simply use the Directory API as they
previously did.
> I still don't think we need a separate RAMMergeScheduler; I
> think CMS should simply always run such merges (ie not block on max
> thread count). IW.getNextMerge can then revert to its former
> self.
Where does the thread come from for this if we're using max threads?
If we allocate one, we're over limit and keeping it around. We'd need
a more advanced threadpool that elastically grows the thread pool and
kills threads that are unused over time. With Java 1.5 we can use
ThreadPoolExecutor. Is a dedicated thread pool something we want to
go to? Even then we can potentially still max out a given thread pool
with requests to merge one directory or the other. We'd probably
still need two separate thread pools.
The thread is simply launched w/o checking maxThreadCount, if the
merge is in RAM.
Right, with JDK 1.5 we can make CMS better about pooling threads.
Right now it does no long-term pooling (unless another merge happens
to be needed when a thread finishes its last merge).
> MergePolicy.OneMerge.segString no longer needs to take a
> Directory (because it now stores a Directory).
Yeah, I noticed this, I'll change it. MergeSpecification.segString is
public and takes a directory that is not required. What to do?
Do the usual back-compat dance – deprecate it and add the new one.
> The dual directories is continuing to push deeper (when I'm
> wanting to do the reverse). EG, MergeScheduler.getDestinationDirs
> should not be needed?
If we remove getFlushDirectory, are you saying getDirectory should
return the FSD if RAM NRT is turned on? This seems counter intuitive
in that we still need a clear separation of the two directories? The
user would expect the directory they passed into the ctor to be
returned?
I agree, we should leave getDirectory() as is (returns whatever Dir
was passed in).
We can keep getFlushDirectory, but it should not have duality inside it
– it should simply return the FSD (in the NRT case) or the normal
dir. I don't really like the name getFlushDirectory... but can't
think of a better one yet.
Then, nothing outside of IW should ever know there are two directories
at play. They all simply deal with the one and only Directory that IW
hands out.
On the "when to flush to RAM" question... I agree it's tricky. This
logic belongs in the RAMMergePolicy. That policy needs to be
empowered to decide if a new flush goes to RAM or disk, to decide when
to merge all RAM segments to a new disk segment, to be able to check
if IW is in NRT mode, etc. Probably the RAM merge policy also needs
control over how much of the RAM buffer it's going to give to DW,
too. At first the policy should not change the non-NRT case (ie one
always flushes straight to disk). We can play w/ that in a separate
issue. Need to think more about the logic...
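The budget check such a RAMMergePolicy would make might be sketched like this. All names, including the budget parameter, are hypothetical illustrations of the decision being discussed, not the patch's API:

```java
// Hypothetical sketch of the "flush to RAM or disk?" decision the
// RAMMergePolicy could own; names are illustrative, not Lucene API.
public class RAMFlushDecision {

    /** Returns true if a flush of flushBytes should go to the RAMDir. */
    public static boolean flushToRAM(boolean nrtMode, long ramDirUsed,
                                     long flushBytes, long ramBudget) {
        if (!nrtMode) {
            return false;           // non-NRT case: always flush to disk
        }
        // flush to RAM only if it fits in the remaining budget
        return ramDirUsed + flushBytes <= ramBudget;
    }

    public static void main(String[] args) {
        System.out.println(flushToRAM(true, 0, 1 << 20, 16 << 20));
        System.out.println(flushToRAM(true, 15 << 20, 2 << 20, 16 << 20));
    }
}
```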

Jason Rutherglen
added a comment - 06/May/09 19:13

> I don't think that's needed. I think whenever IW makes a
> change to the RAMDir, which is easily tracked, it pushes to DW
> the new RAMDir size.

Because we know the IW.ramdir is a RAMDirectory implementation,
we can use sizeInBytes? It's synchronized; maybe we want a
different method that's not? It seems like keeping track of all
file writes outside the ramdir is going to be difficult. For
example, when we do deletes via SegmentReader, how would we keep
track of that?

> That, and because it's only used to determine CFS or not,
> which we've turned off for RAMDir.

So we let the user set the RAMMergePolicy but not get it?

> The thread is simply launched w/o checking
> maxThreadCount, if the merge is in RAM.

Hmm... We can't just create threads and let them be garbage
collected, as JVMs tend to throw OOMs with this. If we go down
this route of a single CMS, maybe we can borrow some code from
an Apache project that's implemented a threadpool.
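Since JDK 1.5, no external project is needed: java.util.concurrent's ThreadPoolExecutor already implements exactly this elastic behavior (grow on demand, reclaim threads idle past a keep-alive). A sketch, with trivial tasks standing in for merges:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ElasticPoolDemo {

    public static int runMerges(int n) {
        // Elastic pool: core size 0, unbounded max, threads idle > 60s
        // are killed. Equivalent to Executors.newCachedThreadPool().
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (int i = 0; i < n; i++) {
                final int id = i;
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() { return id; } // stand-in merge
                }));
            }
            int sum = 0;
            for (Future<Integer> f : futures) sum += f.get();
            return sum;          // 0 + 1 + ... + (n-1)
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runMerges(4)); // prints 6
    }
}
```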

Michael McCandless
added a comment - 06/May/09 21:53

> I don't think that's needed. I think whenever IW makes a
> change to the RAMDir, which is easily tracked, it pushes to DW
> the new RAMDir size.

> Because we know the IW.ramdir is a RAMDirectory implementation,
> we can use sizeInBytes? It's synchronized, maybe we want a
> different method that's not? It seems like keeping track of all
> file writes outside the ramdir is going to be difficult? For
> example when we do deletes via SegmentReader how would we keep
> track of that?

We should definitely just use the sizeInBytes() method.

I'm saying that IW knows when it writes new files to the RAMDir
(flushing deletes, flushing new segment) and it's only at those times
that it should call sizeInBytes() and push that value down to DW.

> That, and because it's only used to determine CFS or not,
> which we've turned off for RAMDir.

> So we let the user set the RAMMergePolicy but not get it?

Oh, we should add a getter (getRAMMergePolicy, not getLogMergePolicy)
for it, but it should return MergePolicy not LogMergePolicy.

> The thread is simply launched w/o checking
> maxThreadCount, if the merge is in RAM.

> Hmm... We can't just create threads and let them be garbage
> collected as JVMs tend to throw OOMs with this. If we go down
> this route of a single CMS, maybe we can borrow some code from
> an Apache project that's implemented a threadpool.

This is how CMS has always been. It launches threads relatively
rarely – this shouldn't lead to OOMs. One can always subclass CMS if
this is somehow a problem. Or we could modify CMS to pool its threads
(as a new issue)?
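The launch rule being proposed here reduces to a single predicate, sketched below. The parameter names are illustrative stand-ins, not CMS's actual fields:

```java
// Sketch of the dispatch rule under discussion: CMS launches a new
// merge thread without consulting maxThreadCount when the merge is in
// RAM; disk merges still obey the cap.
public class MergeThreadGate {

    public static boolean launchNewThread(boolean mergeInRAM,
                                          int activeThreads,
                                          int maxThreadCount) {
        if (mergeInRAM) {
            return true;                    // RAM merges never throttled
        }
        return activeThreads < maxThreadCount;
    }

    public static void main(String[] args) {
        System.out.println(launchNewThread(true, 10, 3));  // true
        System.out.println(launchNewThread(false, 3, 3));  // false
    }
}
```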

Jason Rutherglen
added a comment - 06/May/09 23:38

In the patch the merge policies are split up, which requires some
of the RAM NRT logic to be in updatePendingMerges.

One solution is to have a merge policy that manages merging to
ram and to disk, kind of an overarching merge policy for the
primary MP and the RAM MP. This would push the logic of ram
merging and primary dir merging to the meta merge policy, which
would clean up IW from managing ram segs vs. prim segs.

Do IW.optimize and IW.expungeDeletes operate on the ramdir as
well? (The expungeDeletes javadoc implies calling
IR.numDeletedDocs will return zero when there are no deletes).
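The "overarching" idea above is plain delegation; a pared-down sketch follows. The interface and names here are hypothetical stand-ins for Lucene's MergePolicy, shown only to illustrate the routing:

```java
// Sketch of a meta merge policy that routes to a RAM policy or the
// primary (disk) policy, keeping that logic out of IW itself.
public class MetaMergePolicy {

    interface Policy { String findMerges(); }

    private final Policy ramPolicy;
    private final Policy primaryPolicy;

    MetaMergePolicy(Policy ramPolicy, Policy primaryPolicy) {
        this.ramPolicy = ramPolicy;
        this.primaryPolicy = primaryPolicy;
    }

    /** IW asks one policy; the meta policy picks the delegate. */
    String findMerges(boolean segmentsInRAM) {
        return segmentsInRAM ? ramPolicy.findMerges()
                             : primaryPolicy.findMerges();
    }

    public static String route(boolean inRAM) {
        MetaMergePolicy mp = new MetaMergePolicy(
            new Policy() { public String findMerges() { return "ram"; } },
            new Policy() { public String findMerges() { return "disk"; } });
        return mp.findMerges(inRAM);
    }

    public static void main(String[] args) {
        System.out.println(route(true));  // ram
        System.out.println(route(false)); // disk
    }
}
```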

Jason Rutherglen
added a comment - 08/May/09 05:36

Something in the DocumentsWriter API we may need to change is to
allow passing a directory through the IndexingChain. In the RAM
NRT case, which directory we write to can change depending on
whether a ram buffer has exceeded its maximum available size. If
it is under half the available ram it will go to the ram dir; if
not, the new segment will be written to disk. For this reason we
can't simply pass a directory into the constructor of
DocumentsWriter, nor can we rely on calling
IW.getFlushDirectory. We should be able to rely on the directory
in SegmentWriteState?
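That idea can be sketched with a pared-down stand-in for SegmentWriteState: chain components take no Directory in their constructors and instead read the per-flush target from the state object. This is an illustration of the shape, not Lucene's actual classes:

```java
// Pared-down stand-in: indexing-chain consumers read the directory
// from the per-flush state rather than from their ctors.
public class SegmentWriteStateSketch {

    static class Dir { final String name; Dir(String n) { name = n; } }

    static class SegmentWriteState {
        final Dir directory;  // chosen per flush: ram dir or primary dir
        SegmentWriteState(Dir d) { directory = d; }
    }

    /** A chain component: no Directory in its ctor, only in flush(). */
    static class TermsWriterSketch {
        String flush(SegmentWriteState state) {
            return "wrote terms to " + state.directory.name;
        }
    }

    public static String flushTo(String dirName) {
        SegmentWriteState state = new SegmentWriteState(new Dir(dirName));
        return new TermsWriterSketch().flush(state);
    }

    public static void main(String[] args) {
        System.out.println(flushTo("ramdir"));
    }
}
```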

Michael McCandless
added a comment - 08/May/09 09:06

> Does IW.optimize and IW.expungeDeletes operate on the ramdir as
> well (the expungeDeletes javadoc implies calling
> IR.numDeletedDocs will return zero when there are no deletes).

I think IW.optimize should mean all RAM segments are merged into the
single on-disk segment?

And IW.expungeDeletes should also apply to RAM segments, ie if RAM
segments have pending deletes, they are merged away (possibly
entirely in RAM, ie the RAM merge policy could simply merge to a new
RAM segment)?

> In the patch the merge policies are split up which requires some
> of the RAM NRT logic to be in updatePendingMerges.
> One solution is to have a merge policy that manages merging to
> ram and to disk,

It looks like it's the "is it time to flush to disk" logic, right?
Why can't we make that the purview of the RAM MergePolicy? We may
need to extend the MergePolicy API to tell it how much RAM is free in
the budget.

> We should be able to rely on the directory in SegmentWriteState?

I think we should fix the indexing chain to always use SegmentWriteState's Directory and not pass Directory to the ctors? Does something go wrong if we take that approach?

Jason Rutherglen
added a comment - 11/May/09 16:21

> IW.optimize should mean all RAM segments are merged into
> the single on-disk segment

Yes.

> IW.expungeDeletes should also apply to RAM segments, ie
> if RAM segments have pending deletes, they are merged away
> (possibly entirely in RAM, ie the RAM merge policy could simply
> merge to a new RAM segment)?

Yes.

> we should fix the indexing chain to always use
> SegmentWriteState's Directory and not pass Directory to the
> ctors

Yep.

The next patch will have these features.

Jason Rutherglen
added a comment - 12/May/09 03:20

A single merge scheduler is used. We will need to open a new
issue for a version of ConcurrentMergeScheduler that allocates
threads, perhaps based on the merge.directory? We'd also probably
want to add thread pooling.

There's a package-protected IW ctor that accepts the ram dir.
This is used in the test case for ensuring we aren't creating
.cfs files in the ram dir.

IW.optimize merges all segments (ram included) to the primary
dir.

IW.expungeDeletes merges segments with deletes; in-ram ones
stay in ram (unless they won't fit), and primary dir ones are
handled as usual.

Added testOptimize, testExpungeDeletes, and some other test
cases.

Needs a test case to make sure we're merging to the primary
dir when the ram dir is full or a flush won't fit in the ram dir.

There's a mergeRamSegmentsToDir and a resolveRamSegments. Two
different methods because mergeRamSegmentsToDir operates by
simply scheduling merges, while resolveRamSegments operates in the
foreground like resolveExternalSegments. I'm not sure if we can
combine the two. resolveRamSegments seems to have a thread
notification problem and so hangs at times. I'll look into this
further unless it's obvious what the problem is.

When RAM NRT is on (via the IndexWriter constructor), setting
the ram buffer size allocates half of the given number to the
DocumentsWriter buffer and half to the ram dir. It may be best
to dynamically change these numbers based on usage etc.

Added NRTMergePolicy, which is used only when RAM NRT is on. It
utilizes the regular merge policy and the ram merge policy.

The ram dir size is pushed to DocumentsWriter.

RAMMergePolicy extends LogDocMergePolicy and defaults
useCompoundFile and useCompoundDocStore to false.

Sorry for the whitespace stuff, I'll clean it up later; I
wanted to post the latest to get feedback
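The "no .cfs in the ram dir" invariant that test case checks might look like the helper below. In the real test one would pass the ram directory's file listing; plain strings keep the sketch self-contained:

```java
// Hypothetical helper for the test described above: verify that none
// of the files in the ram dir are compound files (.cfs/.cfx).
public class NoCompoundInRAMCheck {

    public static boolean hasCompoundFile(String[] ramDirFiles) {
        for (String file : ramDirFiles) {
            if (file.endsWith(".cfs") || file.endsWith(".cfx")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] files = { "_0.fdt", "_0.fdx", "_0.tis" };
        System.out.println(hasCompoundFile(files)); // false
    }
}
```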

Jason Rutherglen
added a comment - 12/May/09 19:24

I think the easiest way to handle the ram buf size vs. the ram
dir size is to allow each to grow on request. I have some code
I need to test that implements it. This way we're growing based
on demand and availability. The only thing we may want to add is
a way to grow, and perhaps automatically flush, based on the
growth requested and perhaps prioritizing requests?

Jason Rutherglen
added a comment - 19/May/09 21:59

All tests pass; added more tests.

Added DocumentsWriter.growRamBufferBy/growRamDirMaxBy methods
that allow dynamically requesting more ram. We start off at
50/50, ramdir/rambuffer. Then whenever one needs more, grow* is
called.

We need a RAMPolicy class that allows customizing how ram is
allocated. Currently the ramdir and the rambuffer compete for
space; the user will presumably want to customize this.

I'm not sure the flushing always occurs when it should, and
not sure yet how to test to ensure it's flushing when it should
(other than watching a log). What happened to the adding logging
to Lucene patch?
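One way the competing grow* calls could account for a shared budget is sketched below (demand-based; the 50/50 seeding is omitted for brevity). Method names mirror the patch's growRamBufferBy/growRamDirMaxBy, but the accounting is an illustration, not the actual implementation:

```java
// Sketch: the ramdir and the DW ram buffer draw from one shared
// budget, so a grant to one shrinks what the other can still take.
public class GrowOnRequest {

    private final long total;    // budget from IW.setRAMBufferSizeMB
    private long ramDirUsed;
    private long ramBufferUsed;

    GrowOnRequest(long totalBudget) { total = totalBudget; }

    /** Grants up to 'bytes' more to the ram dir; returns the grant. */
    synchronized long growRamDirMaxBy(long bytes) {
        long granted = Math.min(bytes, total - ramDirUsed - ramBufferUsed);
        ramDirUsed += granted;
        return granted;
    }

    /** Grants up to 'bytes' more to the DW buffer; returns the grant. */
    synchronized long growRamBufferBy(long bytes) {
        long granted = Math.min(bytes, total - ramDirUsed - ramBufferUsed);
        ramBufferUsed += granted;
        return granted;
    }

    public static long[] demo() {
        GrowOnRequest g = new GrowOnRequest(100);
        // first request gets all it asks for; second only the remainder
        return new long[] { g.growRamDirMaxBy(60), g.growRamBufferBy(60) };
    }

    public static void main(String[] args) {
        long[] grants = demo();
        System.out.println(grants[0] + " " + grants[1]); // 60 40
    }
}
```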

Michael McCandless
added a comment - 22/May/09 13:47

I think generally we are close. I have lots of little comments from
looking through the patch:
Can you update the CHANGES entry to something like "IndexWriter
now uses RAM more efficiently when in near real-time mode"? (Ie
we don't pass RAMDir to IW).
DW.push/getRAMDirSize, RAMTotalMax, RAMBufferAvailable, etc. need
to be synchronized?
Since IW.flushDocStores always goes to the main directory, why
does it now take a Directory arg?
I don't think doAfterFlush should be responsible for calling
pushRamDirSize(); that's more of a hook for external subclasses.
Yes, IW.ramSizeInBytes() should include the ramDir's bytes
There are still places where Directory.contains should be used,
instead of pulling both dirs and checking each. EG, the assert in
DW.applyDeletes, and this assert in IW:
if (ramNrt && merge.directory == switchDirectory) {
  assert !merge.useCompoundFile;
}
I'd like to eliminate IW.getInternalDirectory, if possible: to
anyone interacting with IW, there is only one Directory, and the
switching is entirely "under the hood".
I realized there is in fact a benefit to using CFS in RAM: much
better RAM efficiency for tiny segments (because RAMDir's buffer
size is 1 KB). Though such segments would presumably be merged
away with time, so it may not be a big deal...
Is IW.mergeRAMSegmentToDir only for testing?
Can you name things theRAMSetting instead of theRamSetting? (Ie,
RAM is all caps).
For IW.resolveRAMSegments, maybe we should make a single merge
that merges everything down? Why even bother interacting with a
merge policy, here?
Can you rename flush()'s new arg "flushToRAM" to
"allowFlushToRAM"? Ie, even when this is true, that method may
decide RAM is full and in fact flush to the real dir.
Can you rename IW.ramNRT to IW.flushToRAM? (Since it's in fact
orthogonal to NRT).
It's sneaky to set docWriter.flushToDir before calling
docWriter.flush; can't we make that an arg to docWriter.flush?
(And docWriter would never store it).
Why did you need to add DW.fileLength?
IW.SWITCH_FILE_EXTS should be private static final (not public)?
We lost private on a number of attrs in IW – can you restore?
(You should insert nocommit comments when you do that, to reduce
risk that such changes slip in).
Likewise for SegmentReader.coreRef.
Why did you need to make RAMDir.sizeInBytes volatile? Isn't it
always updated/accessed from sync(RAMDir) context?
Why do we need a new class RAMMergePolicy? (There's no API
difference over MergePolicy). Can't we simply by default
instantiate LogByteSizeMergePolicy, and set CFS/CFX to false?
IW.fileSwitchDirectory should be private?
Have you done any perf tests with flushToRAM = true? EG should we
enable it by default? I think if we have a good policy for
managing RAM it could very well be higher performance. But, we
should explore this under a different issue, so leave the default
at "no ram dir".
On the "how to share RAM" between RAMDir & DW's RAM buffer... instead
of pre-dividing and growing over time, I think we can simplify it by
logically sharing a single "pool".
The RAMDir only alters its ram usage when 1) we flush a new segment to
it, 2) a merge completes (either writing to the real dir or to the ram
dir), or 3) deletes are applied to segments in RAM. When such a
change happens we notify DW. DW then adds that base into its
ram consumption to decide when it's time to flush.
For starters, and we can optimize this later, I don't think DW should
choose on its own to flush itself to the RAMDir? That should only
happen when getReader is called, and there's still plenty of RAM
free.
So what happens is... each time getReader() is called, we make a new
smallish RAM segment. Over time, these RAM segments need merging so
we merge them. (If such a merge is fairly large, probably instead of
writing to ram it should write the new segment to the real dir, since
intermediate RAM usage will be too high).
At some point, DW detects that the RAMDir size plus its own buffer is
at the limit. If DW's buffer is relatively small, it should probably
simply flush to the RAMDir then dump the entire RAMDir to the real dir as
a single merge. If DW's buffer is big, as would happen if you opened
an NRT reader but never actually called getReader(), it should flush
straight to the real dir.
One challenge we face is ensuring that while we are flushing all ram
segments to disk, we don't block the getReader() turnaround. IE we
can't make getReader() do that flush synchronously. So that needs to
be a BG merge, but we must somehow temporarily disregard the size of
those segments while the merge is running. Or, perhaps we "merge RAM
segments to disk" a bit early, eg once RAM consumed is > 90% of the
total RAM buffer, or something.
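The shared-pool triggers described above reduce to two predicates, sketched below. The 0.90 early-merge fraction is one reading of "a bit early", not a value from the patch; all names are illustrative:

```java
// Sketch of the shared-pool model: DW folds the RAMDir's size (pushed
// to it on each RAMDir change) into its own accounting.
public class SharedPoolTrigger {

    /** DW flushes when RAMDir size plus its buffer hits the limit. */
    public static boolean timeToFlushDW(long ramDirSize, long dwBuffered,
                                        long limit) {
        return ramDirSize + dwBuffered >= limit;
    }

    /** Kick off the BG "merge RAM segments to disk" a bit early, so
     *  getReader() turnaround is never blocked on that flush. */
    public static boolean timeToMergeRAMToDisk(long ramDirSize, long limit) {
        return ramDirSize >= (long) (0.90 * limit);
    }

    public static void main(String[] args) {
        System.out.println(timeToFlushDW(8 << 20, 8 << 20, 16 << 20));
        System.out.println(timeToMergeRAMToDisk(14 << 20, 16 << 20));
    }
}
```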

Jason Rutherglen
added a comment - 26/May/09 16:42

I started on the pooled ram model after the last patch because
it is cleaner. Bytes are allocated up to the given limit set by
IW.setRAMBufferSizeMB. As mentioned below, we may want to add a
setting for the max ram temporarily used.
I'm reusing the DocumentsWriter.numBytesAlloc/numBytesUsed and
created a RAMPolicy that manages ramDirBytesAlloc and
ramDirBytes. Each time a merge is scheduled, the
sizeof(segments) is allocated by RAMPolicy and the segmentsAlloc
is stored in OneMerge. Once the merge completes or fails, the
ramDirBytesAlloc is adjusted by the difference between the
actual bytes used and OM.ramDirAlloc. This way we always have
the most accurate ramDir allocation in RamP, and we properly
adjust the amount of ram consumed. This works well with our
concurrent merging model where we can't predict when a merge
will complete.
One challenge we face is ensuring that while we are
flushing all ram segments to disk, we don't block the
getReader() turnaround. IE we can't make getReader() do that
flush synchronously....perhaps we "merge RAM segments to disk" a
bit early, eg once RAM consumed is > 90% of the total RAM
buffer
You're talking about the synchronization in IW.doFlushInternal
which would block getReader while writing a segment to disk? Our
default RAMPolicy should be one where we always flush the ram
buffer to the ramdir. Basically there must always be room in the
ram dir for the ram buffer: ramdir + (rambuf * 2) < maxSize. Or
do we assume that it's ok for ramUsed to temporarily exceed
ramMax by a given percent (110%, which would be an option in
RAMPolicy) while ramBuf is being flushed to the ramDir?
We may want to make some assumptions about usage of getReader
(i.e. getReader is called fairly often such that the rambuffer
is usually less than half of the ram used) when flushToRam=true
so that we can get a version of this functionality out the door,
then iterate as we gather feedback from users?
I'll include the comments in the next patch.
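The scheduling-time allocation described above (reserve an estimate per merge, reconcile with actual usage on completion) can be sketched as below; field and method names are illustrative, not the patch's RAMPolicy/OneMerge API:

```java
// Sketch: reserve sizeof(segments) when a RAM merge is scheduled,
// then adjust by the difference between actual bytes used and the
// reservation once the merge completes or fails.
public class RAMMergeAccounting {

    private long ramDirBytesAlloc;

    /** Reserve at schedule time; the returned value lives on OneMerge. */
    synchronized long schedule(long estimatedBytes) {
        ramDirBytesAlloc += estimatedBytes;
        return estimatedBytes;
    }

    /** On completion/abort, replace the estimate with real usage. */
    synchronized void complete(long reservedBytes, long actualBytes) {
        ramDirBytesAlloc += actualBytes - reservedBytes;
    }

    synchronized long allocated() { return ramDirBytesAlloc; }

    public static long demo() {
        RAMMergeAccounting acct = new RAMMergeAccounting();
        long reserved = acct.schedule(1000); // estimate 1000 bytes
        acct.complete(reserved, 700);        // merge actually used 700
        return acct.allocated();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 700
    }
}
```

This keeps the most accurate ramDir allocation available even though merge completion times are unpredictable under concurrent merging.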

Jason Rutherglen
added a comment - 29/May/09 15:49

I had forgotten about concurrency with the docstores (keeping an
open IndexInput and IndexOutput) when using an FSDirectory. I
wrote a test (which fails), so getting this functionality to work
could require some reworking of FSDirectory internals?
(Something on the order of auto-updating IndexInput's buffers
and file length as IndexOutput is flushed?)

We need simultaneous IndexInput and IndexOutput ops to work, as
the docstore is streamed to FSDir for multiple segments. After a
segment is flushed to the ramdir it can be read from, including
the docstore, which is still actively being written to (for the
new segments)?
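The open question above — reading a file that is still being written — can be tried with plain JDK I/O, which is also what an FSDirectory ultimately sits on. This sketch uses RandomAccessFile directly rather than Lucene's IndexInput/IndexOutput; the file name and byte layout are arbitrary.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of the experiment: keep one handle open for writing while a
// second, independent handle reads back the portion already written.
public class ConcurrentReadWrite {

    public static byte[] writeThenReadWhileOpen(File f) throws IOException {
        RandomAccessFile out = new RandomAccessFile(f, "rw");
        try {
            out.write(new byte[] {1, 2, 3, 4});   // the "flushed" portion
            // Open a reader while the writer is still open and mid-stream.
            RandomAccessFile in = new RandomAccessFile(f, "r");
            try {
                byte[] buf = new byte[4];
                in.readFully(buf);                // reads only the flushed bytes
                out.write(new byte[] {5, 6});     // writer keeps appending
                return buf;
            } finally {
                in.close();
            }
        } finally {
            out.close();
        }
    }
}
```

Whether the Directory-level equivalent behaves this cleanly on every OS is exactly what the failing test needs to establish.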

Michael McCandless
added a comment - 29/May/09 16:05 I had forgotten about concurrency with the docstores
That's a big (and good) change; I think we should save that one for another issue, and leave this one focusing on flushing segments through a RAMDir?

Jason Rutherglen
added a comment - 29/May/09 17:31
I guess I'm confused about the doc stores. We keep one open on
disk for multiple segments being created via the FSD, which
means the IndexOutput isn't closed, however for each new
SegmentReader that's opened, we're creating a new IndexInput
only after the segment is flushed (to FSD, docstore to disk).
So the concurrent docstores may work without too much changing
of FSDir internals, as the portion of the docstore file that the
SR needs to know about has been flushed to disk when SR is
opened. The SR should then be able to open the docstore file
cleanly regardless of the open IndexOutput adding more to the
file for the next rambuf segment?

Michael McCandless
added a comment - 29/May/09 18:01
The SR should then be able to open the docstore file
cleanly regardless of the open IndexOutput adding more to the
file for the next rambuf segment?
That's the crucial question: can we open a new IndexInput, while an IndexOutput is still writing to the file? I vaguely remember being surprised that this worked fine on Windows, but I'm not sure.
If that's fine across all OS's, then, yes we could avoid closing the docStores when flushing a new segment.
If it's not fine, then we'd need a way to make an IndexOutputInput, which is a bigger change.
We also should [separately] consider having multiple SegmentReaders that share the same docStores, share a single set of IndexInputs (cloned). Ie if the RAM MergePolicy allows many segments in RAM at once, we are still opening real file descriptors to read the doc stores, so without such sharing we could start running out of descriptors.
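The descriptor-sharing idea can be sketched as a reference-counted pool: many logical readers per file, one real descriptor. HandlePool and SharedHandle are hypothetical names, and a real version would wrap actual file handles where the comments indicate — this only illustrates the bookkeeping.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of sharing a single open file among many readers,
// with reference counting so the underlying descriptor closes only when
// the last clone is released. Not Lucene API.
public class HandlePool {

    static final class SharedHandle {
        final String name;
        int refCount;
        SharedHandle(String name) { this.name = name; }
    }

    private final Map<String, SharedHandle> open = new HashMap<String, SharedHandle>();

    /** Returns the shared handle for this file, opening it on first use. */
    public synchronized SharedHandle acquire(String name) {
        SharedHandle h = open.get(name);
        if (h == null) {
            h = new SharedHandle(name);   // a real impl would open the descriptor here
            open.put(name, h);
        }
        h.refCount++;
        return h;
    }

    /** Releases one clone; the file is closed when no clones remain. */
    public synchronized void release(SharedHandle h) {
        if (--h.refCount == 0) {
            open.remove(h.name);          // a real impl would close the descriptor here
        }
    }

    public synchronized int openFiles() { return open.size(); }
}
```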

Jason Rutherglen
added a comment - 31/May/09 20:24
I want to run the Lucene unit tests in NRT mode without creating and/or
modifying all the test cases. In lieu of adding a
System.property that turns NRT on, have we settled on a
different mechanism for global settings? Perhaps the back compat
type of system can be used here? Or for now a static variable on
IW?
can we open a new IndexInput, while an IndexOutput is
still writing to the file?
I ran a test case successfully that writes to a file while
opening threads that read from flushed sections on Windows.
Closing docstores for every flush would seem to cause a lot of
overhead. With NRT + FSD aren't termvector files merged on disk for
every segment?
We also should [separately] consider having multiple
SegmentReaders that share the same docStores
Doesn't FSDir open only one FD per file?

Michael McCandless
added a comment - 01/Jun/09 00:52
I want to run the Lucene unit tests in NRT mode without creating and/or
modifying all the test cases.
You can just temporarily set the default then run all tests?
We never reached consensus on a back compat sort of setting...
I ran a test case successfully that writes to a file while
opening threads that read from flushed sections on Windows.
Excellent; let's take this up under a new issue? If this holds true across platform (and Windows is the most challenging one) then it'll make sharing much easier.
Closing docstores for every flush would seem to cause a lot of
overhead. With NRT + FSD aren't termvector files merged on disk for
every segment?
Yes it adds overhead, though, it's not on the critical path for opening a new reader since it happens in the BG. So... things like LUCENE-1526 (making deletions, norms incrementally copy-on-write) I think are higher priority.
Doesn't FSDir open only one FD per file?
No, it opens a real file every time openInput is called. I guess we could think about having it share/clone internally?

Jason Rutherglen
added a comment - 01/Jun/09 04:26
You can just temporarily set the default then run all
tests?
Or have separate parallel inherited classes that set the
static IW variable, run the tests, and perhaps have additional
NRT specific test methods.
I guess we could think about having it share/clone
internally?
Yeah we should. I had this confused with how we're cloning in
most o.a.l.i classes, but if we're calling openInput it seems
like we should have it internally clone. I guess we'll have an
issue with classes that don't call IndexInput.close; however,
it's better to solve those than open many unnecessary file
descriptors.
let's take this up under a new issue?
The issue would only be the unit test for now, or should it be a
part of an existing issue? OK, it will clean up LUCENE-1313's
unit test class.

Jason Rutherglen
added a comment - 01/Jun/09 19:53
With flushToRAM=true, when IW.commit is called, do we still want
to not have concurrent merges execute synchronously? Or only
wait for the merges to complete that are from ramDir to
primaryDir?

Michael McCandless
added a comment - 01/Jun/09 23:52
The issue would only be the unit test for now, or should it be a
part of an existing issue?
I meant the issue of FSDir sharing a single IndexInput if the same file is opened more than once (you opened LUCENE-1671 for this).
I guess we'll have an issue with classes that don't IndexInput.close
This (failing to close an IndexInput that was obtained from Directory.openInput) should be rare in Lucene – we try not to do that. Failing to close clones does happen...
With flushToRAM=true, when IW.commit is called, do we still want
to not have concurrent merges execute synchronously? Or only
wait for the merges to complete that are from ramDir to
primaryDir?
I think only ramDir -> primaryDir? commit() today doesn't block on BG merges.

Jason Rutherglen
added a comment - 02/Jun/09 18:59
For IW.resolveRAMSegments, maybe we should make a single
merge that merges everything down? Why even bother interacting
with a merge policy, here?
LogMergePolicy.findMergesForOptimize I assumed would merge
segments to a single segment, in testing resolveRAMSegments that
doesn't always seem to be the case?
When I created a mergeAllSegments to one, the
ensureContiguousMerge would throw an exception. I thought about
avoiding ensureContiguousMerge but it required reworking a bunch
of code. Why does ensureContiguousMerge exist? Is it for
assertions or is there a reason segments need to be next to each
other when merging?

Michael McCandless
added a comment - 02/Jun/09 19:29
LogMergePolicy.findMergesForOptimize I assumed would merge
segments to a single segment, in testing resolveRAMSegments that
doesn't always seem to be the case?
It should, but it will respect mergeFactor in the process (ie do multiple merges if necessary). I'm thinking we should just merge all the RAM segments in one go, and not consult merge policy, in resolveRAMSegments.
When I created a mergeAllSegments to one, the
ensureContiguousMerge would throw an exception.
Why does this throw an exception? Ie you passed in the in-order RAM segments?
Why does ensureContiguousMerge exist? Is it for
assertions or is there a reason segments need to be next to each
other when merging?
I think the only thing that'd break is IndexWriter now assumes continuity when it removes the old segments and puts the new one in. LUCENE-1076 explores allowing MergePolicy to select non-contiguous merges.
But: with autoCommit=false, in order to avoid merging the doc stores, the segments (even RAM segments) must be contiguous. This is a sizable performance gain when building a large index in one IndexWriter session.
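The contiguity requirement being debated amounts to a simple check over segment positions. This standalone version mirrors ensureContiguousMerge's intent — a candidate merge must occupy consecutive positions in the full SegmentInfos list — but it is a sketch with plain int indices standing in for SegmentInfo objects, not IndexWriter's actual code.

```java
// Minimal contiguity check: positions are the indices of the merge's
// segments within the full infos list, in selection order.
public class ContiguousCheck {

    public static boolean isContiguous(int[] positions) {
        for (int i = 1; i < positions.length; i++) {
            if (positions[i] != positions[i - 1] + 1) {
                return false;   // a gap or reordering breaks contiguity
            }
        }
        return true;
    }
}
```

Under this view, merging all RAM segments at once stays contiguous only if they really do sit on the tail of the infos list, in order.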

Jason Rutherglen
added a comment - 05/Jun/09 04:44
RAM buffer size is stored in the writer rather than set into
DocumentsWriter. This is due to the actual ram buffer limit in
NRT changing depending on the size of the ramdir.
NRTMergePolicy and IW.resolveRAMSegments merges all ram dir
segments to primaryDir (i.e. disk) when the ramDir is over
totalMax, or any new merges would put ramDir over totalMax.
In DocumentsWriter we have a set limit on the buffer size
which is (tempMax - ramDirSize)/2. This keeps the total ram used
under the totalMax (or IW.maxBufferSize), while also keeping our
temporary ram usage under the tempMax amount. When DW.ramBuffer
limit is reached, it's auto flushed to the ramDir.
All tests pass except TestIndexWriterRAMDir.testFSDirectory.
Will look into this further. When flushToRAM is on by default,
there seems to be a deadlock in
org.apache.lucene.TestMergeSchedulerExternal; however, when I
tried to confirm it via jconsole by setting
ANT_OPTS="-Dcom.sun.management.jmxremote", I didn't see one. I'm
not sure whether this is due to not connecting to the right
process, or something else.
Added testReadDocuments, which ensures we can read documents
we've flushed to disk. This essentially tests our ability to
simultaneously read and write documents to and from the
docstore. It seemed to work on Windows.
I think there's more that can be done to more accurately
manage the RAM; however, I think the way it works is a good
starting point.
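The (tempMax - ramDirSize)/2 rule described above is just arithmetic: as the ram dir fills, DocumentsWriter's flush trigger shrinks, keeping total RAM under totalMax and temporary usage under tempMax. A sketch only — the names follow the comment, not actual IndexWriter fields.

```java
// Illustrative version of the DocumentsWriter buffer-limit rule.
public class BufferLimit {

    /** Effective DocumentsWriter buffer limit given the current ram dir size. */
    public static long dwBufferLimit(long tempMax, long ramDirSize) {
        return (tempMax - ramDirSize) / 2;
    }
}
```

With an empty ram dir, half of tempMax is available to the ram buffer; as segments accumulate in the ram dir, the auto-flush threshold drops accordingly.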

Michael McCandless
added a comment - 15/Jun/09 15:35
OK let's push it to 3.1. It's very much in progress, but 1) the iterations are slow (it's a big patch), 2) it's a biggish change so I'd prefer to commit it shortly after a release, not shortly before, so it has plenty of time to "bake" on trunk.

Jason Rutherglen
added a comment - 15/Jun/09 17:45
Just wanted to give an update on this, I'm running the unit
tests with flushToRAM=true, the ones that fail are (mostly)
tests that look for files when they're now in RAM (temporarily)
and the like. I'm not sure what to do with these tests, 1)
ignore them (kind of hard to not run specific methods, I think)
2) or conditionalize them to run only if flushToRAM=false.

Jason Rutherglen
added a comment - 15/Jun/09 20:56
TestThreadedOptimize is throwing an ensureContiguousMerge
exception. I think this is highlighting that the change to merging
all ram segments to a single primaryDir segment can sometimes
lead to choosing segments that are non-contiguous? I'm not sure
of the best way to handle this.

Michael McCandless
added a comment - 16/Jun/09 14:25
I think this is highlighting the change to merging
all ram segments to a single primaryDir segment can sometimes
lead to choosing segments that are non-contiguous?
I don't see why that results in a non-contiguous merge? Ie, at all times the RAM segments should be on the tail of SegmentInfos? So if you merge all of them, in order, that merge should be contiguous?

Jason Rutherglen
added a comment - 18/Jun/09 00:55
The patch is cleaned up. A static variable IndexWriter.GLOBALNRT
is added, which allows all the tests to be run with
flushToRAM=true. I reran the tests which hopefully still work as
intended. Tests that looked for specific file names were changed
to work with NRT. Some of the tests are skipped entirely and
need to be written specifically for flushToRAM.
TestIndexWriterMergePolicy, TestBackwardsCompatibility failures
are expected
TestIndexWriterRAMDir.testFSDirectory fails (will be fixed)
TestThreadedOptimize ensureContiguousMerge fails. This one is
a bit mysterious, perhaps the correct assertion will show where
it's going wrong.
I need to go through and mark the tests that can be converted to
be NRT specific.

Jason Rutherglen
added a comment - 18/Jun/09 22:52
TestThreadedOptimize passes. LogMergePolicy now filters the
segmentInfos based on the dir, rather than NRTMergePolicy
passing in only ramInfos or primaryInfos. LogMergePolicy is
careful to select contiguous segments; by passing in only a
subset of segmentInfos, the merge policy's selection broke down.
TestIndexWriter.testAddIndexOnDiskFull and
testAddIndexesWithCloseNoWait fail, which I don't think
happened before. testAddIndexOnDiskFull fails when
autoCommit=true, which I'm not sure is a valid test by the time
this patch goes in, but it probably needs to be looked into.
The other previous notes are still valid.

Jason Rutherglen
added a comment - 19/Jun/09 18:46
Using a single segmentInfos in IW seems to be problematic as
we'll always potentially have different dir non-contiguous
infos. I'm seeing the error off and on in different test cases.
I will put together a patch separating the two dir infos in IW.

Jason Rutherglen
added a comment - 19/Jun/09 18:56
On second thought, the previous idea would require quite a bit
of work. Perhaps OneMerge can have the segmentInfos (ramDir or
primaryDir) they were selected from and the ensureContiguous can
verify that? Then we'd adjust commitMerge to remove the newly
merged segments individually.
I'll give this a try.

Jason Rutherglen
added a comment - 22/Jun/09 22:16
It's progressing. Randomly some tests fail such as the one noted below.
TestIndexWriter.testAddIndexesWithCloseNoWait fails with
"rollback() was called or addIndexes* hit an unhandled
exception". TestCrash.testWriterAfterCrash fails with:
[junit] java.io.FileNotFoundException: _a.fnm
[junit]   at org.apache.lucene.store.MockRAMDirectory.openInput(MockRAMDirectory.java:252)
[junit]   at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:67)
assert in the ctor of MergeDocIDRemapper removed (not yet sure
how to replace it)
OneMerge.fromInfos is added, which is the set of segmentInfos
the merge was selected from. This is for ensureContiguousMerge
where it's failing because we have essentially two different
sets of segmentInfos (ram and primaryDir) in the
IW.segmentInfos. They are not related, but for convenience are
kept together for most of IW, then are separated out in the
merge policy. If the goal of ensureContiguousMerge is to keep
docStoreSegments together, this will work as ramDir and
primaryDir docStores should not need to be adjacent (I think,
and need to verify).

Jason Rutherglen
added a comment - 23/Sep/09 02:03
Patch is updated to trunk and compiles. Unit tests fail because
they're hardcoded for specific numbers of files etc, otherwise I
don't think there's anything glaring.
The included benchmark indexes 1 document, queries with a sort,
and repeats N times. This is way too taxing using the existing
IW.getReader when pointing to a hard drive; basically the
machine becomes unusable. I haven't figured out what to publish
or how long I want to wait. I'll just run it overnight.
This patch's benchmark fares much better; I'll publish its
results as well. Thinking 1, 10, 25, 50, 100, 1000 doc
increments should suffice, so there's some variation.

Jason Rutherglen
added a comment - 16/Oct/09 02:10
I think this patch has a memory leak; I'm trying to get a copy of an open source YourKit license for profiling the heap usage. The code is unfortunately quite large, so whatever it is is probably easy to fix and hard to find.

Jason Rutherglen
added a comment - 02/Nov/09 19:04
Realizing the previous patch's approach has grown too
complicated, this is a far simpler implementation that fulfills
the same goal, batching segments in RAM until they exceed a
given maximum size, then merging those RAM segments to a primary
directory (i.e. disk). All the while allowing all segments to be
searchable with a minimum latency turnaround.
Segment names are generated for the ram writer from the
primary writer, which ensures name continuity. Actually I'm not
sure this is necessary anymore.
The problem is when the ram segments are merged into the
primary writer, they appear to be non-contiguous. Some of the
contiguous segment checking has been relaxed for this case, and
needs to be conditional on the segment merging being from the
ram dir. Perhaps we can have our cake and eat it too here by
keeping the contiguous check around for all cases?
When the ram writer's usage exceeds a specified size, the ram
buffer is flushed, and the ram segments are synchronously merged
to the primary writer using a mechanism similar to
addIndexesNoOptimize.
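The simplified design described above can be sketched abstractly: batch segments in RAM until they exceed a maximum, then merge the whole batch down to the primary directory in one synchronous step. Segment byte counts stand in for real segments, and TwoTierBatcher is an illustrative name, not the patch's class.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the two-tier batching scheme: RAM segments accumulate until
// a budget is exceeded, then all of them "merge" to one disk segment.
public class TwoTierBatcher {
    private final long ramMax;
    private final List<Long> ramSegments = new ArrayList<Long>();
    private final List<Long> diskSegments = new ArrayList<Long>();

    public TwoTierBatcher(long ramMax) { this.ramMax = ramMax; }

    /** Add a freshly flushed segment; spill everything to disk when over budget. */
    public void addRamSegment(long bytes) {
        ramSegments.add(bytes);
        if (ramBytes() > ramMax) {
            mergeRamToDisk();
        }
    }

    private long ramBytes() {
        long sum = 0;
        for (long b : ramSegments) sum += b;
        return sum;
    }

    /** Merge every RAM segment into one on-disk segment, emptying the ram tier. */
    private void mergeRamToDisk() {
        diskSegments.add(ramBytes());
        ramSegments.clear();
    }

    public int ramSegmentCount() { return ramSegments.size(); }
    public int diskSegmentCount() { return diskSegments.size(); }
}
```

Throughout, both tiers stay searchable; only the spill step touches disk.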

Jason Rutherglen
added a comment - 03/Nov/09 00:37
Ensure contiguous is mostly back
Cleaned up the code and made the flush method non-synchronized
There's a subtle synchronization bug causing files to not be found in the testRandomThreads method
There's excessive merge logging to debug the sync issue

Jason Rutherglen
added a comment - 04/Nov/09 19:51
I wanted to simplify a little more to more easily understand the
edge cases that fail. In the multithreaded test, files were
sometimes left open which is hard for me to debug.
The TestNRT.testSimple method passes however, IndexFileDeleter
is complaining about not being able to delete when expected
which is shown in the IW.infoStream.
The NRT.flush method creates a merge of all the ram segments,
then calls IW.mergeIn to manually merge the ram segments into
the writer. OneMerge contains the writer where the segment
readers should be obtained from. In this case, the primary
writer obtains the readers from the ram writer's readerpool.
This is important because deletes may be coming in as we're
merging. However I'm not sure this will work without a shared
lock between the writers for commitMergedDeletes which requires
syncing.
Mike, can you take a look to see if this path will work?

Jason Rutherglen
added a comment - 05/Nov/09 02:05
The tests pass, and the previous kinks seem to be worked out.
Actually there is still one issue, where in waitForMerges, the
assert mergingSegments size equals zero occasionally fails. I
think this is a small sync problem because of the manual merge
between the two writers.
I'll run the multithreaded test at a longer interval to see
what other errors may crop up. Once it runs successfully for,
let's say, 30 minutes, we can beef up the stress testing of this
patch by doing concurrent updates, deletes, etc.

Jason Rutherglen
added a comment - 05/Nov/09 02:15 Alright, the issue was simple: OneMerge.registerDone was being set to false by the primary writer, so the ram writer wasn't removing the infos from mergingSegments in mergeFinish.
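
The bug and fix can be sketched standalone. Only the names OneMerge.registerDone, mergingSegments, and mergeFinish come from the discussion; everything else here is an illustrative toy, not Lucene's code. The point: segments are only unregistered while registerDone is still true, so whichever writer clears that flag must also be the one removing the segments.

```java
import java.util.HashSet;
import java.util.Set;

public class RegisterDoneSketch {
    // Toy stand-in for Lucene's MergePolicy.OneMerge.
    static class OneMerge {
        boolean registerDone;
        final Set<String> segments = new HashSet<>();
    }

    final Set<String> mergingSegments = new HashSet<>();

    // Registering a merge marks its segments as busy.
    void mergeInit(OneMerge m) {
        mergingSegments.addAll(m.segments);
        m.registerDone = true;
    }

    // Unregister and clear the flag in one place. If another
    // writer had already flipped registerDone to false, the
    // segments would stay stuck in mergingSegments - the
    // waitForMerges failure described above.
    void mergeFinish(OneMerge m) {
        if (m.registerDone) {
            mergingSegments.removeAll(m.segments);
            m.registerDone = false;
        }
    }
}
```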

Jason Rutherglen
added a comment - 05/Nov/09 06:34 TestNRTReaderWithThreads2 fails periodically; it's just another
synchronization issue. I added syncing on the merge writer in
methods like commitMergedDeletes and commitMerge. Perhaps more
of that type of syncing needs to be added. It can take time for
these issues to be figured out.
There are also remnants of a first attempt at transparently
utilizing the NRT class within IW.

Jason Rutherglen
added a comment - 05/Nov/09 21:43 OK, all the tests pass consistently now.
I guess the next feature is to have NRT.flush execute in a single background thread rather than block update doc calls.
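
As a rough sketch of that idea - assumed structure, not the patch's actual code - a single-threaded executor serializes flushes off the calling thread, so an add/update call only has to submit work and return. The flush body here is a stand-in counter; in the patch it would merge the ram segments into the main writer.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BackgroundFlushSketch implements AutoCloseable {
    // One thread: flushes never run concurrently with each other,
    // but they also never block the caller.
    private final ExecutorService flushExecutor =
        Executors.newSingleThreadExecutor();
    final AtomicInteger flushCount = new AtomicInteger();

    // Returns immediately; the flush work happens off-thread.
    Future<?> flushInBackground() {
        return flushExecutor.submit(() -> {
            flushCount.incrementAndGet(); // real flush work goes here
        });
    }

    // Convenience for deterministic testing: submit and wait.
    void flushAndJoin() {
        try {
            flushInBackground().get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override public void close() { flushExecutor.shutdown(); }
}
```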

Jason Rutherglen
added a comment - 06/Nov/09 01:23 Well, I added the background thread for NRT.flush; however, I've also been debugging this assert !sr.hasChanges issue, which out of 7000 iterations occurs once, and is fairly minor. Hmm... applying deletes shouldn't really conflict, so I'm hoping this isn't a pre-existing bug unrelated to LUCENE-1313.

Jason Rutherglen
added a comment - 09/Nov/09 20:47 This patch includes flushing in a background thread. Some formatting has been cleaned up, javadocs added.
I ran TestNRTReaderWithThreads2 a couple of times for kicks and didn't see the assert sr.hasChanges error. I'll probably focus on adding more stress testing.

Jason Rutherglen
added a comment - 24/Nov/09 23:53 I went back to trying to utilize a RAM dir inside of IW. This
actually works well now, and the code is less intrusive than the
previous patches. The incoming directory is placed in a
PrefixSwitchDirectory, which accepts a flag indicating whether a
segment is destined for the ram directory or the primary directory.
A single internal segment infos collection is used, the first
half are primary dir segments, the second, ram dir segments.
The above-mentioned changes of course break many unit tests. I'm
going through and evaluating what to do on a case-by-case basis,
and am open to suggestions.
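
Since PrefixSwitchDirectory itself only exists in the patch, here is a standalone model of the routing idea (all names hypothetical): each segment is registered as ram- or primary-bound up front, and file operations are dispatched on the file name's segment prefix. The two maps stand in for the RAMDirectory and the primary Directory; the single segment-infos ordering (primary half first, ram half second) is not modeled here.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixSwitchSketch {
    enum Dest { RAM, PRIMARY }

    private final Map<String, Dest> segmentDest = new HashMap<>();
    // Stand-ins for the ram directory and the primary directory.
    final Map<String, byte[]> ramFiles = new HashMap<>();
    final Map<String, byte[]> primaryFiles = new HashMap<>();

    // Declare where a segment's files should live before writing.
    void setDestination(String segmentName, Dest dest) {
        segmentDest.put(segmentName, dest);
    }

    // "_3.frq" -> segment prefix "_3"
    private static String segmentOf(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Route the write by the registered destination; unregistered
    // segments default to the primary directory.
    void writeFile(String fileName, byte[] data) {
        Dest d = segmentDest.getOrDefault(segmentOf(fileName), Dest.PRIMARY);
        (d == Dest.RAM ? ramFiles : primaryFiles).put(fileName, data);
    }
}
```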

Jingkei Ly
added a comment - 04/Jan/10 14:31 I've just tried applying this patch to my checked-out version of trunk (revision 895585) but it appears that the PrefixSwitchDirectory class is missing - is there another patch that is needed to get this working?

Jingkei Ly
added a comment - 05/Jan/10 14:42 Thanks Jason, the patch applies cleanly now.
I've tested it out, but it appears that in my extreme use case (I need to handle a case where one document is added to the index and then made immediately available for searching, many times a second) the patch version seems to be about a third slower than the current NRT implementation in Lucene 3.0, with most of the time lost in searching. I'd expected the RAMDirectory implementation to be faster - is what I'm seeing against your expectations too?

Jason Rutherglen
added a comment - 05/Jan/10 15:37 > the patch version seems to be about 1/3rd slower than the current NRT implementation in Lucene 3.0
How are you measuring the 1/3? Can you post your test? It may be helpful to create a similar test using Lucene's benchmark code.
> with most of the time lost in searching
Is it QPS or query times that are going down?

Jingkei Ly
added a comment - 05/Jan/10 16:46 Jason,
I've attached the benchmarking test I've based my observations on. My use case is to add one document to the index, perform a search immediately afterwards, and be guaranteed that the document I previously added is available in the search (which is why it is done synchronously in one thread).
Apologies for the sysouts on which I'm basing my observations - the main point is that adding a few thousand docs in this fashion appears to be faster in Lucene 3.0 when compared to the equivalent with trunk+patch. The one-third-slower observation I made is based on the fact that on my machine (quad-core, 4GB RAM, Windows XP), Lucene 3.0 is able to process 35 docs per second in the fashion I've described, whereas trunk + patch runs at 22 docs per second.
Having rerun the test, it appears the average time it takes for a query to complete is slower, but it's actually the average time it takes to get a new realtime reader that is much slower.
I haven't ruled out having done something really stupid in my test, in which case I apologise in advance!
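
The measurement loop being described (add one doc, make it searchable, search, repeat, report docs/sec) can be skeletonized as below. The in-memory list is a stand-in so the skeleton runs on its own; the real benchmark would call IndexWriter.addDocument, reopen the reader, and run an IndexSearcher query at the marked points.

```java
import java.util.ArrayList;
import java.util.List;

public class AddThenSearchBench {
    // Returns the synchronous add-then-search throughput in docs
    // per second for numDocs iterations.
    static double docsPerSecond(int numDocs) {
        List<String> index = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < numDocs; i++) {
            index.add("doc" + i);                       // addDocument stand-in
            boolean found = index.contains("doc" + i);  // reopen + search stand-in
            if (!found) throw new AssertionError("doc not visible after add");
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return numDocs / seconds;
    }
}
```

Comparing this number between Lucene 3.0 NRT and trunk+patch, and separately timing the reopen step versus the query step, would localize whether the slowdown is in reader reopen (as the rerun above suggests) or in search itself.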

Jason Rutherglen
added a comment - 05/Jan/10 16:57 > Lucene 3.0 is able to process 35 docs per second in the
fashion I've described, whereas trunk + patch runs at 22 docs
per second.
I glanced at the benchmark posted; it'll take some digging into.
I ran tests like this before (add 1 doc, do a search) and
LUCENE-1313 was faster than trunk/3.0. On Windows XP it was
definitely faster before (I saw XP perform lots of disk access
when creating small files), so perhaps there's something else
going on.