Details

Description

I can't find any way to close the IndexSearcher (and IndexReader) that
is being used by SpellChecker internally.

I've worked around this issue by keeping a single SpellChecker open
for each index, but I'd really like to be able to close it and
reopen it on demand without leaking file descriptors.

Could we add a close() method to SpellChecker that will close the
IndexSearcher and null the reference to it? And perhaps add some code
that reopens the searcher if the reference to it is null? Or would
that break thread safety of SpellChecker?

The attached patch adds a close method but leaves it to the user to
call setSpellIndex to reopen the searcher if desired.

Mike, I changed the testcase to be 1.4 compatible this might help you to merge the spellchecker into 2.9.1 since I can not commit into branches.
It does not make sense to create a patch against the branch as you really want the mergeinfo and don't do it by patching things in branches.

Simon Willnauer
added a comment - 05/Dec/09 22:30 Mike, I changed the testcase to be 1.4 compatible this might help you to merge the spellchecker into 2.9.1 since I can not commit into branches.
It does not make sense to create a patch against the branch as you really want the mergeinfo and don't do it by patching things in branches.
simon

Shalin Shekhar Mangar
added a comment - 05/Dec/09 20:53 I ran into index corruption during stress testing in SOLR-785 . After upgrading contrib-spellcheck to lucene trunk, those issues are no longer reproducible. You guys have saved me a lot of time
Thanks for fixing this!

Simon Willnauer
added a comment - 05/Dec/09 14:20 Mike, I just realized that we need to change the test as it uses java5
classes. I will provide you a patch compatible w. 1.4 later.
On Dec 5, 2009 2:35 PM, "Michael McCandless (JIRA)" <jira@apache.org> wrote:
[
https://issues.apache.org/jira/browse/LUCENE-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786406#action_12786406 ]
Michael McCandless commented on LUCENE-2108 :
--------------------------------------------
We should backport this change to 2.9 - can you commit that please, I
can not though.
And, to 3.0. OK will do...
by SpellChecker internally
-----------------------------------------------------------------------------------------------------
LUCENE-2108 .patch, LUCENE-2108 .patch, LUCENE-2108 .patch, LUCENE-2108 .patch
–
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Only small concern... you hold the lock while opening the new searcher....

I fixed that one - very important for performance though!

I found another issue, write access is not synchronized at all so it is possible to concurrently reindex or at least call indexDictionary() concurrently. In the first place this is not a huge issue as the writer will raise an exception but if the spellIndex is reset while the indexDicitonary is still in progress we could have inconsistencies with searcher and spellindex.
I added another lock for write operations for now,

Simon Willnauer
added a comment - 04/Dec/09 19:03 Only small concern... you hold the lock while opening the new searcher....
I fixed that one - very important for performance though!
I found another issue, write access is not synchronized at all so it is possible to concurrently reindex or at least call indexDictionary() concurrently. In the first place this is not a huge issue as the writer will raise an exception but if the spellIndex is reset while the indexDicitonary is still in progress we could have inconsistencies with searcher and spellindex.
I added another lock for write operations for now,

Only small concern... you hold the lock while opening the new searcher. It would be better to open the new searcher without the lock, then only acquire the lock to do the swap; this way any respell requests that come in don't block while the searcher is being opened (because obtainSearcher() needs to get the lock).

Michael McCandless
added a comment - 04/Dec/09 18:11 Looks good! Nice and simple.
Only small concern... you hold the lock while opening the new searcher. It would be better to open the new searcher without the lock, then only acquire the lock to do the swap; this way any respell requests that come in don't block while the searcher is being opened (because obtainSearcher() needs to get the lock).

Simon Willnauer
added a comment - 04/Dec/09 17:33 If you back-port this to 2.9, you can't use any of the java.util.concurrent.*
Very good point! - didn't thought about back porting at all.
I'm not sure you need a separate SearcherHolder class - can't you re-use IndexReader's decRef/incRef?
I guess I did not see the simplicity the reader offers - blind due to java.util.concurrent.*
I think there are some thread safety issues..
this is weird - on my dev machine and in the patch it is not synchronized.. on the machine I run the tests it is. Anyway you are right.
I changed the code to be compatible with 2.9 using indexReaders.dec/incRef.. will attache in a minute

You don't need to do this.XXX in most places (maybe you're coming
from eg Python? ).

Maybe rename "releaseNewSearcher" -> "swapSearcher"? (Because it
releases the old one and installs the new one).

I think there are some thread safety issues – eg
getSearcherHolder isn't sync'd, so, when you incRef
this.searcherHolder at the start, but then return
this.searcherHolder at the end, it could be two different
instances.

Michael McCandless
added a comment - 04/Dec/09 16:12 Some feedback on the patch:
If you back-port this to 2.9, you can't use any of the
java.util.concurrent.*
I'm not sure you need a separate SearcherHolder class – can't you
re-use IndexReader's decRef/incRef?
You don't need to do this.XXX in most places (maybe you're coming
from eg Python? ).
Maybe rename "releaseNewSearcher" -> "swapSearcher"? (Because it
releases the old one and installs the new one).
I think there are some thread safety issues – eg
getSearcherHolder isn't sync'd, so, when you incRef
this.searcherHolder at the start, but then return
this.searcherHolder at the end, it could be two different
instances.

Simon Willnauer
added a comment - 04/Dec/09 12:29 This patch adds a close operation to SpellChecker and enables thread-safety.
I added a testcase for concurrency as well as the close method - comments and review welcome!

Eirik, could you open a new issue to address SpellChecker's non-thread-safety? I actually thing simply documenting clearly that it's not thread safe is fine.

Mike, IMO thread-safety and close() are closely related. If you close the spellchecker you don't want other threads to access the spellchecker anymore. I would keep that inside this issue and rather add close + make it threadsafe in one go.
Since spellchecker instances are shared between threads already we should rather try to make it thread-safe than documenting it. I see this as a bug though as you really want to share the spellchecker (essentially the searcher) instance instead of opening one for each thread.

Simon Willnauer
added a comment - 04/Dec/09 10:54 Eirik, could you open a new issue to address SpellChecker's non-thread-safety? I actually thing simply documenting clearly that it's not thread safe is fine.
Mike, IMO thread-safety and close() are closely related. If you close the spellchecker you don't want other threads to access the spellchecker anymore. I would keep that inside this issue and rather add close + make it threadsafe in one go.
Since spellchecker instances are shared between threads already we should rather try to make it thread-safe than documenting it. I see this as a bug though as you really want to share the spellchecker (essentially the searcher) instance instead of opening one for each thread.

Michael McCandless
added a comment - 03/Dec/09 23:02 Just a reminder - we need to fix the CHANGES.TXT entry once this is done.
Simon how about you do this, and take this issue (to commit your improvement to throw ACE not NPE)? Thanks

Michael McCandless
added a comment - 03/Dec/09 23:01 Eirik, could you open a new issue to address SpellChecker's non-thread-safety? I actually thing simply documenting clearly that it's not thread safe is fine.

1) Concurrent invocations of suggestSimilar() should not interfere with each other.
2) An invocation of any of the write methods (setSpellIndex, clearIndex, indexDictionary) should not interfere with aleady invoked suggestSimilar
3) All calls to write methods should be serialized (We could probably synchronize these methods?)

If we synchronize any writes to the searcher reference, couldn't suggestSimilar just start its work by putting searcher in a local variable and use that instead of the field?

Eirik Bjorsnos
added a comment - 03/Dec/09 22:57 Mike,
Please account for my demonstrated stupidity when considering this suggestion for thread safety policy / goals:
1) Concurrent invocations of suggestSimilar() should not interfere with each other.
2) An invocation of any of the write methods (setSpellIndex, clearIndex, indexDictionary) should not interfere with aleady invoked suggestSimilar
3) All calls to write methods should be serialized (We could probably synchronize these methods?)
If we synchronize any writes to the searcher reference, couldn't suggestSimilar just start its work by putting searcher in a local variable and use that instead of the field?
I guess concurrency is hard to get right..

I'd assume ensureOpen needs to be synchronized some way so that two threads can't open IndexSearchers concurrently?

this class is not threadsafe anyway. If you look at this snippet:

// close the old searcher, if there was one
if (searcher != null) {
searcher.close();
}
searcher = new IndexSearcher(this.spellIndex, true);

there could be a race if you concurrently reindex or set a new dictionary. IMO this should either be documented or made threadsafe. The close method should invalidate the spellchecker - it should not be possible to use a already closed Spellchecker.
The searcher should be somehow ref counted so that if there is a searcher still in use you can concurrently reindex / add a new dictionary to ensure that the same searcher is used throughout suggestSimilar().

Simon Willnauer
added a comment - 03/Dec/09 22:03 I'd assume ensureOpen needs to be synchronized some way so that two threads can't open IndexSearchers concurrently?
this class is not threadsafe anyway. If you look at this snippet:
// close the old searcher, if there was one
if (searcher != null ) {
searcher.close();
}
searcher = new IndexSearcher( this .spellIndex, true );
there could be a race if you concurrently reindex or set a new dictionary. IMO this should either be documented or made threadsafe. The close method should invalidate the spellchecker - it should not be possible to use a already closed Spellchecker.
The searcher should be somehow ref counted so that if there is a searcher still in use you can concurrently reindex / add a new dictionary to ensure that the same searcher is used throughout suggestSimilar().
I will take care of it once I get back tomorrow.

Eirik Bjorsnos
added a comment - 03/Dec/09 21:35 Well not exactly. Simon's suggestion was just to throw an AlreadyClosedException instead of a NullPointerException which is probably ok and definitely easier.

Eirik Bjorsnos
added a comment - 03/Dec/09 21:19 Simon,
Yes, that sound excactly like what I was thinking when I said "some code
that reopens the searcher if the reference to it is null".
I just didn't include it in my patch because I couldn't figure out how to do it properly.
I'd assume ensureOpen needs to be synchronized some way so that two threads can't open IndexSearchers concurrently?

If you set the searcher to null you might risk a NPE if suggestSimilar() or other methods are called afterwards. I would like to see something like ensureOpen() which throws an AlreadyClosedException or something similar. I will upload a suggestion in a second but need to run so tread it just as a suggestion.

Simon Willnauer
added a comment - 03/Dec/09 21:14 Mike / Eirik,
If you set the searcher to null you might risk a NPE if suggestSimilar() or other methods are called afterwards. I would like to see something like ensureOpen() which throws an AlreadyClosedException or something similar. I will upload a suggestion in a second but need to run so tread it just as a suggestion.
Simon

Michael McCandless
added a comment - 03/Dec/09 20:32 Note that you said "private" again I'm starting to wonder if you are not human! Is this a turing test?
OK, ok, I'll make it public, and port back to the 3.0 branch!

Eirik Bjorsnos
added a comment - 03/Dec/09 20:22 Haha, this is why I said the patch should be "pretty" trivial, instead of just "trivial"
Yes, it should certainly be private. No idea how that happend. Must have been sleeping at the keyboad.

Eirik Bjorsnos
added a comment - 03/Dec/09 20:11 Patch that adds a close method to SpellChecker. The method calls close on the searcher used and then nulls the reference so that a new IndexSearcher will be created by the next call to setSpellIndex