TimeLimitedIndexReader and associated utility class

Details

Description

An alternative to TimeLimitedCollector that has the following advantages:

1) Any reader activity can be time-limited rather than just single searches e.g. the document retrieve phase.
2) Times out faster (i.e. runaway queries such as fuzzies detected quickly before last "collect" stage of query processing)

Uses new utility timeout class that is independent of IndexReader.

Initial contribution includes a performance test class but not had time as yet to work up a formal Junit test.
TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

Activity

In stop(), shouldn't the 'else' part be reached only if the firstAnticipatedThreadToFail == Thread.currentThread()? Currently, if no thread has timed out, and I'm not the firstAnticipatedThreadToFail, the code will still look for a new candidate, and probably find the same firstAnticipatedThreadToFail. Right?

Also, even though that's somewhat mentioned in the class, we don't support multiple timing out threads, and I'm not sure if that's good. Currently, if two threads time out, and the calling thread to checkTimeOutIsThisThread() is not firstAnticipatedThreadToFail, it will continue processing. That may not be good, if the other thread is busy-waiting somewhere, and may not call checkTimeOutIsThisThread for a long time.

What if we change firstAnticipatedThreadToFail to a HashSet and call contains()? It's slower than '==', but safer, which is also an important aspect of this utility. TimeoutThread can add all the timeoud threads to this HashSet, when it detects a timeout has occurred (by iterating on all the 'registered' threads and their expected time out time, and compare to the current time). What do you think?

Shai Erera
added a comment - 29/Jun/09 13:08 In stop(), shouldn't the 'else' part be reached only if the firstAnticipatedThreadToFail == Thread.currentThread()? Currently, if no thread has timed out, and I'm not the firstAnticipatedThreadToFail, the code will still look for a new candidate, and probably find the same firstAnticipatedThreadToFail. Right?
Also, even though that's somewhat mentioned in the class, we don't support multiple timing out threads, and I'm not sure if that's good. Currently, if two threads time out, and the calling thread to checkTimeOutIsThisThread() is not firstAnticipatedThreadToFail, it will continue processing. That may not be good, if the other thread is busy-waiting somewhere, and may not call checkTimeOutIsThisThread for a long time.
What if we change firstAnticipatedThreadToFail to a HashSet and call contains()? It's slower than '==', but safer, which is also an important aspect of this utility. TimeoutThread can add all the timeoud threads to this HashSet, when it detects a timeout has occurred (by iterating on all the 'registered' threads and their expected time out time, and compare to the current time). What do you think?

Currently the class hinges on a "fast fail" mechanism whereby all the many calls checking for a timeout are very quickly testing a single volatile boolean, "anActivityHasTimedOut".
99.99% of calls are expected to fail this test (nothing has timed out) and fail quickly - I was reluctant to add any hashset lookup etc in there needed to determine failure.

With that as a guiding principle maybe the solution is to change
volatile boolean anActivityHasTimedOut
into
volatile int numberOfTimedOutThreads;

which would cater for >1 error condition at once. The fast-fail check then becomes:
if(numberOfTimedOutThreads > 0)
{
if(timedoutThreads.contains(Thread.currentThread)

Mark Harwood
added a comment - 29/Jun/09 14:19 Currently the class hinges on a "fast fail" mechanism whereby all the many calls checking for a timeout are very quickly testing a single volatile boolean, "anActivityHasTimedOut".
99.99% of calls are expected to fail this test (nothing has timed out) and fail quickly - I was reluctant to add any hashset lookup etc in there needed to determine failure.
With that as a guiding principle maybe the solution is to change
volatile boolean anActivityHasTimedOut
into
volatile int numberOfTimedOutThreads;
which would cater for >1 error condition at once. The fast-fail check then becomes:
if(numberOfTimedOutThreads > 0)
{
if(timedoutThreads.contains(Thread.currentThread)
{
timedoutThreads.remove(Thread.currentThread);
numberOfTimedOutThreads=timedoutThreads.size();
throw RuntimeException.....
}
}

it's been late for this issue, but maybe worth thinking about. We could change semantics of this problem completely. Imo, the problem can be reformulated as "Provide possibility to cancel running queries on best effort basis, with or without providing so far collected results"

That would leave Timer management to the end users and make an issue focus on one "Lucene core" ... Timeout management can be then provided as an example somewhere "How to implement Timeout management using ..."

Eks Dev
added a comment - 29/Jun/09 14:23 it's been late for this issue, but maybe worth thinking about. We could change semantics of this problem completely. Imo, the problem can be reformulated as "Provide possibility to cancel running queries on best effort basis, with or without providing so far collected results"
That would leave Timer management to the end users and make an issue focus on one "Lucene core" ... Timeout management can be then provided as an example somewhere "How to implement Timeout management using ..."

Oh, I did not mean to skip this check. After anActivityHasTimedOut is true, instead of comparing Thread.currentThread() to firstAnticipatedThreadToFail, check if Thread.currentThread() is in the failed HashSet of threads, or something like that.

I totally agree this should be kept and used that way, and it's probably better than numberOfTimedOutThreads since we don't need to inc/dec the latter every failure, just set a boolean flag and test it.

Imo, the problem can be reformulated as "Provide possibility to cancel running queries on best effort basis, with or without providing so far collected results".

That's where we started from, but Mark here wanted to provide a much more generalized way of stopping any other activity, not just search. With this utility class, someone can implement a TimeLimitedIndexWriter which times out indexing, merging etc. Search is just one operation which will be covered as well.

I also think that TimeLimitingCollector already provides a possibility to "cancel running queries on a best effort basis" and therefore if someone is interested in just that, he doesn't need to use TimeLimitedIndexReader. However this approach seems much more simple if you want to ensure queries are stopped ASAP, w/o passing a Timeout object around or anything. This approach also guarantees (I think) that any custom Scorer which does a lot of work, but uses IndexReader for that, will be stopped, even if the Scorer's developer did not implement a Timeout mechanism. Right?

Shai Erera
added a comment - 29/Jun/09 14:42 ... quickly testing a single volatile boolean, "anActivityHasTimedOut".
Oh, I did not mean to skip this check. After anActivityHasTimedOut is true, instead of comparing Thread.currentThread() to firstAnticipatedThreadToFail, check if Thread.currentThread() is in the failed HashSet of threads, or something like that.
I totally agree this should be kept and used that way, and it's probably better than numberOfTimedOutThreads since we don't need to inc/dec the latter every failure, just set a boolean flag and test it.
Imo, the problem can be reformulated as "Provide possibility to cancel running queries on best effort basis, with or without providing so far collected results".
That's where we started from, but Mark here wanted to provide a much more generalized way of stopping any other activity, not just search. With this utility class, someone can implement a TimeLimitedIndexWriter which times out indexing, merging etc. Search is just one operation which will be covered as well.
I also think that TimeLimitingCollector already provides a possibility to "cancel running queries on a best effort basis" and therefore if someone is interested in just that, he doesn't need to use TimeLimitedIndexReader. However this approach seems much more simple if you want to ensure queries are stopped ASAP, w/o passing a Timeout object around or anything. This approach also guarantees (I think) that any custom Scorer which does a lot of work, but uses IndexReader for that, will be stopped, even if the Scorer's developer did not implement a Timeout mechanism. Right?

But the check is on a variable with a yes/no state. We need to cater for >1 simultaneous timeout error condition in play. With only a boolean it could be hard to know precisely when to clear it, no?

Mark here wanted to provide a much more generalized way of stopping any other activity, not just search

To be fair I think the use case for IndexWriter is weaker. In reader you have multiple users all expressing different queries and you want them all to share nicely with each other. In index writing it's typically a batch system indexing docs and there's no "fairness" to mediate? Breaking it out into a utility class seems like a good idea anyway.

Mark Harwood
added a comment - 29/Jun/09 14:58 Oh, I did not mean to skip this check.
But the check is on a variable with a yes/no state. We need to cater for >1 simultaneous timeout error condition in play. With only a boolean it could be hard to know precisely when to clear it, no?
Mark here wanted to provide a much more generalized way of stopping any other activity, not just search
To be fair I think the use case for IndexWriter is weaker. In reader you have multiple users all expressing different queries and you want them all to share nicely with each other. In index writing it's typically a batch system indexing docs and there's no "fairness" to mediate? Breaking it out into a utility class seems like a good idea anyway.

Sure, I just wanted to "sharpen definition" what is Lucene core issue, and what we can leave to end users. It is not only about the time, rather about canceling search requests (even better, general activities).

Eks Dev
added a comment - 29/Jun/09 15:03 Sure, I just wanted to "sharpen definition" what is Lucene core issue, and what we can leave to end users. It is not only about the time, rather about canceling search requests (even better, general activities).

With only a boolean it could be hard to know precisely when to clear it, no?

We can cleat it when the time out threads' Set's size() is 0?

I agree that this issue is mostly about IndexReader (and hence the name), and that the scenario of IndexWriter is weaker. But a utility class together w/ the TimeLimitedIndexReader example can help someone write a TimeLimitedIndexWriter very easily, and/or reuse this utility elsewhere.

Shai Erera
added a comment - 29/Jun/09 15:03 With only a boolean it could be hard to know precisely when to clear it, no?
We can cleat it when the time out threads' Set's size() is 0?
I agree that this issue is mostly about IndexReader (and hence the name), and that the scenario of IndexWriter is weaker. But a utility class together w/ the TimeLimitedIndexReader example can help someone write a TimeLimitedIndexWriter very easily, and/or reuse this utility elsewhere.

Mark Harwood
added a comment - 29/Jun/09 15:21 any custom Scorer which does a lot of work, but uses IndexReader for that, will be stopped, even if the Scorer's developer did not implement a Timeout mechanism. Right?
Correct. I'm not familiar with the proposal to pass around a Timeout object but I get the idea and the code here would certainly avoid that overhead.
We can cleat it when the time out threads' Set's size() is 0?
Yes, that would work.

On the email thread I offered to create on QueryWeight a scorer(IndexSearcher, boolean, boolean, Timeout) in order to pass a Timeout object to Scorer, and also create a TimeLimitedQuery. But it's no longer needed.

Shai Erera
added a comment - 29/Jun/09 15:26 I'm not familiar with the proposal to pass around a Timeout object
On the email thread I offered to create on QueryWeight a scorer(IndexSearcher, boolean, boolean, Timeout) in order to pass a Timeout object to Scorer, and also create a TimeLimitedQuery. But it's no longer needed.

Maybe we can benchmark this approach to see if it slows down
queries due to the the Thread.currentThread and hash lookup? As
this would go into 3.0 maybe we can look at how to change
the Lucene API such that we pass in an argument to the
IndexReader methods where the timeout may be checked for?

Jason Rutherglen
added a comment - 30/Jun/09 00:07 Maybe we can benchmark this approach to see if it slows down
queries due to the the Thread.currentThread and hash lookup? As
this would go into 3.0 maybe we can look at how to change
the Lucene API such that we pass in an argument to the
IndexReader methods where the timeout may be checked for?

if it slows down queries due to the the Thread.currentThread and hash lookup

This lookup only happens when threads start or stop timed activities and when there is a timed out state - all other method invocations on TimeLimitedIndexReader eg termDocs.next() are simply testing a volatile boolean which is used to indicate if any timeout has occurred. This seems to be fast in my benchmarks.

maybe we can .. change the Lucene API such that we pass in an argument to the IndexReader methods where the timeout may be checked

Mark Harwood
added a comment - 30/Jun/09 07:51 Maybe we can benchmark this approach
See http://www.nabble.com/Improving-TimeLimitedCollector-td24174758.html#a24229185
The figures were produced by TestTimeLimitedIndexReader that is part of this Jira issue so you can try benchmarks on your own indexes.
if it slows down queries due to the the Thread.currentThread and hash lookup
This lookup only happens when threads start or stop timed activities and when there is a timed out state - all other method invocations on TimeLimitedIndexReader eg termDocs.next() are simply testing a volatile boolean which is used to indicate if any timeout has occurred. This seems to be fast in my benchmarks.
maybe we can .. change the Lucene API such that we pass in an argument to the IndexReader methods where the timeout may be checked
The current design uses static methods which remove the need to pass a timeout object as context everywhere but using this approach comes with the downside that a single client thread is unable to time >1 activity at once which we thought was a reasonable trade-off. See http://www.nabble.com/Re%3A-Improving-TimeLimitedCollector-p24234976.html

Jason Rutherglen
added a comment - 30/Jun/09 18:54 This lookup only happens when threads start or stop timed
activities and when there is a timed out state
Ah, so we're assuming most actions don't timeout, and when they
do we're then doing the slightly more expensive threadlocal
lookup.
Sounds good!
Maybe we can turn this into a patch, and create a benchmark .alg
etc (I'll volunteer to do the latter).

Mark Harwood
added a comment - 30/Jun/09 19:41 Ah, so we're assuming most actions don't timeout....
Yes, that's it.
(I'll volunteer to do the latter).
Cool. I'll work on tidying up the classes under test as per comments earlier .

Can we move to call Thread.currentThread() only once per method? Currently stop() calls it 3 times, start() 2 and checkTimeoutIsThisThread()

Can we change TimeoutThread to just wait() on timeLimitedThreads, instead of wait(1000)? I think it's enough to rely on notify()? Otherwise, if my system is idle, this thread will wake up every second for nothing.

Maybe change checkTimeoutIsThisThread to do "if(timedOutThreads.remove(Thread.currentThread()) != null)" and delete the next line? If a thread has timed out, there's no need to look it up in the map twice.

Question - TimeoutThread checks firstAnticipatedTimeout vs. the current time and if it has exceeded, it adds firstAnticipatedThreadToFail to timedoutThreads. I think it will fail in the following scenario:

Thread 1 start an activity w/ time 100 (the expected exceeded time).

Thread 2 start an activity w/ time 150.

Thread 1 is stuck somewhere.

TimeoutThread checks the current time against firstAnticipatedTimeout and adds firstAnticipatedThreadToFail to timedOutThreads.

Thread 2 checks for timeout, but timedOutThreads does not contain it. Therefore it continues its execution.

If a thread is stuck, the rest of the threads should not fail on their "timeout exceeded" checks. How about if when TimeoutThread detects the first timeout has reached it will: (1) add that thread to the timedOutThreads Set and (2) set "first timeout" to be the next in the Map/List. I think we'll need an additional LinkedList or something, so that start(), stop(), check() and TimeoutThread.run() will work properly, but that shouldn't be too complicated.

Shai Erera
added a comment - 01/Jul/09 13:45
Can we move to call Thread.currentThread() only once per method? Currently stop() calls it 3 times, start() 2 and checkTimeoutIsThisThread()
Can we change TimeoutThread to just wait() on timeLimitedThreads, instead of wait(1000)? I think it's enough to rely on notify()? Otherwise, if my system is idle, this thread will wake up every second for nothing.
Maybe change checkTimeoutIsThisThread to do "if(timedOutThreads.remove(Thread.currentThread()) != null)" and delete the next line? If a thread has timed out, there's no need to look it up in the map twice.
Question - TimeoutThread checks firstAnticipatedTimeout vs. the current time and if it has exceeded, it adds firstAnticipatedThreadToFail to timedoutThreads. I think it will fail in the following scenario:
Thread 1 start an activity w/ time 100 (the expected exceeded time).
Thread 2 start an activity w/ time 150.
Thread 1 is stuck somewhere.
TimeoutThread checks the current time against firstAnticipatedTimeout and adds firstAnticipatedThreadToFail to timedOutThreads.
Thread 2 checks for timeout, but timedOutThreads does not contain it. Therefore it continues its execution.
If a thread is stuck, the rest of the threads should not fail on their "timeout exceeded" checks. How about if when TimeoutThread detects the first timeout has reached it will: (1) add that thread to the timedOutThreads Set and (2) set "first timeout" to be the next in the Map/List. I think we'll need an additional LinkedList or something, so that start(), stop(), check() and TimeoutThread.run() will work properly, but that shouldn't be too complicated.
What do you think?

re the question - yes, TimeoutThread should call the existing "resetFirstAnticipatedFailure()" method to advance timeout monitoring immediately to the next candidate - it currently requires the first bad Thread to call stop() before monitoring is advanced to spot the next bad thread.

I think a useful safety measure is to manage clients that don't call stop() (e.g. forgetting to code a "finally...stop") but this is likely to add complexity to ActivityTimeMonitor so I want to get a basic version solid first before thinking too much about this.

Mark Harwood
added a comment - 01/Jul/09 14:51 re points 1,2,3 - yep, will change.
re the question - yes, TimeoutThread should call the existing "resetFirstAnticipatedFailure()" method to advance timeout monitoring immediately to the next candidate - it currently requires the first bad Thread to call stop() before monitoring is advanced to spot the next bad thread.
I think a useful safety measure is to manage clients that don't call stop() (e.g. forgetting to code a "finally...stop") but this is likely to add complexity to ActivityTimeMonitor so I want to get a basic version solid first before thinking too much about this.

Shai Erera
added a comment - 01/Jul/09 15:26 TimeoutThread should call the existing "resetFirstAnticipatedFailure()" method to advance timeout monitoring immediately to the next candidate
I'm not sure that's that simple. reseFirstAticipatedFailure will find Thread 1 again, right? And even if you overcome this, start() or stop() called by another thread may change it?

1) There is a holding list of active threads that are of indeterminate status
2) There is a list of threads that are known to have timed out
3) The monitoring thread has the job of moving items from 1) to 2) and waits for firstAnticipatedTimeout and is notify-ed if firstAnticipatedTimeout changes
4) Start() adds a thread to 1)
5) Stop() removes a thread from 1) or 2)
6) Check() throws an exception if anActivityHasTimedOut is true (for fast fail) and current thread is in 2)
7) Any modification to 2) should set anActivityHasTimedOut boolean flag = 2)'s size is >0.
8) Any modification to 1) should re-asses firstAnticipatedTimeout and notify 3) if changed

Mark Harwood
added a comment - 01/Jul/09 17:50 Maybe we should start by debugging some guiding principles:
1) There is a holding list of active threads that are of indeterminate status
2) There is a list of threads that are known to have timed out
3) The monitoring thread has the job of moving items from 1) to 2) and waits for firstAnticipatedTimeout and is notify-ed if firstAnticipatedTimeout changes
4) Start() adds a thread to 1)
5) Stop() removes a thread from 1) or 2)
6) Check() throws an exception if anActivityHasTimedOut is true (for fast fail) and current thread is in 2)
7) Any modification to 2) should set anActivityHasTimedOut boolean flag = 2)'s size is >0.
8) Any modification to 1) should re-asses firstAnticipatedTimeout and notify 3) if changed

Minor change - should be "from 1) and/or 2)". Actually I think the impl will always be "1) and 2)", i.e., we'll call remove() from both Map/Set w/o checking first if the Thread exists there.

The monitoring thread has the job of moving items from 1) to 2) and waits for firstAnticipatedTimeout and is notify-ed if firstAnticipatedTimeout changes

I think here we'd want to keep a list (true "list") of timeouts, where firstAnticipatedTimeout = list.head() and if wait(firstAnticipatedTimeout) reaches, in addition to add that Thread to timedOutThread Set, also remove that timeout from the head of the list, and wait on the following expected timeout?

Perhaps all it means is that (1) should be a TreeMap, sorted by the expected timeout, and when the first timeout is reached, the thread is removed from (1) as its state is no longer indeterminate (i.e., we already know it has timed out)?

Shai Erera
added a comment - 02/Jul/09 09:38 Stop() removes a thread from 1) or 2)
Minor change - should be "from 1) and/or 2)". Actually I think the impl will always be "1) and 2)", i.e., we'll call remove() from both Map/Set w/o checking first if the Thread exists there.
The monitoring thread has the job of moving items from 1) to 2) and waits for firstAnticipatedTimeout and is notify-ed if firstAnticipatedTimeout changes
I think here we'd want to keep a list (true "list") of timeouts, where firstAnticipatedTimeout = list.head() and if wait(firstAnticipatedTimeout) reaches, in addition to add that Thread to timedOutThread Set, also remove that timeout from the head of the list, and wait on the following expected timeout?
Perhaps all it means is that (1) should be a TreeMap, sorted by the expected timeout, and when the first timeout is reached, the thread is removed from (1) as its state is no longer indeterminate (i.e., we already know it has timed out)?

Mark Harwood
added a comment - 22/Jul/09 17:07 Hey Mark. Have you made any progress with that?
Apologies, recently the lure of developing apps for the new iPhone has put paid to that
I'm still happy that the pseudo-code we outlined in the last couple of comments is what is needed to finish this.
We can tag team if you want
Yep, happy to do that. Let me know if you start work to avoid me duplicating effort and I'll do the same.
Cheers
Mark

Shai Erera
added a comment - 13/Aug/09 18:54 We still haven't coded in the items on the comments above (Jul 1 and 2). I hope to get to it some time soon though, but if you'd like to give those a shot, go ahead.

Jason Rutherglen
added a comment - 13/Aug/09 19:17 I've been using TimeLimitingCollector which as mentioned before,
with large OR queries is somewhat inexact in it's precision. So
I may update the patch, though I'll let you know if/when I
start.
Thanks for the heads up.

Had another run at ActivityTimeMonitor tonight and rationalised the code based on earlier comments. It should now cater for multiple simultaneous timeouts more cleanly.

I'm concentrating on robustness with this currently - there's a TODO comment in the code that captures a small remaining inefficiency in iterating through all threads' data rather than using some form of time-sorted list. There was a suggestion in the earlier Jira comments re TreeMap might be a simple alternative but see my Java code comments as to why this is unlikely to work. Optimising this is likely to require the introduction of yet another data structure but this will add a runtime cost to maintain it - a cost I'm not sure is justified.

Mark Harwood
added a comment - 18/Aug/09 00:05 Had another run at ActivityTimeMonitor tonight and rationalised the code based on earlier comments. It should now cater for multiple simultaneous timeouts more cleanly.
I'm concentrating on robustness with this currently - there's a TODO comment in the code that captures a small remaining inefficiency in iterating through all threads' data rather than using some form of time-sorted list. There was a suggestion in the earlier Jira comments re TreeMap might be a simple alternative but see my Java code comments as to why this is unlikely to work. Optimising this is likely to require the introduction of yet another data structure but this will add a runtime cost to maintain it - a cost I'm not sure is justified.

Collected all the files in the issue into a single .patch file. I also included the following changes/additions:

Formatting.

Renamed TestTimeLimitedIndexReader to TimeLimitedIndexReaderBenchmark - I don't want it to be run by ant test. We should review it anyway and put in in benchmark.

Created TestActivityTimeMonitor which tests the ATM itself. This revealed a bug in the ATM, where it removed from the map while iterating on it (TimeoutThread.run()).

BTW, the test testMultipleThreadsFailing() passes when the number of threads is 10, and fails when it's set higher. It is definitely a timing issue, b/c if I change each thread to sleep more than 200ms, the test passes. I suspect that when there are so many threads in the system, TimeoutThread will not be as accurate. However I'm not sure if we should do something about it ... if there are 50 threads running concurrently, things will be slow, and executing search requests will take longer, therefore the TimeoutThread will have enough time to pick their timeouts.

We need to create a test for the IndexReader case, I hope to get to it, but if you want, please take a stab at it.

Shai Erera
added a comment - 11/Feb/10 09:30 Collected all the files in the issue into a single .patch file. I also included the following changes/additions:
Formatting.
Renamed TestTimeLimitedIndexReader to TimeLimitedIndexReaderBenchmark - I don't want it to be run by ant test. We should review it anyway and put in in benchmark.
Created TestActivityTimeMonitor which tests the ATM itself. This revealed a bug in the ATM, where it removed from the map while iterating on it (TimeoutThread.run()).
BTW, the test testMultipleThreadsFailing() passes when the number of threads is 10, and fails when it's set higher. It is definitely a timing issue, b/c if I change each thread to sleep more than 200ms, the test passes. I suspect that when there are so many threads in the system, TimeoutThread will not be as accurate. However I'm not sure if we should do something about it ... if there are 50 threads running concurrently, things will be slow, and executing search requests will take longer, therefore the TimeoutThread will have enough time to pick their timeouts.
We need to create a test for the IndexReader case, I hope to get to it, but if you want, please take a stab at it.

BTW Mark, if we assume the number of threads that will be monitored is relatively low, I agree w/ your comment on using the TreeMap. I think we can remove it from the code (the comment I mean). What do you think?

Shai Erera
added a comment - 11/Feb/10 09:31 BTW Mark, if we assume the number of threads that will be monitored is relatively low, I agree w/ your comment on using the TreeMap. I think we can remove it from the code (the comment I mean). What do you think?

Agreed on removing the treemap comment..
As you suggest, their may be a low-level accuracy timing issue under heavy load but for the typically longer timeout settings we may set this is less likely to be an issue.

Related: I did think of another feature for ATM - timeouts will typically be set to the maximum bearable value that can be sustained by the hardware without upsetting lots of users/customers who need answers.
This setting is therefore a tough business decision to make and is likely to be on the high side to avoid annoying customers (10 seconds? 30?).
The current monitoring solution only aborts at the latest possible stage when the uppermost acceptable limit has been reached and expensive resource has already been burned.
Maybe we could add a progress-testing method to ATM which can throw an exception earlier e.g.
public void checkForProjectedActivityTimeout(float percentActivityCompletedSoFar)
Clients would need to estimate how far through a task they were and call this method periodically.

Mark Harwood
added a comment - 11/Feb/10 11:13 Thanks for the updates, Shai.
Agreed on removing the treemap comment..
As you suggest, their may be a low-level accuracy timing issue under heavy load but for the typically longer timeout settings we may set this is less likely to be an issue.
Related: I did think of another feature for ATM - timeouts will typically be set to the maximum bearable value that can be sustained by the hardware without upsetting lots of users/customers who need answers.
This setting is therefore a tough business decision to make and is likely to be on the high side to avoid annoying customers (10 seconds? 30?).
The current monitoring solution only aborts at the latest possible stage when the uppermost acceptable limit has been reached and expensive resource has already been burned.
Maybe we could add a progress-testing method to ATM which can throw an exception earlier e.g.
public void checkForProjectedActivityTimeout(float percentActivityCompletedSoFar)
Clients would need to estimate how far through a task they were and call this method periodically.

I like the idea of adding the projected activity timeout in general, but I'd like to question its usefulness in reality (or at least for search applications). The way I think of it (and it might be because I'm thinking of my use case) there are two problems with such API:

It might not be very easy (if at all) or performing to project how much of the work has been done. For TermQuery it might be easy to tell this (e.g. numSeenSoFar / df(term)), but that will add an 'if' to every document that is traversed, and possible more operations. But for more complicated queries, I'm not sure you'll be able to tell how much of the query has been processed.

If I am willing to sustain a 10s query, then I guess I'd want to extract as much information as I can in those 10s. If after 1s I realize I haven't processed even 10% of the data that doesn't mean I'd like to stop, right? Maybe the query/activity will speed up shortly? I think that if I put a cap on the query time, it means I don't mind spending that amount of time ... but I also recognize this may depend on the application, and therefore that is not a too strong argument.

I think this approach is interesting, as it is able to detect 'hanging' threads (such as those stuck in infinite loops).

I realize however that ActivityTimeMonitor is not search specific (which makes me think it should be moved to o.a.l.util or something) and therefore the projected activity timeout can have its usage in other places.

How about if we do it in a separate issue? We still need to write enough tests for what exists so far, and turn the Benchmark class into a benchmark task/alg. I think that if we can avoid extra functionality (which is likely to add more bugs to cover) it will be easier to finish that issue, no?
BTW, in order to support this we'll need to store the startTime as well, not just the timeoutTime, which means that we either add another startTimesThreads map, or change the map to be from Thread to a Times object which encapsulates both times ... Minor thing though.

Shai Erera
added a comment - 11/Feb/10 12:17 I like the idea of adding the projected activity timeout in general, but I'd like to question its usefulness in reality (or at least for search applications). The way I think of it (and it might be because I'm thinking of my use case) there are two problems with such API:
It might not be very easy (if at all) or performing to project how much of the work has been done. For TermQuery it might be easy to tell this (e.g. numSeenSoFar / df(term)), but that will add an 'if' to every document that is traversed, and possible more operations. But for more complicated queries, I'm not sure you'll be able to tell how much of the query has been processed.
If I am willing to sustain a 10s query, then I guess I'd want to extract as much information as I can in those 10s. If after 1s I realize I haven't processed even 10% of the data that doesn't mean I'd like to stop, right? Maybe the query/activity will speed up shortly? I think that if I put a cap on the query time, it means I don't mind spending that amount of time ... but I also recognize this may depend on the application, and therefore that is not a too strong argument.
I think this approach is interesting, as it is able to detect 'hanging' threads (such as those stuck in infinite loops).
I realize however that ActivityTimeMonitor is not search specific (which makes me think it should be moved to o.a.l.util or something) and therefore the projected activity timeout can have its usage in other places.
How about if we do it in a separate issue? We still need to write enough tests for what exists so far, and turn the Benchmark class into a benchmark task/alg. I think that if we can avoid extra functionality (which is likely to add more bugs to cover) it will be easier to finish that issue, no?
BTW, in order to support this we'll need to store the startTime as well, not just the timeoutTime, which means that we either add another startTimesThreads map, or change the map to be from Thread to a Times object which encapsulates both times ... Minor thing though.
Also, is this targeted to be added to 'core' or contrib?

I agree it will be challenging to work out when to call this from readers etc and how to estimate completeness but as a general utility class (as you suggest, in o.a.l.util ) it seems like a useful addition.

My suspicion is that this is currently "contrib" - but then TimeLimitingCollector is currently in core.
Maybe TimeLimitingCollector could be rewritten to use ATM and then we maintain a common generally reusable implementation?

Mark Harwood
added a comment - 11/Feb/10 13:00 The change to ATM isn't that big - as you say just adding "start" to the data on each thread.
Here's an (untested) example
Bar.java
/**
* Checks to see if this thread is likely to exceed it's pre-determined timeout.
* This is a heavier-weight call than "checkForTimeout" and should not be called quite as frequently
*
* Throws {@link ActivityTimedOutException}RuntimeException in the event of any anticipated timeout.
* @param progress
*/
public static final void checkProjectedTimeoutOnThisThread( float progress)
{
Thread currentThread= Thread .currentThread();
synchronized (timeLimitedThreads)
{
ActivityTime thisTimeOut = timeLimitedThreads.get(currentThread);
if (thisTimeOut!= null )
{
long now= System .currentTimeMillis();
long maxDuration=thisTimeOut.scheduledTimeout-thisTimeOut.startTime;
long durationSoFar=now-thisTimeOut.startTime;
float expectedMinimumProgress=( float )durationSoFar/maxDuration;
if (progress<expectedMinimumProgress)
{
long expectedOverrun=( long ) (((durationSoFar*(1f-progress))+now)-thisTimeOut.scheduledTimeout);
throw new ActivityTimedOutException( " Thread " +currentThread+ " is expected to time out, estimated overrun ="
+expectedOverrun+ " ms" ,expectedOverrun);
}
}
}
}
static class ActivityTime
{
public ActivityTime( long startTime, long timeOutTime)
{
this .startTime=startTime;
this .scheduledTimeout=timeOutTime;
}
long startTime;
long scheduledTimeout;
}
I agree it will be challenging to work out when to call this from readers etc and how to estimate completeness but as a general utility class (as you suggest, in o.a.l.util ) it seems like a useful addition.
My suspicion is that this is currently "contrib" - but then TimeLimitingCollector is currently in core.
Maybe TimeLimitingCollector could be rewritten to use ATM and then we maintain a common generally reusable implementation?

This looks good. About throwing the exception, it bothers me that we decide for the caller that if the task is expected to timeout, we terminate it. Maybe someone will want to give it another try, and really timeout if the projection failed for a couple of times? So for this I think we either throw a different exception, or return a boolean which I prefer better. The caller can then decide what to do if the method returned false ... what do you think?
I'm thinking that unlike checkForTimeout which is definite - if it 'returns' true you need to terminate, checking for the projected timeout is different, more intentional and 'test' in nature, so let's be more forgiving with it?

About TimeLimitingCollector, I think the usage does not allow to reuse TimeAcitivityMonitor? i.e., w/ ATM you need to start and stop, but the collector cannot stop (it can start in the ctor, which is not good either since that may be 'too far' from collect()). Even though it's not critical for a thread to remain in the timeLimitedThreads map (because if the thread is reused, next time start() is called it will override itself in the map), the TimeoutThread will falsely signal that a timeout has occurred. Doesn't feel right to me.

Shai Erera
added a comment - 11/Feb/10 13:32 Liked the formatting. Didn't know you can do that .
This looks good. About throwing the exception, it bothers me that we decide for the caller that if the task is expected to timeout, we terminate it. Maybe someone will want to give it another try, and really timeout if the projection failed for a couple of times? So for this I think we either throw a different exception, or return a boolean which I prefer better. The caller can then decide what to do if the method returned false ... what do you think?
I'm thinking that unlike checkForTimeout which is definite - if it 'returns' true you need to terminate, checking for the projected timeout is different, more intentional and 'test' in nature, so let's be more forgiving with it?
About TimeLimitingCollector, I think the usage does not allow to reuse TimeAcitivityMonitor? i.e., w/ ATM you need to start and stop, but the collector cannot stop (it can start in the ctor, which is not good either since that may be 'too far' from collect()). Even though it's not critical for a thread to remain in the timeLimitedThreads map (because if the thread is reused, next time start() is called it will override itself in the map), the TimeoutThread will falsely signal that a timeout has occurred. Doesn't feel right to me.

Agreed, might be useful to provide boolean response to the progress method - a kind of "how am I doing?" check.
We can always provide a convenience wrapper method which throws an exception : ATM.blowUpIfNotGoingFastEnough(float progress)

Re TimeLimitingCollector - agreed, you really do need to protect ATM/start/stop calls in the same try...finally block.
Maybe ATM could have a "start" method variant that takes an additional "alreadyRunningSince" argument as opposed to the existing assumption that the activity is starting right now. The first "collect" could then call this with a timestamp initialised in the constructor.
Even then, there is the issue of where to put the "stop" call - collector has no "close" call to signal the end of the activity.

Doesn't seem like TimeLimitingCollector can be based on the same ATM code. Shame.

Mark Harwood
added a comment - 11/Feb/10 13:51 Agreed, might be useful to provide boolean response to the progress method - a kind of "how am I doing?" check.
We can always provide a convenience wrapper method which throws an exception : ATM.blowUpIfNotGoingFastEnough(float progress)
Re TimeLimitingCollector - agreed, you really do need to protect ATM/start/stop calls in the same try...finally block.
Maybe ATM could have a "start" method variant that takes an additional "alreadyRunningSince" argument as opposed to the existing assumption that the activity is starting right now. The first "collect" could then call this with a timestamp initialised in the constructor.
Even then, there is the issue of where to put the "stop" call - collector has no "close" call to signal the end of the activity.
Doesn't seem like TimeLimitingCollector can be based on the same ATM code. Shame.

Note: that means that every collect will need to check if it's the first.

I think we should leave TLC out of this for now. Maybe w/ TimeLimitingIndexReader, which has finer granularity, TLC won't be used at all. At least, I definitely don't see why would one use both...

About the packages, so we agree that ATM and exception move to o.a.l.util, and the reader stay in index (like IndexReader)? And I don't think this warrants a contrib package, unless we move TLC there as well. But this seems like a useful Lucene core utility.

So that we don't override each other - are you going to add the projected method to the patch? If so, can you also order the packages and remove the TreeMap comment? If not I can do it.

Shai Erera
added a comment - 11/Feb/10 14:10 We can always provide a convenience wrapper method which throws an exception : ATM.blowUpIfNotGoingFastEnough(float progress)
Right. Although I'm thinking, in the sake of keeping the API simple, and back-compat sane, it's easy enough for the caller to do:
if (!ATM.checkProjectedTimeoutOnThisThread(progress)) {
throw new ActivityTimedOutException( "terminating due to projected timeout. progress = " + progress);
}
The first "collect"
Note: that means that every collect will need to check if it's the first.
I think we should leave TLC out of this for now. Maybe w/ TimeLimitingIndexReader, which has finer granularity, TLC won't be used at all. At least, I definitely don't see why would one use both...
About the packages, so we agree that ATM and exception move to o.a.l.util, and the reader stay in index (like IndexReader)? And I don't think this warrants a contrib package, unless we move TLC there as well. But this seems like a useful Lucene core utility.
So that we don't override each other - are you going to add the projected method to the patch? If so, can you also order the packages and remove the TreeMap comment? If not I can do it.

Shai Erera
added a comment - 11/Feb/10 14:34 BTW, perhaps to allow even finer granularity and precision, we should move to use nanoTime()? the given timeout can be in ms, but the TimeoutThread can work w/ nanoTime() ... what do you think?

I think the API is pretty stable - call try..start..finally...stop around time-critical stuff and use a TimeLimitedIndexReader to wrap your IndexReader.

Internally the implementation feels reasonably stable too.

In my tests it doesn't seem to add too much overhead to calls - I was getting response times of 3400 milliseconds on a heavy wikipedia query with TimeLimitedIndexReader versus 3300 for the same query on a raw IndexReader without timeout protection.

I'm tempted to try put the timeout check calls directly into a version of IndexReader rather than in a delegating reader wrapper just to try see if the wrapper code is where the bulk of the extra overhead comes in. I'd hate to add any overhead to core IndexReader but I'm keen to see just how low-cost this check can get.

Mark Harwood
added a comment - 11/Feb/10 23:00 When's this ready to test with Solr?
I think the API is pretty stable - call try..start..finally...stop around time-critical stuff and use a TimeLimitedIndexReader to wrap your IndexReader.
Internally the implementation feels reasonably stable too.
In my tests it doesn't seem to add too much overhead to calls - I was getting response times of 3400 milliseconds on a heavy wikipedia query with TimeLimitedIndexReader versus 3300 for the same query on a raw IndexReader without timeout protection.
I'm tempted to try put the timeout check calls directly into a version of IndexReader rather than in a delegating reader wrapper just to try see if the wrapper code is where the bulk of the extra overhead comes in. I'd hate to add any overhead to core IndexReader but I'm keen to see just how low-cost this check can get.

Thanks Mark. I will take a look at the patch a bit later. TestActivityMonitor should move to o.a.l.util as well. I can do that.
I also want to experiment w/ using ConcurrentHashMap, as most of the times the map is just checked and not modified. I'll see if it removes some of the synchronization. The synchronization is needed because of TimeoutThread, which only checks the map ... if it's possible, I believe it will improve the performance of ATM.

About adding it to IndexReader directly - it'd be interesting to see if it affects performance in any way, but I wouldn't want to see it directly in IR, since if I don't want to limit anything, I don't want to add an 'if' everywhere in IR, or a call to ATM ... it'd be interesting to see if it improves anything though.

I also want to add a TestTimeLimitedIndexReader.

BTW about the name - I think TimeLimitingIndexReader is more accurate, since it's not the IR which is time limited, but the operations it performs. What do you think? (also we'll follow TimeLimitingCollector) ...

Shai Erera
added a comment - 12/Feb/10 06:06 Thanks Mark. I will take a look at the patch a bit later. TestActivityMonitor should move to o.a.l.util as well. I can do that.
I also want to experiment w/ using ConcurrentHashMap, as most of the times the map is just checked and not modified. I'll see if it removes some of the synchronization. The synchronization is needed because of TimeoutThread, which only checks the map ... if it's possible, I believe it will improve the performance of ATM.
About adding it to IndexReader directly - it'd be interesting to see if it affects performance in any way, but I wouldn't want to see it directly in IR, since if I don't want to limit anything, I don't want to add an 'if' everywhere in IR, or a call to ATM ... it'd be interesting to see if it improves anything though.
I also want to add a TestTimeLimitedIndexReader.
BTW about the name - I think TimeLimitingIndexReader is more accurate, since it's not the IR which is time limited, but the operations it performs. What do you think? (also we'll follow TimeLimitingCollector) ...

Having done this there were some test failures - notably calls to SegmentReader.getOnlySegmentReader(IndexReader reader) because it has a bunch of "instanceof" testing code that doesn't expect our wrapper.

This is a general Lucene issue. If we support Reader-wrapping as a concept (FilterIndexReader certainly suggests this) then it might make sense to provide a method call to "getWrappedReader" in the same way java.lang.Exception introduced a standard "getCause" method in java 1.4 because prior to that unwrapping objects required specialised knowledge of each wrapper class. This is perhaps another Jira issue and related changes to Junit tests.

I'll attach an updated patch with the Junit test that currently fails on these "instanceof" checks

Mark Harwood
added a comment - 12/Feb/10 12:57 I also want to add a TestTimeLimitedIndexReader.
To simplify this I started down the route of making core's TestIndexReader subclassable for testing any IndexReader wrappers such as ours.
This involves centralising all the "r= IndexReader.open(..)" calls into a single overridable "getReader" method. The TimeLimitingIndexReader then becomes just this:
TestTimeLimitingIndexReader.java
public class TestTimeLimitingIndexReader extends TestIndexReader{
public TestTimeLimitingIndexReader( String name) {
super (name);
}
@Override
public IndexReader getReader(Directory dir, boolean readOnly)
throws CorruptIndexException, IOException {
return new TimeLimitedIndexReader( super .getReader(dir, readOnly));
}
}
Having done this there were some test failures - notably calls to SegmentReader.getOnlySegmentReader(IndexReader reader) because it has a bunch of "instanceof" testing code that doesn't expect our wrapper.
This is a general Lucene issue. If we support Reader-wrapping as a concept (FilterIndexReader certainly suggests this) then it might make sense to provide a method call to "getWrappedReader" in the same way java.lang.Exception introduced a standard "getCause" method in java 1.4 because prior to that unwrapping objects required specialised knowledge of each wrapper class. This is perhaps another Jira issue and related changes to Junit tests.
I'll attach an updated patch with the Junit test that currently fails on these "instanceof" checks

Hey Mark, I'm working on the patch also (experimenting with ConcurrentHashMap). Is your latest patch differs from the previous only around the test to TimeLimitingReader and TestIndexReader? If so, I'll apply those changes locally.

Shai Erera
added a comment - 12/Feb/10 14:22 Hey Mark, I'm working on the patch also (experimenting with ConcurrentHashMap). Is your latest patch differs from the previous only around the test to TimeLimitingReader and TestIndexReader? If so, I'll apply those changes locally.

I forgot to add that while extending TestIndexReader is good to ensure nothing breaks w/ TimeLimitingIndexReader, I think we should add some search timeout tests to it, something in the spirit of TestActivityTimeMonitor. I'll do that while I'm working on the ConurrentHashMap thing, if you don't mind.

Shai Erera
added a comment - 12/Feb/10 14:25 Hit the submit button too soon.
I forgot to add that while extending TestIndexReader is good to ensure nothing breaks w/ TimeLimitingIndexReader, I think we should add some search timeout tests to it, something in the spirit of TestActivityTimeMonitor. I'll do that while I'm working on the ConurrentHashMap thing, if you don't mind.

Mark Harwood
added a comment - 12/Feb/10 14:36 I think we should add some search timeout tests to it,
Yep, I left a "TODO" in there to cover this.
I'll do that while I'm working on the ConurrentHashMap thing, if you don't mind.
Great stuff. I'll leave this with you until further notice.
Thanks

Ok, it looks like ConcurrentHashMap cannot be used by ATM because of its TimeoutThread. When the thread checks the map and iterates on its elements, we need to lock the map completely because otherwise we can hit a case where the thread pulled an entry for inspection, and then the thread which is monitored calls stop(), but the timeout thread won't know that any might wrongly mark that thread as a timeout activity thread ...
Since checkForTimeout does not need to obtain any lock, and the synchronization happens on start/stop/isProjected and TimeoutThread, I'm not sure how important is it to use CHM.

Another thing, CHM allows you to specify the concurrency level, which is essentially the number of threads you know are going to 'change' the map (put or remove). If you don't know that, and default to clevel=1, it means only one thread is expected to change the map, which in ATM's case would yield to <0 benefit, since all the threads (which their number we don't know up front) change the map ... of course we can guess, or try to, but ...

Anyway, I'm putting that aside for now, and moving no to adding more tests to TestTimeLimitingReader.

getWrappedReader

Mark, the exception in the tests is thrown from SegmentReader.getOnlySegmentReader. I think if we add this method to FilterIndexReader only and in getOnlySegmentReader we add this code:

Shai Erera
added a comment - 15/Feb/10 14:59 Ok, it looks like ConcurrentHashMap cannot be used by ATM because of its TimeoutThread. When the thread checks the map and iterates on its elements, we need to lock the map completely because otherwise we can hit a case where the thread pulled an entry for inspection, and then the thread which is monitored calls stop(), but the timeout thread won't know that any might wrongly mark that thread as a timeout activity thread ...
Since checkForTimeout does not need to obtain any lock, and the synchronization happens on start/stop/isProjected and TimeoutThread, I'm not sure how important is it to use CHM.
Another thing, CHM allows you to specify the concurrency level, which is essentially the number of threads you know are going to 'change' the map (put or remove). If you don't know that, and default to clevel=1, it means only one thread is expected to change the map, which in ATM's case would yield to <0 benefit, since all the threads (which their number we don't know up front) change the map ... of course we can guess, or try to, but ...
Anyway, I'm putting that aside for now, and moving no to adding more tests to TestTimeLimitingReader.
getWrappedReader
Mark, the exception in the tests is thrown from SegmentReader.getOnlySegmentReader. I think if we add this method to FilterIndexReader only and in getOnlySegmentReader we add this code:
if (reader instanceof FilterIndexReader) {
return getOnlySegmentReader(((FilterIndexReader) reader).getWrappedReader());
}
It should work? This can be done in this issue I think as it doesn't break back-compat and exposes a reasonable method where it should be. What do you think?

Anyway, I'm putting that aside for now, and moving no to adding more tests to TestTimeLimitingReader.

OK.

I always shudder when I see lists of "if instanceof..." logic.

My suggestion of "getWrappedReader" was intended for broader use - there are other reasons to wrap a reader e.g. security.
I was thinking of putting it on IndexReader but maybe the convenience wrapper base class "FilterIndexReader" would be a better home - most reader-wrappers would use this as a base class?

Mark Harwood
added a comment - 15/Feb/10 15:11 Anyway, I'm putting that aside for now, and moving no to adding more tests to TestTimeLimitingReader.
OK.
I always shudder when I see lists of "if instanceof..." logic.
My suggestion of "getWrappedReader" was intended for broader use - there are other reasons to wrap a reader e.g. security.
I was thinking of putting it on IndexReader but maybe the convenience wrapper base class "FilterIndexReader" would be a better home - most reader-wrappers would use this as a base class?

Hmm ... I've added the method above, and it solved the issue in one of the tests, but revealed other issues with FilterIndexReader, like for example it doesn't override getUniqueTermCount() and reopen() ... so maybe we need to change the current TestFilterIndexReader to extend TestIndexReader and override getReader() ... this looks like a separate issue, though it will block this issue ...
A change in TestIndexReader is also required in one test which asserts that the instance of the reader is DirectoryReader (or ReadOnlySegmentReader), but that's easy to do by for example adding a isInstanceOf(IndexReader) method to TestIndexReader which will be overridden in TestFilterIndexReader and TestTimeLimitingIndexReader.

Shai Erera
added a comment - 15/Feb/10 15:22 Hmm ... I've added the method above, and it solved the issue in one of the tests, but revealed other issues with FilterIndexReader, like for example it doesn't override getUniqueTermCount() and reopen() ... so maybe we need to change the current TestFilterIndexReader to extend TestIndexReader and override getReader() ... this looks like a separate issue, though it will block this issue ...
A change in TestIndexReader is also required in one test which asserts that the instance of the reader is DirectoryReader (or ReadOnlySegmentReader), but that's easy to do by for example adding a isInstanceOf(IndexReader) method to TestIndexReader which will be overridden in TestFilterIndexReader and TestTimeLimitingIndexReader.

In my understanding it should be. I've started making this change, and TimeLimitingIndexReader does not need for example to override getUniqueTermCount or other such methods which are passed blindly to the wrapped reader. I think it makes sense to request that a wrapping reader should sub-class FilterIndexReader.

Of course, because of the semantics of the various reopen methods, it is not enough to provide one impl in FilterIndexReader and rely on that in sub-classes, because they need to wrap the reader with themselves as a new instance (I BTW found and fixed a bug in TimeLimitingIndexReader.reopen which returned the wrapped reopened instance if it wasn't changed, instead of itself). We can get over that by offering a protected getNewInstance(IndexReader) which will be overridden by sub-classes ...

Shai Erera
added a comment - 15/Feb/10 15:31 most reader-wrappers would use this as a base class?
In my understanding it should be. I've started making this change, and TimeLimitingIndexReader does not need for example to override getUniqueTermCount or other such methods which are passed blindly to the wrapped reader. I think it makes sense to request that a wrapping reader should sub-class FilterIndexReader.
Of course, because of the semantics of the various reopen methods, it is not enough to provide one impl in FilterIndexReader and rely on that in sub-classes, because they need to wrap the reader with themselves as a new instance (I BTW found and fixed a bug in TimeLimitingIndexReader.reopen which returned the wrapped reopened instance if it wasn't changed, instead of itself). We can get over that by offering a protected getNewInstance(IndexReader) which will be overridden by sub-classes ...

BTW found and fixed a bug in TimeLimitingIndexReader.reopen which returned the wrapped reopened instance if it wasn't changed, instead of itself

Good catch.

We can get over that by offering a protected getNewInstance(IndexReader) which will be overridden by sub-classes

Would that be abstract? That would effectively help force subclasses to do the right thing when reopening but introduce a back-compatibility issue.
If we don't make it abstract what would be the default implementation of this method?
Maybe it's all best handled by simply adding a note saying "you really should think about overriding reopen" in FilterIndexReader's javadocs?

Mark Harwood
added a comment - 15/Feb/10 15:58 BTW found and fixed a bug in TimeLimitingIndexReader.reopen which returned the wrapped reopened instance if it wasn't changed, instead of itself
Good catch.
We can get over that by offering a protected getNewInstance(IndexReader) which will be overridden by sub-classes
Would that be abstract? That would effectively help force subclasses to do the right thing when reopening but introduce a back-compatibility issue.
If we don't make it abstract what would be the default implementation of this method?
Maybe it's all best handled by simply adding a note saying "you really should think about overriding reopen" in FilterIndexReader's javadocs?

I've actually went ahead and implemented a wrap(IndexReader) method. The default impl returns a new FilterIndexReader, since it's a concrete class. We should add it to its javadocs, saying that you'd better override that method as well. Since it's not just for reopen (on all its 3 variants), but for getSequentialSubReaders as well, I think it's useful ...

BTW, you seem to implemented correctly getSequentialSubReaders on TimeLimitingIndexReader, while FilterIndexReader doesn't ... so I pulled your implementation up to FIR and by calling wrap(IndexReader) for every sub reader, both FIR and TLIR implement it correctly.

I'm now working on adding more tests to TestTLIR. Hope to post a patch really soon.

Shai Erera
added a comment - 15/Feb/10 16:06 Would that be abstract? ...
I've actually went ahead and implemented a wrap(IndexReader) method. The default impl returns a new FilterIndexReader, since it's a concrete class. We should add it to its javadocs, saying that you'd better override that method as well. Since it's not just for reopen (on all its 3 variants), but for getSequentialSubReaders as well, I think it's useful ...
BTW, you seem to implemented correctly getSequentialSubReaders on TimeLimitingIndexReader, while FilterIndexReader doesn't ... so I pulled your implementation up to FIR and by calling wrap(IndexReader) for every sub reader, both FIR and TLIR implement it correctly.
I'm now working on adding more tests to TestTLIR. Hope to post a patch really soon.

Although some of the changes can be viewed as belonging to a separate issue, I think they are not very big and fit nicely here. It allows testing the new TimeLimitingIndexReader which is important.

Mark, the only thing that remains is to convert TimeLimitingIndexReaderBenchmark to a benchmark algorithm/task. Would you mind taking a stab at this? If not, I'll try to get to it later this week (perhaps in a day or two).

Shai Erera
added a comment - 15/Feb/10 16:51 Added patch contains, above all previous things:
Changes to FitlerIndexReader to better support extensions.
Changes in TimeLimitingIndexReader based on the previous bullet.
Changes to TestIndexReader to better support extensions.
TestFilterIndexReader now extends TestIndexReader.
TestTimeLimitingIndexReader was added few test methods.
Although some of the changes can be viewed as belonging to a separate issue, I think they are not very big and fit nicely here. It allows testing the new TimeLimitingIndexReader which is important.
Mark, the only thing that remains is to convert TimeLimitingIndexReaderBenchmark to a benchmark algorithm/task. Would you mind taking a stab at this? If not, I'll try to get to it later this week (perhaps in a day or two).

Mark Harwood
added a comment - 15/Feb/10 17:53 Mark, the only thing that remains is to convert TimeLimitingIndexReaderBenchmark to a benchmark algorithm/task. Would you mind taking a stab at this?
Will need to look at existing benchmark tasks for guidance. I may get some time later.

How do we proceed from here? Is there a committer that's willing to look at the code? The only thing that's left to do is writing the benchmark algorithm, but that can happen in parallel (if at all for this issue), while we have review cycles.

Shai Erera
added a comment - 17/Feb/10 15:24 How do we proceed from here? Is there a committer that's willing to look at the code? The only thing that's left to do is writing the benchmark algorithm, but that can happen in parallel (if at all for this issue), while we have review cycles.

Mark Harwood
added a comment - 17/Feb/10 16:19 How do we proceed from here? Is there a committer that's willing to look at the code
I have commit rights but I'd like to find some time to add the benchmarking code first and also trial it in a live environment.

Shai Erera
added a comment - 17/Feb/10 16:28 I have commit rights
Oh, sorry, I didn't know . Then we're ok.
I'd like to find some time to add the benchmarking code
I've started to look into it, but stopped because of other duties. If I'll get back to it, I'll let you know, so that we don't work on the same thing.

We're using lucene in a large distributed search environment and we would hugely benefit from this patch being available in the main Lucene build. We were formerly using timeouts as part of our search broker, but it would make much more sense for these timeouts to operate at the IndexReader level so that we can prevent hanging threads.

Greg Tarr
added a comment - 17/May/10 15:57 We're using lucene in a large distributed search environment and we would hugely benefit from this patch being available in the main Lucene build. We were formerly using timeouts as part of our search broker, but it would make much more sense for these timeouts to operate at the IndexReader level so that we can prevent hanging threads.

If I may I'd like to retract my wish for this to be available in the main lucene build pending further testing on our systems.

There seems to be some code that was available in earlier versions of this patch but removed from the latest version (2010-02-15). Specifically the TimeLimitingIndexReader.java once had a method called getSequentialSubReaders(), but this has been removed from the latest commit..

Greg Tarr
added a comment - 07/Oct/10 16:04 If I may I'd like to retract my wish for this to be available in the main lucene build pending further testing on our systems.
There seems to be some code that was available in earlier versions of this patch but removed from the latest version (2010-02-15). Specifically the TimeLimitingIndexReader.java once had a method called getSequentialSubReaders(), but this has been removed from the latest commit..
Could someone please explain why this change has been made?

You should see that getSequentialSubReaders is catered for in the amended base class FilterIndexReader - it delegates to an over-ridable "wrap" method for subclasses to instantiate suitable wrapper classes for any sub readers
I suspect you took TimeLimitedIndexReader without the corresponding base class change?

Mark Harwood
added a comment - 07/Oct/10 21:39 Greg, check out the patch dated 2010-02-15 01:27 PM
You should see that getSequentialSubReaders is catered for in the amended base class FilterIndexReader - it delegates to an over-ridable "wrap" method for subclasses to instantiate suitable wrapper classes for any sub readers
I suspect you took TimeLimitedIndexReader without the corresponding base class change?
Cheers
Mark

Lucene 3.3 changed scoring in TermQuery (and others?) which means that TimeLimitedIndexReader breaks query scoring, returning all scores as 0. This is because hash codes for the subreaders are stored and reused. Unfortunately the getSequentialSubReaders uses a new wrapping object each time and doesn't implement hash code in anyway, hence returning oid's. In my case I just implemented hashcode in the TimeLimitedIndexReader, returning in.hashcode() and this fixed the problem.

Carl Austin
added a comment - 16/Aug/11 15:29 Lucene 3.3 changed scoring in TermQuery (and others?) which means that TimeLimitedIndexReader breaks query scoring, returning all scores as 0. This is because hash codes for the subreaders are stored and reused. Unfortunately the getSequentialSubReaders uses a new wrapping object each time and doesn't implement hash code in anyway, hence returning oid's. In my case I just implemented hashcode in the TimeLimitedIndexReader, returning in.hashcode() and this fixed the problem.