Details

Description

With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so we can proceed landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct though. I will start going through that first.

Activity

I added a jenkins build that runs every 4 hours to give the RT branch some exercise. I added my email address and buschmi to the recipients if the build fails.... if you wanna be added let me know.
From now on we should only commit bugfixes, documentation and merges with trunk to this branch. From my point of view there is only one blocker left here (LUCENE-3028) so the remaining work is mainly reviewing the current state and polishing the javadocs. I will go over IW, IR and DW java docs as a start.

Simon Willnauer
added a comment - 14/Apr/11 12:18 I added a jenkins build that runs every 4 hours to give the RT branch some exercise. I added my email address and buschmi to the recipients if the build fails.... if you wanna be added let me know.
From now on we should only commit bugfixes, documentation and merges with trunk to this branch. From my point of view there is only one blocker left here ( LUCENE-3028 ) so the remaining work is mainly reviewing the current state and polishing the javadocs. I will go over IW, IR and DW java docs as a start.

selckin thanks for reporting those failures... I fixed the TestIndexWriterException ones in LUCENE-3031 and LUCENE-3032. The TestOmitTf failure caused by a recently fixed bug on trunk (LUCENE-3027) which I haven't merged into RT branch yet. I just did the merge and that fixes that issue too. I will commit the merge in a minute

The issues you are reporting with addIndexes I can not reproduce though... I will spin off issues for them.

Simon Willnauer
added a comment - 15/Apr/11 08:55 selckin thanks for reporting those failures... I fixed the TestIndexWriterException ones in LUCENE-3031 and LUCENE-3032 . The TestOmitTf failure caused by a recently fixed bug on trunk ( LUCENE-3027 ) which I haven't merged into RT branch yet. I just did the merge and that fixes that issue too. I will commit the merge in a minute
The issues you are reporting with addIndexes I can not reproduce though... I will spin off issues for them.

Simon Willnauer
added a comment - 15/Apr/11 17:11 Attached patch.
awesome mike, I think you should commit that patch and we iterate once we are back from vacation?
The RT hudson build tolerates //nocommit
I think we lost this infoStream output from trunk?
do you recall where it was?
Can we rename Healthiness -> DocumentsWriterStallControl (or something like that)?
sure go ahead can you rename the test to testStalled instead of testHealthiness
It's looking good!
I think we are reasonably close!

selckin
added a comment - 16/Apr/11 09:18 fyi, since last night's trunk merge (r1092636) not a single test has failed in my 'while true; ant clean test' (addIndexes LUCENE-3033 has not failed since either)

Simon Willnauer
added a comment - 19/Apr/11 07:52 fyi, since last night's trunk merge (r1092636) not a single test has failed in my 'while true; ant clean test' (addIndexes LUCENE-3033 has not failed since either)
awesome... thanks for reporting back!
Attached patch.
mike I committed your patch to the branch so we can iterate!

Simon Willnauer
added a comment - 26/Apr/11 11:42 mike I committed your patch to the branch so we can iterate!
I just committed some fixes and added javadoc to several classes. We are down to 0 nocommits!! Yay!

I was helping Simon look at reintegrating this branch (produce a patch for easy review, etc), but I found some problems.

1. it looks like some commits were marked as merged from trunk, but not actually merged. so if we reintegrate into trunk we will lose some changes.
2. some files have lost their svn:eol-style, which makes the comparison.... difficult.

Robert Muir
added a comment - 27/Apr/11 19:18 I was helping Simon look at reintegrating this branch (produce a patch for easy review, etc), but I found some problems.
1. it looks like some commits were marked as merged from trunk, but not actually merged. so if we reintegrate into trunk we will lose some changes.
2. some files have lost their svn:eol-style, which makes the comparison.... difficult.
I'm looking at these issues now.

ok, i think these issues are resolved. I'm attaching the script Mike wrote that I used for checking that we don't lose any changes (I think its the same script we used for the flex branch).

the way I did it is to checkout a/ and b/, reintegrate the branch into b/, and run the script to produce a huge patch. if some things look suspicious like they are "lost" changes, then i reverse apply the huge patch to the branch with eclipse and only selectively apply those lost changes and then commit.

Robert Muir
added a comment - 27/Apr/11 23:12 ok, i think these issues are resolved. I'm attaching the script Mike wrote that I used for checking that we don't lose any changes (I think its the same script we used for the flex branch).
the way I did it is to checkout a/ and b/, reintegrate the branch into b/, and run the script to produce a huge patch. if some things look suspicious like they are "lost" changes, then i reverse apply the huge patch to the branch with eclipse and only selectively apply those lost changes and then commit.

Robert Muir
added a comment - 28/Apr/11 01:58 What about TestIndexWriter.testIndexingThenDeleting?
I noticed in the branch the test method is changed to _testIndexingThenDeleting (disabled).
However, if i re-enable it (rename it back) it never seems to finish...

Simon Willnauer
added a comment - 28/Apr/11 06:23
I noticed in the branch the test method is changed to _testIndexingThenDeleting (disabled).
However, if i re-enable it (rename it back) it never seems to finish...
I just reenabled, fixed and committed that testcase on branch.

Simon Willnauer
added a comment - 28/Apr/11 09:26 Can we rename Healthiness -> DocumentsWriterStallControl (or something like that)?
I added a TODO for this - lets do that on trunk
I think we lost this infoStream output from trunk?
just committed a fix for this to reenable it on branch.. kind of tricky since we now flush concurrently NumberFormat must be accessed single threaded. So I added it to DWPT#flush()

Simon Willnauer
added a comment - 28/Apr/11 12:19 here is my first review round. I manually ported some missed merges from trunk and fixed some really minor things.
I will commit shortly to branch if nobody objects

Michael McCandless
added a comment - 28/Apr/11 18:05 Modified the Python script a bit to do a recursive diff b/w two dirs to make an applyable patch – added a usage, and a -skipWhitespace option.
I put it under a new 'dev-tools/scripts' dir...

+1
mike can you add a little doc string to the script explaining what it does and how to use it? I think we should also have a wiki page that explains how to reintegrate a branch just like we have one for merging changes into a branch.

Simon Willnauer
added a comment - 29/Apr/11 07:33 I put it under a new 'dev-tools/scripts' dir...
+1
mike can you add a little doc string to the script explaining what it does and how to use it? I think we should also have a wiki page that explains how to reintegrate a branch just like we have one for merging changes into a branch.

I committed the CHANGES.TXT patch to branch. I think we should freeze the branch now so robert can create a last final patch. We should let that patch linger around for a while, yet I plan to commit this to trunk on monday. Good work everybody!

Simon Willnauer
added a comment - 29/Apr/11 10:21 I committed the CHANGES.TXT patch to branch. I think we should freeze the branch now so robert can create a last final patch. We should let that patch linger around for a while, yet I plan to commit this to trunk on monday. Good work everybody!

Uwe Schindler
added a comment - 29/Apr/11 13:17 I merged the freezed branch again.
Attached is a first patch for reviewing code changes (not SVN diff), created by the following command between 2 fresh checkouts, one of them "svn merge --reintegrate":
diff -urNb --strip-trailing-cr trunk-lusolr1 trunk-lusolr2 | filterdiff -x "*.svn*" --strip 1 --clean > LUCENE-3023.patch
This patch is not intended to be applied, its more to verify the changes (therefore all whitespace changes created by merging were excluded).

Here is the final SVN diff. To work around some itches with SVN, the following was done:

reverted everything outside lucene sub folder

used the previously created manual diff to get a list of all changed files (using patchutils command lsdiff)

used "svn -q status | sed 's/^........//' > ../svn-files.txt" to get all files affected after merge

intersect both files (lsdiff and svn status one) to find all files that are in reality unchanged, but still affected by SVN (these are all files that were added after branching - this is a known limitation of SVN. Files added after branching are "replaced" by merge reintegrate, loosing all history). Store those files in unchanged.txt

Uwe Schindler
added a comment - 29/Apr/11 14:15 Here is the final SVN diff. To work around some itches with SVN, the following was done:
reverted everything outside lucene sub folder
used the previously created manual diff to get a list of all changed files (using patchutils command lsdiff)
used "svn -q status | sed 's/^........//' > ../svn-files.txt" to get all files affected after merge
intersect both files (lsdiff and svn status one) to find all files that are in reality unchanged, but still affected by SVN (these are all files that were added after branching - this is a known limitation of SVN. Files added after branching are "replaced" by merge reintegrate, loosing all history). Store those files in unchanged.txt
use the intersected filelist and revert everything: cat ../unchanged.txt | xargs svn revert
finally do a record-only merge again to fix mergeprops reverted by the previous revert
My checkout is now ready to commit.
If we have some minor problems with the patch, please wait with fixing after commit. If there are serious problems, we can fix them in realtime branch and merge manuall (I can do that later).

Michael Busch
added a comment - 29/Apr/11 16:14 Just wanted to say: you guys totally rock! Great teamwork here with all the work involved of getting the branch merged back. I'm sorry I couldn't help much in the last few weeks.

I've started some ad-hoc testing via Solr.
A single threaded CSV upload (bulk indexing... no real-time reopens)
looks pretty much the same, and doing 2 CSV uploads at once was
36% faster (a bit apples-to-oranges since the number of resulting
segments was also higher... but even still, looks like a good improvement!)

Yonik Seeley
added a comment - 29/Apr/11 18:49 This looks awesome guys!
I've started some ad-hoc testing via Solr.
A single threaded CSV upload (bulk indexing... no real-time reopens)
looks pretty much the same, and doing 2 CSV uploads at once was
36% faster (a bit apples-to-oranges since the number of resulting
segments was also higher... but even still, looks like a good improvement!)

In ThreadAffinityDocumentsWriterThreadPool#getAndLock() we had talked about switching from a per-threadstate queue (safeway model) to a single queue (whole foods). I'm wondering if we should do that before we commit or change that later as a separate patch?

remove some commented out code in TestFlushByRamOrCountsPolicy#testHealthyness

Michael Busch
added a comment - 01/May/11 05:28 Just a few minor comments, otherwise +1 to commit! (I'm super excited this is finally happening after the branch was created a year ago or so!)
In DocumentsWriterPerThreadPool:
remove:
+ // public abstract void clearThreadBindings(ThreadState perThread);
+
+ // public abstract void clearAllThreadBindings();
+
In ThreadAffinityDocumentsWriterThreadPool#getAndLock() we had talked about switching from a per-threadstate queue (safeway model) to a single queue (whole foods). I'm wondering if we should do that before we commit or change that later as a separate patch?
remove some commented out code in TestFlushByRamOrCountsPolicy#testHealthyness

In ThreadAffinityDocumentsWriterThreadPool#getAndLock() we had talked about switching from a per-threadstate queue (safeway model) to a single queue (whole foods). I'm wondering if we should do that before we commit or change that later as a separate patch?

I will create a new issue for this I don't think this should block the commit merge!

all other minor stuff can be done after committing. I will take care of these issues.

Simon Willnauer
added a comment - 01/May/11 08:21 In ThreadAffinityDocumentsWriterThreadPool#getAndLock() we had talked about switching from a per-threadstate queue (safeway model) to a single queue (whole foods). I'm wondering if we should do that before we commit or change that later as a separate patch?
I will create a new issue for this I don't think this should block the commit merge!
all other minor stuff can be done after committing. I will take care of these issues.

As noted before, this is minor and should be done after the merge was committed, else it would produce lot of useless work

I will commit this tomorrow morning MEST.

After committing, I will move the branch as a new SVN tag (realtime_DWPT_latest or like that) and if somebody wants to start again with more realtime work, we should branch again (according to the SVN book: a reintegrated branch cannot be used for development anymore and should be removed).

Uwe Schindler
added a comment - 01/May/11 10:08 Thanks Simon!
As noted before, this is minor and should be done after the merge was committed, else it would produce lot of useless work
I will commit this tomorrow morning MEST.
After committing, I will move the branch as a new SVN tag (realtime_DWPT_latest or like that) and if somebody wants to start again with more realtime work, we should branch again (according to the SVN book: a reintegrated branch cannot be used for development anymore and should be removed).

In ThreadAffinityDocumentsWriterThreadPool#getAndLock() we had talked about switching from a per-threadstate queue (safeway model) to a single queue (whole foods). I'm wondering if we should do that before we commit or change that later as a separate patch?

I opened LUCENE-3060 for this. @buschmi maybe you can add some more info to that issue if you recall the discussion?

Simon Willnauer
added a comment - 02/May/11 08:20 In ThreadAffinityDocumentsWriterThreadPool#getAndLock() we had talked about switching from a per-threadstate queue (safeway model) to a single queue (whole foods). I'm wondering if we should do that before we commit or change that later as a separate patch?
I opened LUCENE-3060 for this. @buschmi maybe you can add some more info to that issue if you recall the discussion?
Committed merged branch to trunk revision: 1098427
Moved branch away as tag in revision: 1098428
AWESOME!

Uwe Schindler
added a comment - 02/May/11 08:37 The first full Jenkins Build also succeeded. When reviewing the first Clover Build report, I noticed 2 new final classes, that have no code coverage at all (see https://builds.apache.org/hudson/job/Lucene-trunk/1549/clover-report/org/apache/lucene/index/pkg-summary.html ):
DocFieldConsumers
DocFieldConsumersPerField
I am not sure if those are old relicts (dead code) or newly added ones, but not yet used.

Uwe Schindler
added a comment - 02/May/11 14:19 I reopen this one, as the merge added a reincarnation of quicksort in DocFieldProcessor (which was previously removed in the corresponding *PerThread class, but lost during the merge).
I will fix soon.

Michael McCandless
added a comment - 03/May/11 17:24 This cutover to concurrent flushing (DWPT) produces astounding
increases in indexing throughput:
http://people.apache.org/~mikemccand/lucenebench/indexing.html
186 GB plain text per hour (from 101 GB/hour the day before)!!!
It's not every day you see an 84% jump in indexing throughput! Wow.
This is on a machine that has substantial CPU+IO concurrency, ie, it
was bottlenecked by our non-concurrent flush.
Also, I can now tune up the IW settings I use in those nightly
benchmarks; it's now 6 threads and "only" 512 MB RAM buffer. I'll
wait a few days and then do that.
Looks like a few queries got a bit slower... I suspect this is because
the index segment count has changed. Before concurrent flushing it
was this:
36(4.0):C4977400 _69(4.0):C4977400 _9c(4.0):C4977400 _cf(4.0):C4977400 _fi(4.0):C4977400 _fq(4.0):C497740 _g1(4.0):C497740 _gc(4.0):C497740 _gn(4.0):C497740 _gy(4.0):C497740 _gx(4.0):C49774 _gz(4.0):C49774 _h0(4.0):C49774 _h1(4.0):C49774 _h2(4.0):C49774 _h3(4.0):C468
After concurrent flushing:
_3d(4.0):C4977400 _6h(4.0):C4977400 _9j(4.0):C4977400 _cn(4.0):C4977400 _fq(4.0):C4977400 _fu(4.0):C497740 _g6(4.0):C497740 _gh(4.0):C497740 _gs(4.0):C497740 _h2(4.0):C497740 _gy(4.0):C49774 _gz(4.0):C49774 _h0(4.0):C49774 _h5(4.0):C4105 _1(4.0):C2627 _h4(4.0):C16331 _h3(4.0):C28728 _h1(4.0):C48225
So we have 2 extra segments... it's interesting how this affects some
queries but not others.