Activity

Greg Bowyer
added a comment - 29/Nov/12 19:10 I haven't had a chance to check out the rest of the patch/issue, but for this specifically, what about a convention? Anything under the "persistent" key in the commit data is carried over indefinitely. Or if persistent is the norm, then we could reverse it and have a "transient" map that is not carried over.
The persistent/transient map sounds like a good idea; I will take a look at how that can be implemented.

Yonik Seeley
added a comment - 29/Nov/12 03:36 Should previous index commits carry forward in solr for ease of use ?
I haven't had a chance to check out the rest of the patch/issue, but for this specifically, what about a convention? Anything under the "persistent" key in the commit data is carried over indefinitely. Or if persistent is the norm, then we could reverse it and have a "transient" map that is not carried over.
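
The convention suggested here could be sketched roughly as follows. This is only an illustration, not code from the patch: since the commit data discussed in this issue is a flat Map<String, String>, the sketch assumes a "persistent." key prefix instead of a nested "persistent" key, and the class and method names (CommitDataCarryOver, nextCommitData) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Solr or Lucene. */
final class CommitDataCarryOver {
    // Assumed convention: keys with this prefix survive from one commit to the next.
    static final String PERSISTENT_PREFIX = "persistent.";

    /**
     * Builds the user data for the next commit: persistent entries of the
     * previous commit are carried over, everything else is dropped unless
     * the caller supplies it again.
     */
    static Map<String, String> nextCommitData(Map<String, String> previous,
                                              Map<String, String> supplied) {
        Map<String, String> next = new HashMap<>();
        for (Map.Entry<String, String> e : previous.entrySet()) {
            if (e.getKey().startsWith(PERSISTENT_PREFIX)) {
                next.put(e.getKey(), e.getValue());
            }
        }
        // Newly supplied values win over carried-over ones.
        next.putAll(supplied);
        return next;
    }
}
```

The reversed variant (a "transient." prefix that is not carried over) would only flip the condition in the loop.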

Greg Bowyer
added a comment - 29/Nov/12 03:17 I gave this another attempt today, and went full bore on trying to find all the locations of where userCommitData would need to be exposed to clients of the SOLR API.
There are a few questions in my mind about this:
The backwards compatibility for javabin is not obvious; do we want to bump the javabin version?
What should the exact behavior be around soft commits and autocommits?
Should previous index commits carry forward in solr for ease of use ?

Gregg Donovan
added a comment - 28/Dec/11 03:03 We (Etsy) are interested in this issue for the same use-case that Eks Dev mentions – passing around timestamps and other meta-data for use by incremental indexers. We currently write out and replicate custom property files for this – using commitUserData would be preferable.
It seems like another use-case for commitUserData could be an empty commit that actually triggers replication, as the updated commitUserData will cause the segments file to be updated.
For our purposes, we'd just be using CommitUpdateCommand and DUH2 as our interfaces for writing the commitUserData, but exposing commitUserData to the SolrJ/HTTP interfaces does seem like a nice feature. I wonder where it would be useful to expose reading the commitUserData via SolrJ/HTTP as right now you still need low-level code to extract the commitUserData from an IndexReader. Perhaps stats.jsp could expose each key in the commitUserData as a stat?

Eks Dev
added a comment - 07/Nov/11 20:02 @Erick, just go ahead and take it.
I am not going to be working on this any time soon. At the moment I am using a quick-and-dirty patched trunk version (a moving target anyway) with an extended CommitCommand to pass the Map around (a sub-optimal approach, but it does the job for now).
Some thoughts about it; maybe you will find something useful:
Take care: optimize and autoCommit do an implicit commit (from the user's perspective, there is no explicit transaction to commit where we could pass Map parameters). As a consequence, the Map<String, String> has to stay alive somewhere (DUH2 looks like the best place for it). Of course, one needs to expose some user interfaces that enable map mutation and inquiry. This Map then becomes a cached holder of key-value pairs that a user can change, and Solr guarantees to commit it on implicit/explicit commit and to read it on reload/rollback.
Rollback and restart: what happens to this map after a restart (core reload)? I would suggest populating it with the committed values, and doing the same on rollback.
As a summary:
One thing is the low-level mechanics, and this is easy: all changes are local to DUH2, with one Map<String, String> and this instance passed to all the commit commands you see there. Of course, it has to be reloaded on index reload/rollback.
Much harder (at least for me) is designing a good user interface to maintain it:
... explicit changes via a request handler (an admin-like operation)
... as a parameter of the commit command (nice)
... somehow hooking into the update chain elegantly (my primary use case!). I keep track of the max timestamp in this map (actually in an AtomicLong, just populating the Map on commit) to control incremental updates, but my use case is dead easy to support with a patched CommitCommand as I have only explicit commits (this would not work with e.g. autoCommit; you would need a Map instance for it).
E.g. look at DIH: it uses internal counters and the file system to persist this, which could be much better served by Lucene commit guarantees...
On another note, keeping real-time track (not committed values) of min/max values for user-defined fields would make sense for incremental update scenarios. I do not know if there is something in Lucene/Solr for it already, but this is a separate, though somewhat related, issue...
Cheers,
Eks
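
A minimal sketch of the long-lived map described in this comment, under stated assumptions: the class CommitUserDataHolder, its method names and the "maxTimestamp" key are hypothetical and are not Solr or Lucene API. It only shows the shape of the idea: a map that outlives individual commits, an AtomicLong that tracks the maximum timestamp, a snapshot taken for every commit (explicit or implicit), and a reset from committed values on reload/rollback.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder that would live in the update handler; not Solr API. */
final class CommitUserDataHolder {
    private final Map<String, String> pending = new HashMap<>();
    private final AtomicLong maxTimestamp = new AtomicLong(Long.MIN_VALUE);

    /** Called from the update path for each document; keeps the running maximum. */
    void observeTimestamp(long ts) {
        maxTimestamp.accumulateAndGet(ts, Math::max);
    }

    /** User-facing mutation of the pending key-value pairs. */
    synchronized void put(String key, String value) {
        pending.put(key, value);
    }

    /** Snapshot handed to any commit, whether explicit, optimize or autoCommit. */
    synchronized Map<String, String> snapshotForCommit() {
        Map<String, String> copy = new HashMap<>(pending);
        long max = maxTimestamp.get();
        if (max != Long.MIN_VALUE) {
            copy.put("maxTimestamp", Long.toString(max));
        }
        return Collections.unmodifiableMap(copy);
    }

    /** On core reload or rollback, repopulate from the last committed values. */
    synchronized void resetFrom(Map<String, String> committed) {
        pending.clear();
        pending.putAll(committed);
        String ts = committed.get("maxTimestamp");
        maxTimestamp.set(ts == null ? Long.MIN_VALUE : Long.parseLong(ts));
    }
}
```

Because every commit path would take its data from snapshotForCommit(), the autoCommit case no longer depends on a caller passing a Map with the commit command.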

Erick Erickson
added a comment - 07/Nov/11 16:09 I'd like to move this forward, so I'm soliciting a bit of advice about how to proceed.
I'm interested here in getting this into SolrJ. It's not clear to me that this belongs in, say, an XML input file (and csv and json and...), since we have a nice clean document add format and trying to put index meta-data in there seems like a bad thing to do.
Anyway, if we do go down the SolrJ route, it seems like SolrServer needs either two more commit methods that take a Map<String, String> or something like a new addUserData method; the latter seems cleaner.
Then we'd have to do something with UpdateRequest to get the user data passed over to the Solr server and from there pass it on through to the writer.commit.
Mostly, I'm looking for guidance on whether this is a reasonable approach or if it's wrong-headed from the start, in which case I'll take any suggestions gladly. I haven't started to code anything yet, so changes in the approach are really cheap...
Eks Dev: Do you want to push this forward and/or work on it together?

Eks Dev
added a comment - 08/Aug/11 20:19 A rather simplistic approach: adding userCommitData to CommitUpdateCommand.
So we at least have a vehicle to pass it to IndexWriter.
No advanced machinery to make it available to non-expert users. At least it is not wrong to have it there?
Eclipse removed some unused imports from DUH2 as well.

Eks Dev
added a comment - 06/Aug/11 23:23 One hook for users to update the content of this map would be to add beforeCommit callbacks. This looks simple enough in the UpdateHandler2.commit() call, but there is a catch:
We need to invoke listeners before we close() for implicit commits... having decref-ed IndexWriter, the question is whether we want to run beforeCommit listeners even if IW does not really get closed (the user updates the map more often than needed).
IMO, invoking callbacks a little more often than needed should not be a problem.
Another place where we have an "implicit commit" is newIndexWriter(); here we only need to add IndexWriterProvider.isIndexWriterNull() to check whether we need callbacks.
A solution for close() would also be simple: add IndexWriterProvider.isIndexGoingToCloseOnNextDecref() and check it before invoking decref() to decide whether to run the callbacks.
Any better solution? Are callbacks a good approach to provide user hooks for this?
-------
Another approach is to get beforeCommitCallbacks at the Lucene level and piggy-back there for Solr callbacks?
We would only need to change IndexWriter.commit(Map..) and close(), but commit is final...
Notice: I am very rusty regarding the Solr/Lucene codebase => any help would be appreciated. The last patch I made here was ages ago.
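
To make the callback idea concrete, here is a rough sketch of what a beforeCommit hook could look like. It is a plain-Java illustration under assumptions: BeforeCommitListener, CommitHooks and prepareCommit() are hypothetical names, not existing Solr or Lucene interfaces, and the real wiring into the update handler and IndexWriterProvider is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical callback; not an existing Solr interface. */
interface BeforeCommitListener {
    /** May add or update entries in the commit user data. */
    void beforeCommit(Map<String, String> commitUserData);
}

/** Hypothetical registry the update handler would consult before each commit. */
final class CommitHooks {
    private final List<BeforeCommitListener> listeners = new CopyOnWriteArrayList<>();
    private final Map<String, String> commitUserData = new HashMap<>();

    void addListener(BeforeCommitListener listener) {
        listeners.add(listener);
    }

    /**
     * Would be called before every explicit or implicit commit (including the
     * close() and newIndexWriter() paths). Returns a copy to hand to the writer.
     */
    synchronized Map<String, String> prepareCommit() {
        for (BeforeCommitListener listener : listeners) {
            listener.beforeCommit(commitUserData);
        }
        return new HashMap<>(commitUserData);
    }
}
```

Since the hook may fire slightly more often than the index is really committed, listeners written against something like this should be idempotent, i.e. they should set current values and not accumulate per-call state.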