Details

Description

It should be possible to reconfigure Solr plugins' resources and init params without directly editing the serialized schema or solrconfig.xml (see Hoss's arguments about this in the context of the schema, which also apply to solrconfig.xml, in the description of SOLR-4658)

The RESTManager should allow plugins declared in either the schema or in solrconfig.xml to register one or more REST endpoints, one endpoint per reconfigurable resource, including init params. To allow for multiple plugin instances, registering plugins will need to provide a handle of some form to distinguish the instances.

This RESTManager should also be able to create new instances of plugins that it has been configured to allow. The RESTManager will need its own serialized configuration to remember these plugin declarations.

We should aim for full CRUD over init params and structured resources. The plugins will bear responsibility for handling resource modification requests, though we should provide utility methods to make this easy.

However, since we won't be directly modifying the serialized schema and solrconfig.xml, anything configured in those two places can't be invalidated by configuration serialized elsewhere. As a result, it won't be possible to remove plugins declared in the serialized schema or solrconfig.xml. Similarly, any init params declared in either place won't be modifiable. Instead, there should be some form of init param that declares that the plugin is reconfigurable, maybe using something like "managed" - note that request handlers already provide a "handle" - the request handler name - and so don't need that to be separately specified:

All of the above examples use the existing plugin factory class names, but we'll have to create new RESTManager-aware classes to handle registration with RESTManager.

Core/collection reloading should not be performed automatically when a REST API call is made to one of these RESTManager-mediated REST endpoints, since for batched config modifications, that could take way too long. But maybe reloading could be a query parameter to these REST API calls.

Activity

Here is the first attempt at a solution for the RestManager and implementations for managing stop words and synonyms via a REST API.

A few things to notice about this implementation:

1) The RestManager needs to be able to read/write data from/to ZooKeeper if in cloud mode or the local FS if in standalone mode. This is the purpose of the ManagedResourceStorage.StorageIO interface. The idea here is that the RestManager receives the StorageIO in its constructor from the SolrCore during initialization. Currently, this is done in the SolrCore object, which has to do an instanceof on the SolrResourceLoader to determine if it is ZK aware. This is a bit hacky but I didn't see a better way to determine if a core is running in ZK mode from within the SolrCore object. Currently, I provide 3 implementations of StorageIO: ZooKeeperStorageIO, FileStorageIO, and InMemoryStorageIO.

2) A ManagedResource should be able to choose its own storage format, with the storageIO being determined by the container.
This gives the ManagedResource developer flexibility in how they store data without having to fuss with knowing how load/store bytes to ZK or local FS. Currently, the only provided storage format is JSON, see: ManagedResourceStorage.JsonStorage.

3) I'm using a "registry" object that is available from the SolrResourceLoader to capture Solr components that declare themselves as being "managed". This is needed because parsing the solrconfig.xml may encounter managed components before it parses and initializes the RestManager. Basically, I wanted to separate the registration of managed components from the initialization of the RestManager and those components as I didn't want to force the position of the <restManager/> element in the solrconfig.xml to be before all other components.

4) The design is based around the concept that there may be many different Solr components that share a single ManagedResource. For instance, there may be many ManagedStopFilterFactory instances declared in schema.xml that share a common set of managed English stop words. Thus, I'm using the "observer" pattern which allows Solr components to register as an observer of a shared ManagedResource. This way we don't end up with 10 different managers of the same stop word list.

5) ManagedResourceObserver instances are notified once during core initialization (load or reload) when the managed data is available. This is their signal to internalize the managed data, such as the ManagedStopFilterFactory converting the managed set of terms into a CharArraySet used for creating StopFilters. This is a critical part of the design in that updates to the managed data are not applied until a core is reloaded. This is to avoid having analysis components with different views of managed data, i.e. we don't want some of the replicas for a shard to have a different set of stop words than the other shards.

6) I've provided one concrete ManagedResource implementation for managing a word set, which is useful for stop words and protected words (KeywordMarkerFilter). This implementation shows how to handle initArgs and a managedList of words.

Known Issues:

a. The current RestManager attaches its registered endpoints using SolrRestApi, which is configured to process requests to /collection/schema. While this path works for stop words and synonyms, it doesn't work in the general case of any type of ManagedResource. We need to figure out a better path for which to configure the RestManager, but re-working that should be minor.

b. I had to make a few things public in the BaseSchemaResource class and extended the RestManager.ManagedEndpoint class from it. We should refactor BaseSchemaResource into a BaseSolrResource as it has usefulness beyond schema related resources.

c. Deletes - the ManagedResource framework supports deletes but I wasn't sure how to enable them in Restlet; again probably a minor issue in the restlet config / setup.

Timothy Potter
added a comment - 11/Feb/14 00:55 Here is the first attempt at a solution for the RestManager and implementations for managing stop words and synonyms via a REST API.
A few things to notice about this implementation:
1) The RestManager needs to be able to read/write data from/to ZooKeeper if in cloud mode or the local FS if in standalone mode. This is the purpose of the ManagedResourceStorage.StorageIO interface. The idea here is that the RestManager receives the StorageIO in its constructor from the SolrCore during initialization. Currently, this is done in the SolrCore object, which has to do an instanceof on the SolrResourceLoader to determine if it is ZK aware. This is a bit hacky but I didn't see a better way to determine if a core is running in ZK mode from within the SolrCore object. Currently, I provide 3 implementations of StorageIO: ZooKeeperStorageIO, FileStorageIO, and InMemoryStorageIO.
2) A ManagedResource should be able to choose its own storage format, with the storageIO being determined by the container.
This gives the ManagedResource developer flexibility in how they store data without having to fuss with knowing how load/store bytes to ZK or local FS. Currently, the only provided storage format is JSON, see: ManagedResourceStorage.JsonStorage.
3) I'm using a "registry" object that is available from the SolrResourceLoader to capture Solr components that declare themselves as being "managed". This is needed because parsing the solrconfig.xml may encounter managed components before it parses and initializes the RestManager. Basically, I wanted to separate the registration of managed components from the initialization of the RestManager and those components as I didn't want to force the position of the <restManager/> element in the solrconfig.xml to be before all other components.
4) The design is based around the concept that there may be many different Solr components that share a single ManagedResource. For instance, there may be many ManagedStopFilterFactory instances declared in schema.xml that share a common set of managed English stop words. Thus, I'm using the "observer" pattern which allows Solr components to register as an observer of a shared ManagedResource. This way we don't end up with 10 different managers of the same stop word list.
5) ManagedResourceObserver instances are notified once during core initialization (load or reload) when the managed data is available. This is their signal to internalize the managed data, such as the ManagedStopFilterFactory converting the managed set of terms into a CharArraySet used for creating StopFilters. This is a critical part of the design in that updates to the managed data are not applied until a core is reloaded. This is to avoid having analysis components with different views of managed data, i.e. we don't want some of the replicas for a shard to have a different set of stop words than the other shards.
6) I've provided one concrete ManagedResource implementation for managing a word set, which is useful for stop words and protected words (KeywordMarkerFilter). This implementation shows how to handle initArgs and a managedList of words.
Known Issues:
a. The current RestManager attaches its registered endpoints using SolrRestApi, which is configured to process requests to /collection/schema. While this path works for stop words and synonyms, it doesn't work in the general case of any type of ManagedResource. We need to figure out a better path for which to configure the RestManager, but re-working that should be minor.
b. I had to make a few things public in the BaseSchemaResource class and extended the RestManager.ManagedEndpoint class from it. We should refactor BaseSchemaResource into a BaseSolrResource as it has usefulness beyond schema related resources.
c. Deletes - the ManagedResource framework supports deletes but I wasn't sure how to enable them in Restlet; again probably a minor issue in the restlet config / setup.

Mark Miller
added a comment - 11/Feb/14 00:59 1. but I didn't see a better way to determine if a core is running in ZK mode from within the SolrCore object.
You can look at the descriptor for the core, get the corecontainer and call isZooKeeperAware.

Alan Woodward
added a comment - 11/Feb/14 09:30 Or maybe add a getStorageIO() method to SolrResourceLoader itself? We already abstract a lot of 'is this stuff in zk or on the file system' questions into here.

I've looked over your code, and it looks very well thought out - nice!

A few comments, in no particular order - I intentionally haven't looked at your synonyms/stopwords patches yet, so apologies in advance if that results in misunderstanding on my part:

In SolrDispatchFilter.doFilter(): The section starting if( path.startsWith("/rest") ) appears to be a vestige? It's not used anywhere. (May be related to your known issue a) about RestManager endpoints?)

There's no way to create a new resource (i.e., one not mentioned in schema.xml/solrconfig.xml) - maybe that's what the above vestige was intended for?

I don't see any REST API tests?

I think we should limit the permissible top-level registerable paths, currently to /schema/ and /config/. (This relates to your known issue a) about RestManager endpoints.)

In looking at the patch on SOLR-5641, which will need to be changed to use the RestManager solution developed here, I couldn't figure out a way to have a single Restlet Application subclass that covers more than one top-level path (/schema/ and /config/ in that issue). If you look at the patch on SOLR-5641, you'll see that it adds a new Restlet Application subclass. My sense is that even if it might be possible to trick Restlet into doing this, it's an anti-pattern. (This relates to your known issue b) about refactoring BaseSchemaResource.)

It's important that configuration in schema.xml/solrconfig.xml remains valid regardless of the operation of the RestManager interacting with managed resources; as written, it's possible to have initArgs in these files that are contradicted by initArgs stored in a manage resource. This is bad. I think we need to have an immutable initArgs concept, and it will either have to be per-arg, or we'll have to disallow mixing explicit initArgs with modifiable ones.

About your known issue c): "Deletes - the ManagedResource framework supports deletes but I wasn't sure how to enable them in Restlet" - not sure about Restlet, but there are two Solr code locations I know of that need to be changed to enable new HTTP verbs (I ran into this when I added PUT support): SolrRequestParsers.parseParamsAndFillStreams() and SolrDispatchFilter.remoteQuery().

In ManagedWordSetResource, onManagedDataLoadedFromStorage() and updateInitArgs(), the String.toLowerCase() calls should be given a Locale argument (likely Locale.ROOT) to escape the wrath of forbidden-apis. Running ant precommit at the top level will catch this kind of problem (and other things).

In ManagedWordSetResource.updateInitArgs(), if ignoreCase is changed from true to false, the previous downcasing operation can't be undone. Not sure the best way to handle this: maybe serialization should always capture the unprocessed original versions?

ManagedWordSetResource.doGet() ignores ignoreCase

In the current model, all managed resource GET calls will return dirty managed resource data, rather than live data. I think it's important to make the dirty data accessible, but we should consider whether live data should be accessible in addition, maybe as a param to the GET call?

In your tests, you escape double-quotes in JSON strings, but there's a nice method in SolrTestCaseJ4 named json() that will accept single-quoted string values and auto-convert to double-quotes. Makes the JSON much easier to look at without all the backslashes.

ManagedResource assumes JSON storage format, but the idea is to override methods to handle other formats, right? I think there should be a complete example of doing that in a test.

It seems weird to me that PUT requests in your patch respond with the same information as GET would against the same resource. Maybe just return a string indicating success? (I might be off-base here, I haven't looked at many examples of this elsewhere.)

The patch ignores the (wt param) on GET methods - this should be fairly simple to fix, by storing data structures rather than JSON strings on the response; see e.g. CopyFieldCollectionResource.get().

Restlet's Reference class, an instance of which is returned by getRequest.getResourceRef(), has a method getRelativeRef() that can extract a URI reference relative to an absolute base reference, so you could do something like (untested):

Steve Rowe
added a comment - 16/Feb/14 00:49 Hi Tim,
I've looked over your code, and it looks very well thought out - nice!
A few comments, in no particular order - I intentionally haven't looked at your synonyms/stopwords patches yet, so apologies in advance if that results in misunderstanding on my part:
In SolrDispatchFilter.doFilter() : The section starting if( path.startsWith("/rest") ) appears to be a vestige? It's not used anywhere. (May be related to your known issue a) about RestManager endpoints?)
There's no way to create a new resource (i.e., one not mentioned in schema.xml / solrconfig.xml ) - maybe that's what the above vestige was intended for?
I don't see any REST API tests?
I think we should limit the permissible top-level registerable paths, currently to /schema/ and /config/ . (This relates to your known issue a) about RestManager endpoints.)
In looking at the patch on SOLR-5641 , which will need to be changed to use the RestManager solution developed here, I couldn't figure out a way to have a single Restlet Application subclass that covers more than one top-level path ( /schema/ and /config/ in that issue). If you look at the patch on SOLR-5641 , you'll see that it adds a new Restlet Application subclass. My sense is that even if it might be possible to trick Restlet into doing this, it's an anti-pattern. (This relates to your known issue b) about refactoring BaseSchemaResource.)
It's important that configuration in schema.xml / solrconfig.xml remains valid regardless of the operation of the RestManager interacting with managed resources; as written, it's possible to have initArgs in these files that are contradicted by initArgs stored in a manage resource. This is bad. I think we need to have an immutable initArgs concept, and it will either have to be per-arg, or we'll have to disallow mixing explicit initArgs with modifiable ones.
About your known issue c) : "Deletes - the ManagedResource framework supports deletes but I wasn't sure how to enable them in Restlet" - not sure about Restlet, but there are two Solr code locations I know of that need to be changed to enable new HTTP verbs (I ran into this when I added PUT support): SolrRequestParsers.parseParamsAndFillStreams() and SolrDispatchFilter.remoteQuery() .
In ManagedWordSetResource , onManagedDataLoadedFromStorage() and updateInitArgs() , the String.toLowerCase() calls should be given a Locale argument (likely Locale.ROOT ) to escape the wrath of forbidden-apis. Running ant precommit at the top level will catch this kind of problem (and other things).
You didn't mention it here, but if we want to enable using PATCH REST calls via Restlet, we'll need to upgrade Restlet from the current v2.1.1 to at least v2.2M1 - note that v2.2RC1 was just released. (See the issue adding PATCH support and the v2.2RC1 release announcement .)
In ManagedWordSetResource.updateInitArgs() , if ignoreCase is changed from true to false , the previous downcasing operation can't be undone. Not sure the best way to handle this: maybe serialization should always capture the unprocessed original versions?
ManagedWordSetResource.doGet() ignores ignoreCase
In the current model, all managed resource GET calls will return dirty managed resource data, rather than live data. I think it's important to make the dirty data accessible, but we should consider whether live data should be accessible in addition, maybe as a param to the GET call?
In your tests, you escape double-quotes in JSON strings, but there's a nice method in SolrTestCaseJ4 named json() that will accept single-quoted string values and auto-convert to double-quotes. Makes the JSON much easier to look at without all the backslashes.
ManagedResource assumes JSON storage format, but the idea is to override methods to handle other formats, right? I think there should be a complete example of doing that in a test.
It seems weird to me that PUT requests in your patch respond with the same information as GET would against the same resource. Maybe just return a string indicating success? (I might be off-base here, I haven't looked at many examples of this elsewhere.)
The patch ignores the ( wt param) on GET methods - this should be fairly simple to fix, by storing data structures rather than JSON strings on the response; see e.g. CopyFieldCollectionResource.get() .
In RestManager.doInit() on line 128, you have a TODO:
// TODO: Feels like there should be a method in restlet to do this
String rootPath = getRequest().getRootRef().toString();
String resourceRef = getRequest().getResourceRef().toString();
String resourceId = resourceRef.substring(rootPath.length());
int qsAt = resourceId.indexOf( "?" );
if (qsAt != -1) resourceId = resourceId.substring(0,qsAt);
Restlet's Reference class, an instance of which is returned by getRequest.getResourceRef() , has a method getRelativeRef() that can extract a URI reference relative to an absolute base reference, so you could do something like (untested):
Reference rootRef = getRequest().getRootRef();
String resourceId = getRequest().getResourceRef().getRelativeRef(rootRef).getPath();

2. There's no way to create a new resource (i.e., one not mentioned in schema.xml/solrconfig.xml)

The RestManager's ManagedEndpoint class is now attached as the default resource class (no longer DefaultSchemaResource). If a request to create a new resource comes in, the PUT/POST request will be routed to the RestManager's RestManagerManagedResource, which knows how to create new managed resources and make them active in the server.

3. I don't see any REST API tests?

Most of the API testing is in the concrete endpoints (stop words / synonym) as most of what needs to be tested in the RestManager layer can be tested without making REST API calls. However, I've also added some additional REST API tests to TestRestManager.

4. I think we should limit the permissible top-level registerable paths, currently to /schema/ and /config/.

Done, but required the introduction of SolrConfigRestApi class and changes to Solr's web.xml. At this point, ManagedResource implementers have to choose which path makes the most sense for their resource, see ManagedStopFilterFactory as an example.
We may be able to get away with 1 class that extends Restlet's Application by implementing a custom Finder. However, I wanted to get something in place that solidifies the paths the API will support as soon as possible. If we determine a cleaner way to support /schema and /config, then we should be able to do that without breaking client code.

5. In looking at the patch on SOLR-5641, which will need to be changed to use the RestManager solution developed here ...

TBD - That's a large patch that will need more time to go over. There is some overlap in that this patch also includes a SolrConfigRestApi.

6. It's important that configuration in schema.xml/solrconfig.xml remains valid regardless of the operation of the RestManager interacting with managed resources

I've added a check for having additional args to the BaseManagedTokenFilterFactory however there's nothing in the RestManager framework that enforces this.
The main idea behind my implementation is that the "managed" attribute is the only one declared and then all initArgs are stored/managed in the managed data. That said, I could see the validity of supporting invariant initArgs so this might need some more work.

7. About your known issue c): "Deletes - the ManagedResource framework supports deletes but I wasn't sure how to enable them in Restlet" - not sure about Restlet, but there are two Solr code locations I know of that need to be changed to enable new HTTP verbs (I ran into this when I added PUT support): SolrRequestParsers.parseParamsAndFillStreams() and SolrDispatchFilter.remoteQuery().

Thanks. Deletes are working now. However, a ManagedResource cannot be deleted if there are Solr components still using it.

8. In ManagedWordSetResource, onManagedDataLoadedFromStorage() and updateInitArgs(), the String.toLowerCase() calls should be given a Locale argument (likely Locale.ROOT) to escape the wrath of forbidden-apis. Running ant precommit at the top level will catch this kind of problem (and other things).

Fixed.

9. You didn't mention it here, but if we want to enable using PATCH REST calls via Restlet, we'll need to upgrade Restlet from the current v2.1.1 to at least v2.2M1 - note that v2.2RC1 was just released. (See the issue adding PATCH support and the v2.2RC1 release announcement.)

We should tackle upgrading Restlet in a different JIRA issue and then PATCH support can be added later if desired.

10. In ManagedWordSetResource.updateInitArgs(), if ignoreCase is changed from true to false, the previous downcasing operation can't be undone. Not sure the best way to handle this: maybe serialization should always capture the unprocessed original versions?

This is a tough one. My thinking is that we shouldn't worry about this for now since it seems like more work / trouble supporting than it may be worth and seems like a client app issue vs. something that needs to be built-in to the RestManager framework. That said, I'm also open to hearing about cases where we should support this.

11. ManagedWordSetResource.doGet() ignores ignoreCase

Fixed. Improved the unit tests to verify this as well.

12. In the current model, all managed resource GET calls will return dirty managed resource data, rather than live data. I think it's important to make the dirty data accessible, but we should consider whether live data should be accessible in addition, maybe as a param to the GET call?

The use case I designed for was that updates would be applied using a core / collection reload very soon after submitting changes to the API. In other words, the time where live data and dirty data are different should very small and not all that important. Mainly, I don't want to start going down the path of implementing as version control system for what is supposed to be a simple API for changing config settings and then reloading the core to apply them.

13. In your tests, you escape double-quotes in JSON strings, but there's a nice method in SolrTestCaseJ4 named json() that will accept single-quoted string values and auto-convert to double-quotes. Makes the JSON much easier to look at without all the backslashes.

Fixed. Thanks for tip, much nicer to look at indeed.

14. ManagedResource assumes JSON storage format, but the idea is to override methods to handle other formats, right? I think there should be a complete example of doing that in a test.

Added. Please see - TestManagedResource#testCustomStorageFormat

15. It seems weird to me that PUT requests in your patch respond with the same information as GET would against the same resource. Maybe just return a string indicating success? (I might be off-base here, I haven't looked at many examples of this elsewhere.)

Agreed on the weird part and fixed. Now just returns 200 status code

16. The patch ignores the (wt param) on GET methods - this should be fairly simple to fix, by storing data structures rather than JSON strings on the response; see e.g. CopyFieldCollectionResource.get().

Hmmm... Not sure what's up with this one as using wt=xml works for me without changes. To be sure, I added test to verify XML as well as JSON.

Timothy Potter
added a comment - 25/Feb/14 04:00 Thanks for the thorough review Steve, very helpful! The updated patch cleans up a number of the issues you raised, specifically:
1. In SolrDispatchFilter.doFilter(): The section starting if( path.startsWith("/rest") ) appears to be a vestige?
Fixed (was vestige). Currently, the RestManager supports /schema and /config only.
2. There's no way to create a new resource (i.e., one not mentioned in schema.xml/solrconfig.xml)
The RestManager's ManagedEndpoint class is now attached as the default resource class (no longer DefaultSchemaResource). If a request to create a new resource comes in, the PUT/POST request will be routed to the RestManager's RestManagerManagedResource, which knows how to create new managed resources and make them active in the server.
3. I don't see any REST API tests?
Most of the API testing is in the concrete endpoints (stop words / synonym) as most of what needs to be tested in the RestManager layer can be tested without making REST API calls. However, I've also added some additional REST API tests to TestRestManager.
4. I think we should limit the permissible top-level registerable paths, currently to /schema/ and /config/.
Done, but required the introduction of SolrConfigRestApi class and changes to Solr's web.xml. At this point, ManagedResource implementers have to choose which path makes the most sense for their resource, see ManagedStopFilterFactory as an example.
We may be able to get away with 1 class that extends Restlet's Application by implementing a custom Finder. However, I wanted to get something in place that solidifies the paths the API will support as soon as possible. If we determine a cleaner way to support /schema and /config, then we should be able to do that without breaking client code.
5. In looking at the patch on SOLR-5641 , which will need to be changed to use the RestManager solution developed here ...
TBD - That's a large patch that will need more time to go over. There is some overlap in that this patch also includes a SolrConfigRestApi.
6. It's important that configuration in schema.xml/solrconfig.xml remains valid regardless of the operation of the RestManager interacting with managed resources
I've added a check for having additional args to the BaseManagedTokenFilterFactory however there's nothing in the RestManager framework that enforces this.
The main idea behind my implementation is that the "managed" attribute is the only one declared and then all initArgs are stored/managed in the managed data. That said, I could see the validity of supporting invariant initArgs so this might need some more work.
7. About your known issue c): "Deletes - the ManagedResource framework supports deletes but I wasn't sure how to enable them in Restlet" - not sure about Restlet, but there are two Solr code locations I know of that need to be changed to enable new HTTP verbs (I ran into this when I added PUT support): SolrRequestParsers.parseParamsAndFillStreams() and SolrDispatchFilter.remoteQuery().
Thanks. Deletes are working now. However, a ManagedResource cannot be deleted if there are Solr components still using it.
8. In ManagedWordSetResource, onManagedDataLoadedFromStorage() and updateInitArgs(), the String.toLowerCase() calls should be given a Locale argument (likely Locale.ROOT) to escape the wrath of forbidden-apis. Running ant precommit at the top level will catch this kind of problem (and other things).
Fixed.
9. You didn't mention it here, but if we want to enable using PATCH REST calls via Restlet, we'll need to upgrade Restlet from the current v2.1.1 to at least v2.2M1 - note that v2.2RC1 was just released. (See the issue adding PATCH support and the v2.2RC1 release announcement.)
We should tackle upgrading Restlet in a different JIRA issue and then PATCH support can be added later if desired.
10. In ManagedWordSetResource.updateInitArgs(), if ignoreCase is changed from true to false, the previous downcasing operation can't be undone. Not sure the best way to handle this: maybe serialization should always capture the unprocessed original versions?
This is a tough one. My thinking is that we shouldn't worry about this for now since it seems like more work / trouble supporting than it may be worth and seems like a client app issue vs. something that needs to be built-in to the RestManager framework. That said, I'm also open to hearing about cases where we should support this.
11. ManagedWordSetResource.doGet() ignores ignoreCase
Fixed. Improved the unit tests to verify this as well.
12. In the current model, all managed resource GET calls will return dirty managed resource data, rather than live data. I think it's important to make the dirty data accessible, but we should consider whether live data should be accessible in addition, maybe as a param to the GET call?
The use case I designed for was that updates would be applied using a core / collection reload very soon after submitting changes to the API. In other words, the time where live data and dirty data are different should very small and not all that important. Mainly, I don't want to start going down the path of implementing as version control system for what is supposed to be a simple API for changing config settings and then reloading the core to apply them.
13. In your tests, you escape double-quotes in JSON strings, but there's a nice method in SolrTestCaseJ4 named json() that will accept single-quoted string values and auto-convert to double-quotes. Makes the JSON much easier to look at without all the backslashes.
Fixed. Thanks for tip, much nicer to look at indeed.
14. ManagedResource assumes JSON storage format, but the idea is to override methods to handle other formats, right? I think there should be a complete example of doing that in a test.
Added. Please see - TestManagedResource#testCustomStorageFormat
15. It seems weird to me that PUT requests in your patch respond with the same information as GET would against the same resource. Maybe just return a string indicating success? (I might be off-base here, I haven't looked at many examples of this elsewhere.)
Agreed on the weird part and fixed. Now just returns 200 status code
16. The patch ignores the (wt param) on GET methods - this should be fairly simple to fix, by storing data structures rather than JSON strings on the response; see e.g. CopyFieldCollectionResource.get().
Hmmm... Not sure what's up with this one as using wt=xml works for me without changes. To be sure, I added test to verify XML as well as JSON.
17. In RestManager.doInit() on line 128, you have a TODO:
Good catch ... Thanks!

I'll look at your new patch this afternoon, but first I'll comment on some of your responses to my review:

4. I think we should limit the permissible top-level registerable paths, currently to /schema/ and /config/.

Done, but required the introduction of SolrConfigRestApi class and changes to Solr's web.xml. At this point, ManagedResource implementers have to choose which path makes the most sense for their resource, see ManagedStopFilterFactory as an example.

We may be able to get away with 1 class that extends Restlet's Application by implementing a custom Finder. However, I wanted to get something in place that solidifies the paths the API will support as soon as possible. If we determine a cleaner way to support /schema and /config, then we should be able to do that without breaking client code.

Cool, sounds like a good plan - I didn't realize a custom Finder would address the multiple Applications issue - I'll have a look.

6. It's important that configuration in schema.xml/solrconfig.xml remains valid regardless of the operation of the RestManager interacting with managed resources

I've added a check for having additional args to the BaseManagedTokenFilterFactory however there's nothing in the RestManager framework that enforces this.

The main idea behind my implementation is that the "managed" attribute is the only one declared and then all initArgs are stored/managed in the managed data. That said, I could see the validity of supporting invariant initArgs so this might need some more work.

Maybe it'll have to remain the case that each implementation ensures there are no non-"managed" attributes. If nothing else, this choice ("managed" is the only supported attribute) should be doc'd (if not already), so that implementers know they need to perform this check.

10. In ManagedWordSetResource.updateInitArgs(), if ignoreCase is changed from true to false, the previous downcasing operation can't be undone. Not sure the best way to handle this: maybe serialization should always capture the unprocessed original versions?

This is a tough one. My thinking is that we shouldn't worry about this for now since it seems like more work / trouble supporting than it may be worth and seems like a client app issue vs. something that needs to be built-in to the RestManager framework. That said, I'm also open to hearing about cases where we should support this.

If we don't support that use case, the request must fail. This limitation should also be doc'd (if not already).

12. In the current model, all managed resource GET calls will return dirty managed resource data, rather than live data. I think it's important to make the dirty data accessible, but we should consider whether live data should be accessible in addition, maybe as a param to the GET call?

The use case I designed for was that updates would be applied using a core / collection reload very soon after submitting changes to the API. In other words, the time where live data and dirty data are different should very small and not all that important. Mainly, I don't want to start going down the path of implementing as version control system for what is supposed to be a simple API for changing config settings and then reloading the core to apply them.

Fair enough. This should be doc'd (if not already): this API may temporarily lie to you.

16. The patch ignores the (wt param) on GET methods - this should be fairly simple to fix, by storing data structures rather than JSON strings on the response; see e.g. CopyFieldCollectionResource.get().

Hmmm... Not sure what's up with this one as using wt=xml works for me without changes. To be sure, I added test to verify XML as well as JSON.

Steve Rowe
added a comment - 25/Feb/14 18:22 Hi Tim,
I'll look at your new patch this afternoon, but first I'll comment on some of your responses to my review:
4. I think we should limit the permissible top-level registerable paths, currently to /schema/ and /config/.
Done, but required the introduction of SolrConfigRestApi class and changes to Solr's web.xml. At this point, ManagedResource implementers have to choose which path makes the most sense for their resource, see ManagedStopFilterFactory as an example.
We may be able to get away with 1 class that extends Restlet's Application by implementing a custom Finder. However, I wanted to get something in place that solidifies the paths the API will support as soon as possible. If we determine a cleaner way to support /schema and /config, then we should be able to do that without breaking client code.
Cool, sounds like a good plan - I didn't realize a custom Finder would address the multiple Applications issue - I'll have a look.
6. It's important that configuration in schema.xml/solrconfig.xml remains valid regardless of the operation of the RestManager interacting with managed resources
I've added a check for having additional args to the BaseManagedTokenFilterFactory however there's nothing in the RestManager framework that enforces this.
The main idea behind my implementation is that the "managed" attribute is the only one declared and then all initArgs are stored/managed in the managed data. That said, I could see the validity of supporting invariant initArgs so this might need some more work.
Maybe it'll have to remain the case that each implementation ensures there are no non-"managed" attributes. If nothing else, this choice ("managed" is the only supported attribute) should be doc'd (if not already), so that implementers know they need to perform this check.
10. In ManagedWordSetResource.updateInitArgs(), if ignoreCase is changed from true to false, the previous downcasing operation can't be undone. Not sure the best way to handle this: maybe serialization should always capture the unprocessed original versions?
This is a tough one. My thinking is that we shouldn't worry about this for now since it seems like more work / trouble supporting than it may be worth and seems like a client app issue vs. something that needs to be built-in to the RestManager framework. That said, I'm also open to hearing about cases where we should support this.
If we don't support that use case, the request must fail. This limitation should also be doc'd (if not already).
12. In the current model, all managed resource GET calls will return dirty managed resource data, rather than live data. I think it's important to make the dirty data accessible, but we should consider whether live data should be accessible in addition, maybe as a param to the GET call?
The use case I designed for was that updates would be applied using a core / collection reload very soon after submitting changes to the API. In other words, the time where live data and dirty data are different should very small and not all that important. Mainly, I don't want to start going down the path of implementing as version control system for what is supposed to be a simple API for changing config settings and then reloading the core to apply them.
Fair enough. This should be doc'd (if not already): this API may temporarily lie to you.
16. The patch ignores the (wt param) on GET methods - this should be fairly simple to fix, by storing data structures rather than JSON strings on the response; see e.g. CopyFieldCollectionResource.get().
Hmmm... Not sure what's up with this one as using wt=xml works for me without changes. To be sure, I added test to verify XML as well as JSON.
My bad - this was based on code inspection. I'll take another look.

No major changes in this patch, but the previous one incorrectly pulled in some dependencies on the patches for SOLR-5654 and SOLR-5655. In addition, one of the TestRestManager tests had a subtle dependency on the patch for SOLR-5655. So this patch ensures it works w/o having to apply the patches for the other tickets. Btw ... I'm not going to bother posting more patches for the other tickets until this one gets resolved.

Timothy Potter
added a comment - 28/Feb/14 05:36 - edited No major changes in this patch, but the previous one incorrectly pulled in some dependencies on the patches for SOLR-5654 and SOLR-5655 . In addition, one of the TestRestManager tests had a subtle dependency on the patch for SOLR-5655 . So this patch ensures it works w/o having to apply the patches for the other tickets. Btw ... I'm not going to bother posting more patches for the other tickets until this one gets resolved.

Removed DefaultSchemaResource.java; unknown URI paths under /schema and /config are now handled by RestManager.ManagedEndpoint

RestManager.Registry now protects against registration of resourceId-s that are already in use by the Schema REST API - protecting /config/managed and /schema/managed is now handled via this general mechanism

throw an exception if current ignoreCase = true and updated ignoreCase = false, since change this is not permitted

In RestManager.addManagedResource(), now assert'ing that the resourceId validation result from matches() is true, rather than throwing away the result; registry.registerManagedResource(), called earlier in addManagedResource(), already ensures that the regex matches against the resourceId.

Steve Rowe
added a comment - 10/Mar/14 15:49 Patch, building on Tim's patch.
Left to do:
init args should be moved to NamedList (typed nested values) instead of the current String->String map, to support solrconfig.xml plugin init args
javadocs should be added where there are none
This patch has some minor cleanups, as well as the following changes:
Renamed SolrRestApi -> SolrSchemaRestApi
Enabled short-form "solr.classname" class lookup for o.a.s.rest.schema.analysis (e.g. "solr.ManagedWordSetResource" )
Finished the BaseSchemaResource -> BaseSolrResource renaming by executing svn mv [...]/BaseSchemaResource [...]/BaseSolrResource (to retain svn history) and making all classes extending BaseSchemaResource extend BaseSolrResource instead
Removed DefaultSchemaResource.java ; unknown URI paths under /schema and /config are now handled by RestManager.ManagedEndpoint
RestManager.Registry now protects against registration of resourceId-s that are already in use by the Schema REST API - protecting /config/managed and /schema/managed is now handled via this general mechanism
TestRestManager :
added tests that already-spoken-for REST API endpoints can't be registered
added tests for switching ignoreCase of ManagedWordSetResource
added XML response format test
ManagedWordSetResource.updateInitArgs() :
compare current/updated ignoreCase vals as booleans, instead of as string args
throw an exception if current ignoreCase = true and updated ignoreCase = false, since change this is not permitted
In RestManager.addManagedResource() , now assert 'ing that the resourceId validation result from matches() is true, rather than throwing away the result; registry.registerManagedResource() , called earlier in addManagedResource() , already ensures that the regex matches against the resourceId.

Steve Rowe
added a comment - 11/Mar/14 19:46
Left to do:
init args should be moved to NamedList (typed nested values) instead of the current String->String map, to support solrconfig.xml plugin init args
This patch switches ManagedResource init arg handling to NamedList , and converts a couple of existing tests to illustrate use of typed nested args.
javadocs should be added where there are none
With the exception of the missing javadocs, I think this is ready to go. I'll put up a final patch later today.

I like the conversion over to using NamedList<?> for managedInitArgs in general. However, I think we should serialize into a JSON Map when returning initArgs in the API. Apparently, a NamedList converts to JSON as a list and not a map, ie.

"wordSet":{ "initArgs":["ignoreCase","true"], ...

This seems like an awkward format for exposing to client applications consuming the API. I think a map representation of initArgs seems better, i.e.

Also, previously, the ManagedResource would return default values of the initArgs, but the conversion over to NamedList<?> has made that go away somehow. In other words, this test now fails when doing a GET against the initial state of the stop word resource because the default value for ignoreCase is not returned with the initial state of the resource anymore:

Timothy Potter
added a comment - 11/Mar/14 23:05 Looking good Steve ...
I like the conversion over to using NamedList<?> for managedInitArgs in general. However, I think we should serialize into a JSON Map when returning initArgs in the API. Apparently, a NamedList converts to JSON as a list and not a map, ie.
"wordSet":{ "initArgs": ["ignoreCase","true"] , ...
This seems like an awkward format for exposing to client applications consuming the API. I think a map representation of initArgs seems better, i.e.
"wordSet":{ "initArgs":
{ "ignoreCase":true, ... }
}​
To address this, can you add the following method to ManagedResource:
/**
Converts a NamedList<Object> into a sorted Map for returning as JSON.
*/
protected Map<String,Object> convertNamedListToMap(NamedList<Object> args) {
Map<String,Object> argsMap = new TreeMap<String,Object>();
if (args != null && args.size() > 0)
Unknown macro: { Iterator<Map.Entry<String,Object>> argIter = args.iterator(); while (argIter.hasNext()) {
Map.Entry<String,Object> entry = argIter.next();
argsMap.put(entry.getKey(), entry.getValue());
} }
return argsMap;
}
And then change the buildMapToStore method to call it:
toStore.put(INIT_ARGS_JSON_FIELD, convertNamedListToMap(managedInitArgs));
Also, previously, the ManagedResource would return default values of the initArgs, but the conversion over to NamedList<?> has made that go away somehow. In other words, this test now fails when doing a GET against the initial state of the stop word resource because the default value for ignoreCase is not returned with the initial state of the resource anymore:
assertJQ(endpoint,
"/wordSet/initArgs/ignoreCase==\"false\"", <<< This now fails
"/wordSet/managedList==[]");
I think the initial state of a resource should return the default values for initArgs if the default is not null so that this test case would pass.
If you can post an updated patch with these changes, I can finish the patches for SOLR-5655 and 5654.
Thanks.

I think we should serialize into a JSON Map when returning initArgs in the API[..]
To address this, can you add the following method to ManagedResource:
protected Map<String,Object> convertNamedListToMap(NamedList<Object> args)

Done. I changed to using an ordered map instead of a sorted map, though, to preserve intentional ordering (if any) from the NamedList.

[...]
I think the initial state of a resource should return the default values for initArgs if the default is not null

Agreed, I've put this functionality back in ManagedWordSetResource and added a test to make sure the default ignoreCase gets serialized.

Steve Rowe
added a comment - 11/Mar/14 23:51 Patch attached with requested changes attached.
I think we should serialize into a JSON Map when returning initArgs in the API
[..]
To address this, can you add the following method to ManagedResource:
protected Map<String,Object> convertNamedListToMap(NamedList<Object> args)
Done. I changed to using an ordered map instead of a sorted map, though, to preserve intentional ordering (if any) from the NamedList.
[...]
I think the initial state of a resource should return the default values for initArgs if the default is not null
Agreed, I've put this functionality back in ManagedWordSetResource and added a test to make sure the default ignoreCase gets serialized.

I plan on waiting until SOLR-5654 and SOLR-5655 (the first concrete implementations of RestManager-enabled functionality) land on trunk, and letting them soak a bit, before backporting this to branch_4x.

Steve Rowe
added a comment - 12/Mar/14 21:57 Committed to trunk.
I plan on waiting until SOLR-5654 and SOLR-5655 (the first concrete implementations of RestManager-enabled functionality) land on trunk, and letting them soak a bit, before backporting this to branch_4x.

Steve Rowe
added a comment - 13/Mar/14 18:25 In terms of documenting the REST API, would something like Swagger be useful: https://helloreverb.com/developers/swagger ?
Thanks Alexandre Rafalovitch , looks interesting. Today somebody sent me a link to an interview with Reverb's CEO, where he talks about the evolution of Swagger among other things: < http://techcrunch.com/2014/03/12/founder-stories-when-it-comes-to-open-source-technologies-reverbs-tony-tam-has-a-word-for-it/ >

Further on documentation. to be generically editable, does this require self-documenting plugins? This issue was discussed a little bit in SOLR-5287 (Jan's entry on 30 Sep 2013) with an example syntax.

Right now, we seem to be relying a lot on magic keywords self-knowledge. But that seems to be more and more fragile as we are moving into UI-driven configuration.

Alexandre Rafalovitch
added a comment - 04/Apr/14 12:08 Further on documentation. to be generically editable, does this require self-documenting plugins? This issue was discussed a little bit in SOLR-5287 (Jan's entry on 30 Sep 2013) with an example syntax.
Right now, we seem to be relying a lot on magic keywords self-knowledge. But that seems to be more and more fragile as we are moving into UI-driven configuration.