We need a federating connector that has a much more sophisticated mapping capability than the current one has.

I have been making some brief notes about what we might need (included below) and am interested in any suggestions. You will notice from the notes below that a prime requirement is the ability to easily specify where a document is stored and the ability to relocate it to another location. As mentioned in earlier posts, my application needs to store large numbers of documents with storage requirements in excess of 20TB. In a production environment you need control over where the documents are physically stored and you need to be able to change your mind later and move them around (management tools will be required in the long run to view and manage this).

I have already written one read-only connector (prototype) for ModeShape over the last few weeks that connects to a legacy store that uses a proprietary database and filesystem layout. It was a bit of a steep learning curve, but I got there in the end. I would be interested in any comments on the best approach to an implementation of a new federating connector based on the ideas below. Start with the existing one, or just start from scratch?

My timeframe is longish so I'd be aiming to do this in version 3.0.

The idea is to introduce a layer, like the current federating connector, that delegates the actual storage to underlying connectors of all types (filesystems, DB, other JCR implementations, NoSQL connectors, etc.) but has a very flexible mapping scheme so that a unified tree above can be arbitrarily split across multiple storage locations.

Requirements are

map any subtree within the document hierarchy to a particular connector based on the metadata at the root node of the subtree.

ability to do this mapping at any level, such that a subtree is on one connector but child subtrees of that subtree are on other connectors and so on.

able to move any subtree to a different connector - at runtime and transparently to the application above.

mapping algorithms should be pluggable to allow the introduction of new mapping schemes (not necessarily at runtime though)

a map algorithm selects a node to be mapped and supplies a destination.

all child nodes of a mapped node are assumed to map to the same destination unless selected by a different algorithm

this connector could use an additional attribute on each node that determines the mapping.

The value, if not specified, is inherited from the parent

an operation that changes the value of this attribute results in a move of the node.

the move is transactional if the source and destination connectors are transactional

this might make bulk moves expensive

If a mapping attribute is used, you could introduce an extra layer above this that maps based on the document metadata by adding the extra attribute. This could be the pluggable mapping algorithm.
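A minimal sketch of the pluggable mapping idea above, using hypothetical interfaces (neither MappingAlgorithm nor NodeMetadata exists in ModeShape; they only illustrate the contract, including the "inherit from the parent when unspecified" rule):

```java
import java.util.Map;

/** Stand-in for whatever read-only node-metadata view a real API would expose. */
final class NodeMetadata {
    private final Map<String, String> attributes;
    NodeMetadata(Map<String, String> attributes) { this.attributes = attributes; }
    String attribute(String name) { return attributes.get(name); }
}

/** A mapping algorithm selects a node and supplies a destination (or defers). */
interface MappingAlgorithm {
    /** Return the destination connector name, or null to inherit the parent's mapping. */
    String selectDestination(NodeMetadata node);
}

/** Maps a node based on a single extra attribute, as in the notes above. */
final class AttributeMappingAlgorithm implements MappingAlgorithm {
    private final String attributeName;
    AttributeMappingAlgorithm(String attributeName) { this.attributeName = attributeName; }
    @Override
    public String selectDestination(NodeMetadata node) {
        return node.attribute(attributeName); // null => inherited from the parent
    }
}
```

An operation that changes the value of that attribute would then be interpreted as a move of the node to the newly selected destination.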

(I'll just keep talking to myself until people are back from the holidays :-)

To satisfy my requirements for control over where entries are physically stored, I think that ModeShape will need to support multiple Infinispan caches and have some mechanism to map any subtree into a particular cache.

If I read the Infinispan doco correctly, you really only have a single cache loader per cache. You can configure multiple, but for reading they are queried one at a time until one returns a value, and writes are applied to all of the configured cache loaders.

So is ModeShape 3 going to have support for multiple caches, and how are entries mapped into the cache?

Thanks for starting this thread. I also hope the new federation mechanism is much more capable, so it's great to hear your requirements.

Yes, the Next Generation ModeShape thread does talk about using Infinispan cache loaders as our new "connector" mechanism. And while I know that will work, we'd have to live within the constraints of the Infinispan cache loader API, and that has implications (some of which you mention above).

I think there's a better way. Have a read, and let me know what you think.

Stored and External Content

Although to the JCR client application all content should look like regular content, internally ModeShape still needs to distinguish between "stored content" and "external content". Stored content is owned and must be managed accordingly, while external content need only be cached. Additionally, stored content should only ever be accessed by ModeShape, while external content will almost certainly be accessed by other applications without going through ModeShape. Federation always involves accessing external content and "projecting" it into the stored content. And any user-supplied content that can't be pushed to the external systems must be stored.

If we were to replace the 2.x connector API with the Infinispan Cache Loader API, we'd lose the distinction. Yes, it would work, but it's not ideal.

Instead, I think we should just use the cache loaders as they are intended (as a way to persist/externalize the values placed in an Infinispan cache), and we should have components that interface with external systems. We can still call them "connectors", but their role would be different (they'd only be accessing external content) and it would be a completely different API. "Federation" then becomes the use of connectors to project external content into the stored content.

Configuring federation

I, too, thought about using properties and mixins on nodes to define how the federation is done. Like you said earlier, this would be significantly more powerful and extensible. But it likely is also something we can make pretty efficient - we could quickly determine if federation is needed on a node, and in most cases completely skip any federation overhead.

My idea was to define a federation namespace (e.g., with a "fed" prefix) that would define a "fed:federated" mixin with a few non-residual properties that would describe the federation rules for the node's children (and properties?). I think this isn't too far from what you suggested above.
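For illustration only, a hypothetical CND fragment for such a mixin might look like the following (the namespace URI, prefix, and property names are all assumptions, not anything ModeShape defines):

```
<fed = 'http://www.modeshape.org/federation/1.0'>

// Mixin applied to a node whose children (and properties?) are federated.
[fed:federated] mixin
  - fed:connectorName (STRING)   // which connector/source handles this subtree
  - fed:workspaceName (STRING)   // optional location within that source
```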

But there are a lot more details to work out.

Should the federation metadata (these extra properties and/or mixins) be visible to all users, or perhaps to just administrators? I can see benefits to both options. But if we wanted to hide them (or let people configure a repository to hide them), then doing so would have to be efficient, and this may influence our design. (For example, it might be pretty efficient to define a set of namespaces such that properties with names in those namespaces are only visible to users with administrative privileges.)

Does the metadata fully define the connector, or does it only provide additional information used by the connectors? For example, a file system connector could be configured entirely through metadata, but other connectors might not. Is the metadata just used to say which connector should be used?

If a node is marked as being federated, are any of its properties federated or are only its children federated?

Binary content

We recently added a feature to 3.0 that decouples how binary values are stored. We've started with a file system store since that's easiest and is also good for very large files (>100MB), but we'll add implementations that store them in an Infinispan cache (that can be configured to use the same cache used for content or a different cache altogether) or even a database.

We'll probably want to figure out a way for each connector to optionally provide its own binary store that should be used to persist binary values within the connector.

I hadn't understood the distinction between stored and external content.

In my case, for a new installation, all content would be stored. We are not planning on having the content accessible via other means. That said, we also have the case of legacy data stored in the filesystem and some database tables (for which I wrote a 2.6 connector) and I suppose that this can be seen as external data.

I'm not sure I understand where the new type of connector sits. Would Infinispan caching still be used for the external data? I assume that (at least in part) it is Infinispan that gives you the performance benefits you mentioned in the Next Gen article. How is the new type of connector different from a cache loader under Infinispan?

I think that the configuration of the connector needs to be independent of the metadata. The metadata needs to fully define the selection of a connector, but not the configuration of that connector. The reason being that there could be multiple independent subtrees stored in a single connector, not just a single subtree and its descendants. Similarly, a tree of nodes could be stored in multiple connectors (up to one per node!) as you traverse down to the leaves.

I would think that if a node is marked as federated then it, its properties, and its children would all be federated, unless any of the children are marked as being federated somewhere else.

I hadn't really thought about binary content and my only requirement would be that however it is done it needs to be transactional.

Brian, thanks again for the discussion. BTW, I've only really started using the distinction between 'stored' and 'external' content in the last month. There was no mention of it in the earlier NG ModeShape post.

In my case, for a new installation, all content would be stored. We are not planning on having the content accessible via other means. That said, we also have the case of legacy data stored in the filesystem and some database tables (for which I wrote a 2.6 connector) and I suppose that this can be seen as external data.

I would say that if it's not stored in the main Infinispan cache(s) used by a repository, then I'd call it external. So for example, in 2.x the in-memory connector, the JPA connector, the disk connector, the Infinispan connector and the JBossCache connector are all examples of "storage" connectors, because whatever data they manage (store) is only meant to be accessed through ModeShape and not directly by other third-party applications. In other words, they "own" and are wholly in charge of the data.

In 2.x, the other connectors are examples of connectors that deal with what I'd now refer to as external content: the file system connector, JCR connector, JDBC metadata connector, and SVN connector. Data in all of these is easily accessible/changeable from other applications. If your database connector accesses content from a database's tables that are or can be used by other applications, I'd consider your connector in this bucket, too.

I'm not sure I understand where the new type of connector sits. Would Infinispan caching still be used for the external data? I assume that (at least in part) it is Infinispan that gives you the performance benefits you mentioned in the Next Gen article. How is the new type of connector different from a cache loader under Infinispan?

Well, it's still only an idea at this point. To get a better understanding of why cache loaders are difficult to use, let's talk about how ModeShape 3 uses Infinispan.

ModeShape 3 places into an Infinispan cache a single entry for each JCR node. That entry is keyed by a string key (which you can think of as containing in part a UUID, though this is not always the case), and the value is a JSON/BSON document object (using a class of our own creation). The document class has a marshaller that Infinispan uses when it needs to serialize or deserialize the data (note that Infinispan does not rely upon Java serialization), and our marshaller serializes and deserializes to and from the BSON binary format.
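Schematically, that model looks something like this, with plain java.util types standing in for the Infinispan cache and for ModeShape's document class:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Sketch of the storage model: one cache entry per JCR node, keyed by a
 * string, with a JSON/BSON-style document as the value. A Map stands in for
 * both the Infinispan cache and ModeShape's document class (which has its
 * own marshaller that serializes to the BSON binary format).
 */
class NodeStorageSketch {
    // Stand-in for the Infinispan cache.
    static final Map<String, Map<String, Object>> cache = new LinkedHashMap<>();

    /** Store a node document under a new string key (here, simply a UUID). */
    static String store(Map<String, Object> nodeDocument) {
        String key = UUID.randomUUID().toString();
        cache.put(key, nodeDocument);
        return key;
    }
}
```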

Now, each Infinispan cache can be configured a couple of different ways.

Keep everything in-memory - Each entry is distributed or replicated across the grid. IMO this offers a very intriguing performant yet fault-tolerant option, since multiple copies of each entry can be effectively "backed up" across multiple machines, data centers, and sites. In this case, Infinispan is a full-tilt data grid.

Keep most things in-memory, but persist only what's not in-memory - Most (recently-used) entries would be kept in-memory, but other entries (likely those that haven't been used in a while) would be persisted using a cache loader. In this case, Infinispan is still more like a data grid than a traditional cache.

Persist all entries and keep a subset in-memory - Again, each process on which a value is distributed/replicated uses the cache loader to persist every value outside of the process's memory. Only those entries that are needed are also cached in-memory. In this case, Infinispan is more like a traditional cache.

So if we were to use cache loaders for dealing with both "stored" and "external" data, we certainly could use different caches for each, and we'd have one cache for each "external" store (so they could use cache loaders with different configurations). But let's think about how that might work.

Consider what a cache loader "sees" when it accesses a particular node. The cache loader is asked to get the bytes for the node with a particular key, and it simply finds and returns the stream of bytes for that node. What about persisting? Well, the cache loader is asked to "put" the node with a particular key and a stream of bytes, and the cache loader merely streams those bytes into the appropriate slot in whatever it's using for storage.

Now consider what might happen if we used cache loaders to access an external system. The cache loader is asked to get the bytes for the node with a particular key - and that's all the insight the cache loader has. How is it supposed to know how to access the external system? Sure, it could interrogate the key and assume there's information encoded in the key, but this is quite complicated. How about when the cache loader is asked to "put" (i.e., persist) the node with a particular key and a stream of bytes? Now the cache loader has to re-materialize the JSON/BSON document from the streamed data before it can look at the properties stored in that node's document and do something with them. So while this step is feasible, it's not efficient.

So implementing a cache loader that is "dumb" is very straightforward, but it's a lot harder to implement a "smarter" cache loader, simply because the cache loader has to do more work, or because not all information might be available. "Storage" cache loaders are trivial, and in fact we can reuse all of Infinispan's existing cache loader implementations. Cache loaders that access "external" systems are (very) hard.

My current thought is that we can continue to use cache loaders as they were intended, but that we can (re-)introduce the notion of a connector to external data above the Infinispan layer. And that connector can have access to the node object itself, meaning it can leverage the node's properties (and even children, if that's useful) to figure out how to interact with the external system. We can even cache the result in a different Infinispan cache for a short (but configurable) period of time, so we don't have to keep going back to the external system every time it's accessed.
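A rough sketch of what such a connector SPI might look like (these interfaces are hypothetical, not a real ModeShape API); the key difference from a cache loader is that the connector works with the node document itself rather than a key and a stream of bytes:

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal stand-in for ModeShape's JSON/BSON document class. */
final class Document {
    final Map<String, Object> fields = new HashMap<>();
}

/**
 * Hypothetical connector SPI. Unlike an Infinispan cache loader, which only
 * ever sees a key and a stream of bytes, the connector receives the node
 * document and can use its properties (and even children) to decide how to
 * interact with the external system.
 */
interface ExternalConnector {
    Document getDocument(String key);
    void storeDocument(String key, Document document);
}

/** Trivial in-memory implementation, standing in for, e.g., a file system connector. */
final class InMemoryConnector implements ExternalConnector {
    private final Map<String, Document> store = new HashMap<>();
    @Override public Document getDocument(String key) { return store.get(key); }
    @Override public void storeDocument(String key, Document doc) { store.put(key, doc); }
}
```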

I think that the configuration of the connector needs to be independent of the metadata. The metadata needs to fully define the selection of a connector but not the configuration of that connector. The reason being that there could be multiple independent subtrees stored into a single connector not just a single subtree and its descendents. Similarly a tree of nodes could be stored into multiple connectors (up to one per node!) as you traverse down to the leaves.

I would think that if a node is marked as federated then it, it's properties and children would all be federated unless any of the children are marked as being federated somewhere else.

Good feedback. I think we might be on the same page here.

I hadn't really thought about binary content and my only requirement would be that however it is done it needs to be transactional.

This is probably another topic altogether (and this post is already way too long), but ...

Our new binary storage framework is actually not transactional per se. Instead, it immediately stores binary content as soon as you create a Binary value; a failure to store the binary content will result in an exception. Note this all happens before calling Session.save(). We don't need transactions for two reasons:

binary content is keyed by the SHA-1 of the content, so it's independent of where it's being used.

even if the Session changes are aborted or not saved, the binary content will go unused and will (eventually) get garbage collected by our framework.
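To make those two points concrete, here's a tiny sketch of content-addressed keying via SHA-1 (the class and method names are illustrative; ModeShape's actual binary-store API will differ):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative content-addressed key: the SHA-1 hex digest of the bytes. */
class BinaryKeySketch {
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            // Render each digest byte as two lowercase hex characters.
            for (byte b : md.digest(content)) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-1 is always available in the JDK", e);
        }
    }
}
```

Because the key is derived solely from the content, the same bytes always map to the same key, no matter which node references them.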

I agree with this approach. In fact, when I first encountered connectors and federation, it seemed as if it was serving two fairly orthogonal purposes: a pluggable storage engine, and a way to provide a JCR view of existing data/systems. With the 3.0 architecture, it seems that Infinispan cache loaders will serve the purpose of the pluggable storage engine, but it's not clear (to me) how 3.0 will continue to be able to provide JCR views of existing systems. That does not seem to be the right use of Infinispan cache loaders.

Background: I need to manage millions of large (1-20GB) video files and associated metadata. The metadata is currently in an RDBMS, and the files are stored in several file systems/servers. The video must be stored in and accessible from the file system for a variety of reasons. For one, the files need to be accessible to media servers and editors. Second, due to the amount of data, some of the file systems are backed by a tape robot (the SAM-QFS file system), and so the files must be stored there.

My approaches so far have been:

1) Use vanilla JCR for metadata, and store the URLs to the files as properties. The application is responsible for keeping JCR and the file systems in synch. All modifications must go through the application to enforce consistency. This is basically just replacing the RDBMS in our app with JCR.

2) Use ModeShape, storing metadata in a disk/JPA/Infinispan connector and the video files in the file system connector, and use federation and references to tie them together. After prototyping this, I found it to not be much better than (1).

3) Use vanilla JCR for metadata, and JCR observation of certain node types to reflect changes to JCR back to the file system. To keep things in synch in the other direction, I'm using file system monitoring (e.g. inotify on Linux) to detect changes in the files, and reflect that back into JCR.

So far, approach (3) has been the best. However, it is really just a two-way data synchronization, not true "federation". But, with a fast JCR observation implementation, and assuming the other system has some kind of monitoring/observation, this can go a very long way. If the other system happens to support XA, and your JCR supports JTA, then you have transactional synchronization....
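The echo problem at the heart of approach (3), namely that a replayed change must not bounce back and forth between the two sides, can be sketched with a simple guard flag, playing the same role as JCR observation's noLocal flag. All types here are toy stand-ins, not JCR or inotify APIs:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy two-way synchronization: a change on either side is replayed onto the
 * other, and the 'replaying' flag suppresses the echo so the replay doesn't
 * trigger an endless ping-pong between the two sides.
 */
class TwoWaySync {
    final Map<String, String> jcr = new HashMap<>(); // stand-in for JCR content
    final Map<String, String> fs = new HashMap<>();  // stand-in for the file system
    private boolean replaying = false;               // analogous to JCR's noLocal flag

    void onJcrEvent(String path, String value) {
        jcr.put(path, value);
        if (replaying) return;                       // ignore echoes of our own replay
        replaying = true;
        try { onFsEvent(path, value); } finally { replaying = false; }
    }

    void onFsEvent(String path, String value) {
        fs.put(path, value);
        if (replaying) return;
        replaying = true;
        try { onJcrEvent(path, value); } finally { replaying = false; }
    }
}
```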

Where it falls short, however, is a) when the other system is very dynamic, with a high volume of changes required to keep things in synch, and b) when you actually want to access binary data from JCR using Binary/InputStream. My scenario does not have these characteristics, but others might.

Regardless, I can envision an extension/integration facility in ModeShape that is, as discussed above, based upon node types. Some of the features of this facility might be:

Ability to override some methods on Item/Node/Property (at least property get/set) to implement new/extended behavior.

Ability to store private state data both at the node and system level.

Extended nodes would be informed of changes to the node, ancestors, children, so the plugin could react accordingly.

Ability to trigger re-indexing of the node from the plugin, and to control whether a property or child is included in the index.

Ability to provide a plugin implementation of Binary, and return instances of Binary as a property, to allow access to underlying data.

Some of this might be a duplication of what's already discussed in MODE-930.

Basing the extension on node type, not on a subtree of the repository, allows the custom behavior to appear anywhere in the JCR tree, not restricted to a branch. (This is, IMO, a limitation of 2.x connectors/federation...)

I think we have a number of points being discussed here and it is a bit confusing. Let's see if I can separate them...

ModeShape 3.0 internal storage - the use of Infinispan underneath as the (internal) storage engine. This is a single Infinispan cache used to store the nodes of the document tree. Each node in the document tree is stored as a single cache entry with a unique key value that mostly resembles a UUID. The node data is stored as a document object in the cache entry value, and this value can be serialised by a custom marshaller when required by Infinispan.

ModeShape 3.0 external storage - pluggable software components, somewhat analogous to the current connectors, that can persist parts of a document tree in whatever manner they like.

Federation - the splitting of the document tree across multiple storage solutions, one of which can be the internal Infinispan storage. The destination storage engine for a given node may be based on additional attributes found in the node or in one of its ancestors.

Binary object storage in the filesystem

Is that about it?

Federation and internal storage

I've assumed above that there is a single internal storage represented as a single cache, but I don't think this is how it should work (though that may just be my misunderstanding).

There should be the possibility for multiple caches representing multiple internal storage locations that are selected via the federation mechanism. Each of the configured caches could then have its own configuration and cache loader to distribute it or persist it as required. Each internal storage cache should also be able to have its own binary object storage.

So as far as federation is concerned, it can select any external or internal storage location on a node-by-node basis. Federation should not be aware of any distinction between internal and external storage; it just selects one for a node and its descendants.

As you may have gathered from my earlier posts, I need to have control over where the various nodes are stored. Our system will never have more than a few servers, so the large distributed heap that Infinispan can support is not a lot of use to us. When you have only two systems, distributed Infinispan heaps don't make a lot of sense, and keeping all the data in memory just isn't going to work.

I think we have a number of points being discussed here and it is a bit confusing. Let's see if I can separate them...

ModeShape 3.0 internal storage - the use of Infinispan underneath as the (internal) storage engine. This is a single Infinispan cache used to store the nodes of the document tree. Each node in the document tree is stored as a single cache entry with a unique key value that mostly resembles a UUID. The node data is stored as a document object in the cache entry value, and this value can be serialised by a custom marshaller when required by Infinispan.

ModeShape 3.0 external storage - pluggable software components, somewhat analogous to the current connectors, that can persist parts of a document tree in whatever manner they like.

Federation - the splitting of the document tree across multiple storage solutions, one of which can be the internal Infinispan storage. The destination storage engine for a given node may be based on additional attributes found in the node or in one of its ancestors.

Binary object storage in the filesystem

Is that about it?

I think that's a pretty good summary of the topics. So far, I've been talking more about topics #1, #2 and #4. But I generally agree that #3 is not really the same as #2 or #1.

One comment about the use of keys in #1 above: technically we're using strings for our unique keys that contain three parts: a "source" part, a "workspace" part, and some other identifier; usually we'll generate a UUID for the identifier part, but it actually can be anything that's unique within the workspace. These unique key strings are what we use in Infinispan, so everything in them is available to the cache loaders. (BTW, we're currently using the first 7 characters of the SHA-1 of the source name for the key's source part, and the first 7 characters of the SHA-1 of the workspace name for the key's workspace part.)
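The key scheme just described can be sketched like so (the exact composition and any delimiters in ModeShape's real keys may differ; only the "first 7 hex characters of the SHA-1" detail is taken from the description above):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative three-part node key: source part + workspace part + identifier. */
class NodeKeySketch {
    /** First 7 hex characters of the SHA-1 of a name. */
    static String sha1Prefix7(String name) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(name.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.substring(0, 7);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-1 is always available in the JDK", e);
        }
    }

    /** Compose a key from the source name, workspace name, and identifier (often a UUID). */
    static String nodeKey(String sourceName, String workspaceName, String identifier) {
        return sha1Prefix7(sourceName) + sha1Prefix7(workspaceName) + identifier;
    }
}
```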

To date, I've not talked much about the "source" part, but it's there to give some flexibility to the system. For example, one possibility is to have a "federating cache loader" that simply wraps and delegates to other cache loaders, using each key's source part to determine which delegate cache loader should get the call. Maybe this might be a straightforward approach to take for topic #3 above? We probably could use the node properties to determine the source part of a key, though it might have to happen after the fact (which I think is fine and likely fits one of your patterns, where older things "move" to a different storage location). We'd probably have to treat it (at least internally) as a "remove-and-replace" kind of operation: the key is effectively changing, so the document would be removed from one cache loader and added to another. But I think it might work.
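The federating-cache-loader idea might look roughly like this, with a hypothetical Loader interface standing in for Infinispan's actual cache loader SPI:

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for Infinispan's cache loader SPI: load/store by key. */
interface Loader {
    byte[] load(String key);
    void store(String key, byte[] value);
}

/** Trivial in-memory delegate loader used for illustration. */
final class MapLoader implements Loader {
    final Map<String, byte[]> data = new HashMap<>();
    @Override public byte[] load(String key) { return data.get(key); }
    @Override public void store(String key, byte[] value) { data.put(key, value); }
}

/**
 * Routes each call to a wrapped delegate based on the key's source part
 * (here assumed to be the first 7 characters of the key).
 */
final class FederatingLoader implements Loader {
    private final Map<String, Loader> delegatesBySourcePart = new HashMap<>();

    void register(String sourcePart, Loader delegate) {
        delegatesBySourcePart.put(sourcePart, delegate);
    }

    private Loader delegateFor(String key) {
        Loader d = delegatesBySourcePart.get(key.substring(0, 7));
        if (d == null) throw new IllegalArgumentException("no delegate for key " + key);
        return d;
    }

    @Override public byte[] load(String key) { return delegateFor(key).load(key); }
    @Override public void store(String key, byte[] value) { delegateFor(key).store(key, value); }
}
```

A "move" in this sketch would then be exactly the remove-and-replace described above: delete under the old key from one delegate, store under the new key in another.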

I understand the workspace and unique ID parts, but what does the source part represent? Also, if you are only using the first seven chars of a hash of the source and workspace, then it is not guaranteed to be a unique identifier, so is it useful for federation? In my view I would want federation based on the content or location in the tree of a node, with guarantees about where a particular node is stored.

On the question of move, I agree that it is a remove and replace type of operation. I wouldn't expect it to maintain an internal unique key value as long as the external identifier for the node is maintained and since that is going to be an attribute of the node there is no reason it would change.

I understand the workspace and unique ID parts, but what does the source part represent?

I originally intended the source part to signify the key for the system where the node is stored; it is logically very similar to the name of the connector sources in 2.x. (Unlike 2.x, the source part is used within every node identifier.) So, for example, all the nodes stored in a cache named "Foo" would have a source part that is the SHA-1 of "Foo". So if that's the only cache or store there is, then the source key is indeed a little superfluous. However, even with a single cache, we could implement a federating cache loader to direct different nodes to different wrapped cache loaders, effectively storing nodes in different locations based upon the source key. Or, if we had 3.x connectors, they would each be named and would assign the source part of their nodes with the SHA-1 of their name.

Also, if you are only using the first seven chars of a hash of the source and workspace, then it is not guaranteed to be a unique identifier, so is it useful for federation?

We're relying upon the characteristics of the SHA-1 algorithm: it's very evenly distributed, meaning that the first characters change even with small changes in the original string. We're trading off likelihood of collision (given the small number of source and workspace names used within a single repository) for smaller size (7 characters of the hex hash). And, in the off-chance that there is a collision in source names, the user can simply choose a slightly different name.

In my view I would want federation based on the content or location in the tree of a node with guarantees about where a particular node is stored.

We actually can set the source part of the key based upon properties. Obviously the property values would be set based upon the original source part of a node, but if those properties were changed, ModeShape could understand this as a change in the source part of the node, and then perform the operations required to move the node (and possibly its children) from one source to another.

I have a new question, though: how does version storage fit into all of this if versioning is enabled and used? The JSR talks about a repository-wide version storage, but I wouldn't think a single physical version store would be a good way to go with this. In our system the version store is likely to be as large as, if not larger than, the current store (at least in number of nodes, maybe not in storage size). I would want to see the same federation control available over the version store that I would have on the current repository. There are probably a number of scenarios you would need.

Single repository-wide version store independent of the federation of current node data. This is probably not a good idea.

Older versions of nodes are federated to the same storage area as the current node

For a given node the federation destination selection (however it works) would specify two different storage areas, one for current version of a node and one for older versions.

Federation for a node's current version storage and federation for older versions is specified independently.

I have a new question, though: how does version storage fit into all of this if versioning is enabled and used? The JSR talks about a repository-wide version storage, but I wouldn't think a single physical version store would be a good way to go with this. In our system the version store is likely to be as large as, if not larger than, the current store (at least in number of nodes, maybe not in storage size). I would want to see the same federation control available over the version store that I would have on the current repository. There are probably a number of scenarios you would need.

Single repository-wide version store independent of the federation of current node data. This is probably not a good idea.

Older versions of nodes are federated to the same storage area as the current node

For a given node the federation destination selection (however it works) would specify two different storage areas, one for current version of a node and one for older versions.

Federation for a node's current version storage and federation for older versions is specified independently.

others?

I think 3 might be a special case of 2, in that the connector would control it. Obviously the connector framework API would need to distinguish between nodes and version histories. But that brings up an interesting question: if a node N is actually stored in connector A, should the entire version history be owned by connector A? The alternative is that the version history for any node might contain versions stored in multiple places.

So the way version storage works now is that all version histories are stored under "/jcr:system/jcr:versionStorage/" in the "system" workspace, which is an internal workspace not visible to or directly accessible by Repository.login(). To the lower-level repository cache layer, these are merely regular nodes. So if the repository cache layer is doing the federation, then it's quite possible that the same federation properties could be used on the version history (and even version) nodes. In other words, it might just work! Of course, we'd have to make sure that (administrative) sessions are able to directly make changes to the nodes under "/jcr:system/jcr:versionStorage", but that's probably quite possible.

I think 3 might be a special case of 2, in that the connector would control it. Obviously the connector framework API would need to distinguish between nodes and version histories. But that brings up an interesting question: if a node N is actually stored in connector A, should the entire version history be owned by connector A? The alternative is that the version history for any node might contain versions stored in multiple places.

There might be cases where the version history location needs to be controlled independently of the current node. But directing a node and its version history nodes through the one connector, which has knowledge of the distinction between history and current, is probably sufficient for most if not all cases, since the connector then has full control over the destination and could split the history up over multiple locations if it so desires.

So the way version storage works now is that all version histories are stored under "/jcr:system/jcr:versionStorage/" in the "system" workspace, which is an internal workspace not visible to or directly accessible by Repository.login(). To the lower-level repository cache layer, these are merely regular nodes. So if the repository cache layer is doing the federation, then it's quite possible that the same federation properties could be used on the version history (and even version) nodes. In other words, it might just work! Of course, we'd have to make sure that (administrative) sessions are able to directly make changes to the nodes under "/jcr:system/jcr:versionStorage", but that's probably quite possible.

Does this make sense?

It does, and that might work. The cache layer can know the distinction between version storage and current storage based on the path to the node, so it probably has all it needs to implement quite complex storage federation schemes if it wants to.

I've just been reading Jochen Toppe's excellent JCR Deep Dive blog article at http://jtoee.com/jsr-170/the_jcr_primer which prompted me to think about the versioning. I can highly recommend this as a good introduction to the JCR and also as a refresher for those with some experience.

I've just been reading Jochen Toppe's excellent JCR Deep Dive blog article at http://jtoee.com/jsr-170/the_jcr_primer which prompted me to think about the versioning. I can highly recommend this as a good introduction to the JCR and also as a refresher for those with some experience.

It is a good tutorial, and is pretty good even for JCR 2.0 users. BTW, we're trying to expand our documentation, and we've organized our new docs such that we have a whole section on using the JCR API. There's not much there yet, but we'd love for people to contribute to it!

P.S. Looking forward to the first Alpha release :-)

We are, too! I'm trying to wrap up the query functionality as quickly as I can.