My company's project requires a hierarchical datastore. Basically, the project is a photo album like Flickr that allows users to upload photos, stores them using the conventional file system on Windows (or Linux), and later serves them through the web presentation layer. We chose to try ModeShape (JCR) for storing the metadata of the photos kept in the file system, which should allow easy indexing and searching later. I have tried ModeShape 2.2.0 with the following configuration, based on ModeShape's UFOs example:

javax.jcr.RepositoryException: org.modeshape.graph.connector.RepositorySourceException: Primary type "UFOs" for path "nt:unstructured" in workspace "/" in workspace1 is not valid for the file system connector. Valid primary types are nt:file, nt:folder, nt:resource, and dna:resouce. at org.modeshape.jcr.SessionCache.save(SessionCache.java:412) at org.modeshape.jcr.JcrSession.save(JcrSession.java:1346)

*** if I try to add it at a folder named "FolderA" under root "/" ***

javax.jcr.nodetype.ConstraintViolationException: Unable to determine a valid node definition for the node "/{}FolderA/{}newfile.txt" in workspace "workspace1" of "JCR UFOs" at org.modeshape.jcr.SessionCache$NodeEditor.createChild(SessionCache.java:1565) at org.modeshape.jcr.AbstractJcrNode.addNode(AbstractJcrNode.java:1468) at org.modeshape.jcr.AbstractJcrNode.addNode(AbstractJcrNode.java:1344)

My problem is: how do I add properties like "Author" to newfile.txt using Node.setProperty("Author","David") and later fetch them using Node.getProperty()? I have tried setProperty and getProperty as above, and both failed with the following exception:

javax.jcr.nodetype.ConstraintViolationException: Cannot find a definition for the property named 'author' on the node at '/newfile.txt' with primary type 'nt:file' and mixin types: [] at org.modeshape.jcr.SessionCache$NodeEditor.setProperty(SessionCache.java:1046) at org.modeshape.jcr.SessionCache$NodeEditor.setProperty(SessionCache.java:971) at org.modeshape.jcr.AbstractJcrNode.setProperty(AbstractJcrNode.java:1667)

Can anybody help me with how to create new properties and attach them (as metadata) to the file (nt:file)? I strongly believe that ModeShape is a fantastic technology, but only if we know how to use it properly. Thanks.

First, the graph API is a low-level API, and apart from the fact that it consists of very different interfaces and methods, the major difference between the graph API and the JCR API is that the graph API does not know about node types and therefore does no validation. It will happily set a "jcr:primaryType" to a date value. So, when using the graph API to create or update content that will be accessed by someone via the JCR API, you are responsible for ensuring that the content will be considered valid when coming from JCR. Some things, like default values, will appear within the JCR layer because of the validation work it does. But for the most part, the content set through the graph API should be what the JCR user expects to see. I think this explains why you were able to use the graph API without it complaining about validity.

Second, you didn't include any of your code that was creating the nodes, and by the errors I'm guessing that you were creating nodes with a primary type of 'nt:unstructured'. That's not really allowed by JCR when using 'nt:file' and 'nt:folder' nodes. All JCR implementations (even Jackrabbit and ModeShape) restrict what you can do once you enter the 'nt:file' or 'nt:folder' world, simply because the 'nt:file' and 'nt:folder' node types will restrict what kind of nodes, properties and children you're allowed to add. For example, let's say that you create a folder '/a/myFolder' of type 'nt:folder' as follows:

(1) Node a = session.getNode("/a");

(2) Node myFolder = a.addNode("myFolder","nt:folder");

I'm assuming that the primary type and mixin types of Node a allow a child of type 'nt:folder'; something like 'nt:unstructured' of course will.

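Per the 'nt:folder' node type, the only children you can add under 'myFolder' are those whose primary type is a subtype of 'nt:hierarchyNode' (the supertype of both 'nt:file' and 'nt:folder'). So you can do this:

(3) Node myFile = myFolder.addNode("myFile","nt:file");

but you cannot do this:

(4) Node other = myFolder.addNode("other","nt:unstructured");

since the 'nt:unstructured' node type does not extend 'nt:hierarchyNode'. In fact, you can't even do this: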

(5) Node somethingElse = myFolder.addNode("somethingElse");

because the 'nt:folder' node type does not define a default primary type for its children. This error that you found is trying to say this:

javax.jcr.nodetype.ConstraintViolationException: Unable to determine a valid node definition for the node "/{}FolderA/{}newfile.txt" in workspace "workspace1" of "JCR UFOs"

at org.modeshape.jcr.SessionCache$NodeEditor.createChild(SessionCache.java:1565) at org.modeshape.jcr.AbstractJcrNode.addNode(AbstractJcrNode.java:1468) at org.modeshape.jcr.AbstractJcrNode.addNode(AbstractJcrNode.java:1344)

So you must specify the primary type as in line (3) above, and it must be either 'nt:file' or 'nt:folder' (or, the lone child of an 'nt:file' must be called 'jcr:content' and must have a primary type of 'nt:resource'). This is what the first error you listed is trying to explain (I think you're supplying "UFOs" as the primary type name (second parameter) to the 'addNode(...)' method):

javax.jcr.RepositoryException: org.modeshape.graph.connector.RepositorySourceException: Primary type "UFOs" for path "nt:unstructured" in workspace "/" in workspace1 is not valid for the file system connector. Valid primary types are nt:file, nt:folder, nt:resource, and dna:resouce.

at org.modeshape.jcr.SessionCache.save(SessionCache.java:412) at org.modeshape.jcr.JcrSession.save(JcrSession.java:1346)
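To make the child-type rule above concrete, here is a small toy model of it in plain Java. It is only a sketch of the reasoning and not ModeShape or JCR code: the class and method names are made up for illustration, and the supertype table is a simplified, single-parent view of a few built-in node types.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the child-type rule for 'nt:folder'; not ModeShape or JCR code. */
final class FolderChildRule {
    // Simplified single-parent view of a few built-in primary types (assumption:
    // mixin supertypes are ignored because they do not matter for this rule).
    private static final Map<String, String> SUPERTYPE = new HashMap<>();
    static {
        SUPERTYPE.put("nt:hierarchyNode", "nt:base");
        SUPERTYPE.put("nt:file", "nt:hierarchyNode");
        SUPERTYPE.put("nt:folder", "nt:hierarchyNode");
        SUPERTYPE.put("nt:unstructured", "nt:base");
    }

    /** True if 'type' is 'ancestor' or inherits from it in the simplified table. */
    static boolean isNodeType(String type, String ancestor) {
        for (String t = type; t != null; t = SUPERTYPE.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Whether a child with the given primary type may be added under an 'nt:folder'.
     * A null type models addNode(name) without a type: 'nt:folder' defines no default
     * primary type for its children, so that case is rejected as well.
     */
    static boolean folderAllowsChild(String childPrimaryType) {
        return childPrimaryType != null && isNodeType(childPrimaryType, "nt:hierarchyNode");
    }

    public static void main(String[] args) {
        System.out.println("nt:file         -> " + folderAllowsChild("nt:file"));
        System.out.println("nt:unstructured -> " + folderAllowsChild("nt:unstructured"));
        System.out.println("(no type given) -> " + folderAllowsChild(null));
    }
}
```

In this model only the 'nt:file' and 'nt:folder' cases pass, which mirrors lines (3) to (5): an explicit hierarchy type is accepted, while 'nt:unstructured' or a missing type is rejected.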

The properties are also restricted based upon these node types, and neither 'nt:file' nor 'nt:folder' allows you to add a property of any name; instead, only a few properties are allowed. You can, of course, add mixins to each node, where the mixins define other (residual or non-residual) properties and even children of other node types.

ModeShape's File System connector [1] is only able to store nodes with primary types of 'nt:file' or 'nt:folder', because it is essentially mapping every node onto a file or folder on the file system. In other words, every node in the repository (outside of '/jcr:system') backed by a FileSystemSource must have a primary type of 'nt:file' or 'nt:folder'. (Note that this connector is not trying to persist any content on the local file system; it is literally mapping one-for-one the files and folders under a certain location on your file system into 'nt:file' and 'nt:folder' nodes. If you want to persist any content on your local filesystem, I suggest using the JPA connector with HSQLDB, and configuring HSQLDB to store its data files on your file system.)

The File System connector also does not, out-of-the-box, allow you to store extra properties (defined via mixins) because it doesn't know where to store those extra properties. It does have an extension point that lets you define how to store and read those extra properties. This is the CustomPropertiesFactory [2], and using it is very simple (see [3] for an earlier discussion). Basically, the FileSystemSource will use your CustomPropertiesFactory to store and read those extra properties that your mixins would allow. The interface is pretty simple, and the JavaDoc does explain what each method is expected to do. Once implemented, simply set in your ModeShape configuration file the "customPropertiesFactory" property on the FileSystemSource to the name of your class. For example, using your configuration:

I wish we had a default implementation that simply reads/writes these extra properties to a file, using a particular naming convention and using the graph API's ValueFactories (accessible via getValueFactories() method on the ExecutionContext passed into the CustomPropertiesFactory methods) to serialize and deserialize the properties. We just haven't had the time to do it yet.

If you write one and want to contribute it, please do! I'd also be happy to answer any questions about how to write your own CustomPropertiesFactory implementation.
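For what it's worth, the file-based strategy described above might look roughly like the following. This is only a sketch of the persistence side, under several assumptions: it does not implement the CustomPropertiesFactory interface (check the JavaDoc for the real method signatures), it uses java.util.Properties with plain string values instead of the graph API's ValueFactories, and the class name and the ".modeshape.properties" suffix are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Sketch: keeps a file's extra properties in a sidecar file next to the content file. */
final class SidecarProperties {
    // Hypothetical naming convention: photo.jpg -> photo.jpg.modeshape.properties
    static final String SUFFIX = ".modeshape.properties";

    /** Returns the sidecar path that belongs to the given content file. */
    static Path sidecarFor(Path contentFile) {
        return contentFile.resolveSibling(contentFile.getFileName() + SUFFIX);
    }

    /** Writes the extra properties; an empty set removes the sidecar file. */
    static void write(Path contentFile, Properties extra) {
        Path sidecar = sidecarFor(contentFile);
        try {
            if (extra.isEmpty()) {
                Files.deleteIfExists(sidecar);
                return;
            }
            try (OutputStream out = Files.newOutputStream(sidecar)) {
                extra.store(out, "extra properties for " + contentFile.getFileName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the extra properties; a missing sidecar yields an empty set. */
    static Properties read(Path contentFile) {
        Properties extra = new Properties();
        Path sidecar = sidecarFor(contentFile);
        if (Files.exists(sidecar)) {
            try (InputStream in = Files.newInputStream(sidecar)) {
                extra.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "newfile.txt");
        Properties extra = new Properties();
        extra.setProperty("author", "David");
        write(file, extra);
        System.out.println(read(file).getProperty("author")); // prints David
    }
}
```

A real factory would presumably call something like read and write from its own methods, and the connector would also have to ignore the sidecar files so that they do not show up as 'nt:file' nodes themselves.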

It seems that an additional property, "jcr:createdBy", is attached automatically by the system when we create new nodes of type nt:file or nt:folder, and it causes an exception when we try to persist the nodes using session.save(). I have tried removing "jcr:createdBy" manually with node.getProperty("jcr:createdBy").remove(). Although "jcr:createdBy" is then gone from memory (as shown by dump(root)), session.save() still fails with exactly the same exception as above, as if the property had never been removed. It looks as though "jcr:createdBy = admin" reappears during session.save() even though it seemed to be gone after Property.remove().

I will have to sort this out before moving on to the CustomPropertiesFactory implementation. Thank you for your help.

Hmm... that's definitely a bug. Not sure why we're not seeing that in our integration tests. But the connector should be dealing with the 'jcr:createdBy' property much better, and your code shouldn't have to do a thing.

Would you care to file a new JIRA for this? I should be able to commit a fix into trunk in the next few days.

I have just created the JIRA issue [MODE-866] at https://jira.jboss.org/browse/MODE-866 . I have also attached all the source files used (Main.java and its configRepository.xml) to the JIRA issue and also here. Thanks for your help.

I just committed a fix to SVN in trunk. See MODE-866 for details (including the patch).

I've marked the defect as resolved, since I was able to replicate it in a new integration test, and after the fix the test passes without error. If you test this and still have problems, please reopen the issue.

Basically, by using CustomPropertiesFactory we override how each node's properties are written and read, and by doing so we can add extra metadata such as "author" to a file as custom properties. Below is an idea for a metadata persistence strategy:

While this kind of persistence strategy may allow easy replication for scalability across multiple servers, will it suffer from query performance issues? Let's say I want to search for all files whose author is "danny". How could I set this up in ModeShape so that node query performance is not greatly affected?

Or should I store the metadata (the node's custom properties) in a high-performance distributed key-value cache store like Infinispan (using the ModeShape Infinispan connector), and make my CustomPropertiesFactory read and write the file's metadata from Infinispan while fetching the file's content through the nt:file resource?

The CustomPropertiesFactory mechanism is only used for persisting those properties not already defined on "nt:file" and "nt:folder". This will thus affect the reading and writing of this information to disk, but generally these extra properties will be small and have very little impact on performance. Plus, like you suggest, it makes it very easy to replicate.

Executing queries never directly uses the connectors, but instead queries are executed by operating against internal Lucene indexes maintained by the engine. Certainly these indexes do need to be kept up to date as your content changes, but this is done automatically by ModeShape. Sometimes, this maintenance (initial populating or updates due to changes) may result in the engine reading the content via the connectors, and connector read performance will have an impact.

Thus, using CustomPropertiesFactory will have no impact on query execution, and only negligible impact on index maintenance (assuming the extra properties are small and inexpensive to read and write).
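To illustrate the point about queries, here is a toy in-memory stand-in for an index. It is not Lucene and not how ModeShape is implemented; the class and method names are invented. The sketch only shows the division of labour: the index is updated when a property changes, and a query such as "all files whose author is danny" is answered from the index alone, without reading anything through the connector.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy stand-in for the engine's index: maps an "author" value to node paths. */
final class AuthorIndex {
    private final Map<String, Set<String>> pathsByAuthor = new HashMap<>();
    private final Map<String, String> authorByPath = new HashMap<>();

    /** Index maintenance: called whenever a node's author is set, changed or removed (null). */
    void onAuthorChanged(String path, String newAuthor) {
        String old = (newAuthor == null)
                ? authorByPath.remove(path)
                : authorByPath.put(path, newAuthor);
        if (old != null) {
            // Drop the stale entry so that queries never see the previous value.
            Set<String> paths = pathsByAuthor.get(old);
            paths.remove(path);
            if (paths.isEmpty()) {
                pathsByAuthor.remove(old);
            }
        }
        if (newAuthor != null) {
            pathsByAuthor.computeIfAbsent(newAuthor, k -> new TreeSet<String>()).add(path);
        }
    }

    /** Query: answered purely from the index; no file or sidecar is read here. */
    Set<String> findByAuthor(String author) {
        Set<String> hits = pathsByAuthor.get(author);
        return hits == null ? new TreeSet<String>() : new TreeSet<String>(hits);
    }

    public static void main(String[] args) {
        AuthorIndex index = new AuthorIndex();
        index.onAuthorChanged("/photos/a.jpg", "danny");
        index.onAuthorChanged("/photos/b.jpg", "David");
        System.out.println(index.findByAuthor("danny")); // prints [/photos/a.jpg]
    }
}
```

The only place where storage cost shows up in this picture is the maintenance step, which is why small, cheap-to-read extra properties should matter little.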