Jackrabbit Oak / AEM6

Unlike JR2 (Jackrabbit 2), Oak indexes almost nothing by default. This means that if you take a vanilla Oak and run a query, chances are very good that it will traverse the repository (depending on your query).

The advantage is that you can create very dedicated indexes that perform better overall, because they are tailored as closely as possible to your query.

The disadvantages are that you’ll have to define each index yourself and that you’ll have to know how to fine-tune your queries to get the most out of this approach.

Without going deeply into the configuration of each available index type, I think the two main properties you’ll end up tuning for better performance are:

propertyNames

declaringNodeTypes

The first defines which property your index is going to index, while the second restricts the index to specific node types. In other words, the condition for a node to be included in an index is

$nodetype in ($declaringNodeTypes) AND $property = $propertyNames
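To make the rule above concrete, here is a minimal Java sketch that models it outside of Oak. The `IndexRule` class and its `includes` method are hypothetical names made up for this illustration and are not part of the Oak API. The sketch also assumes that an empty `declaringNodeTypes` means "any node type", and it ignores node type inheritance and mixins.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of the inclusion rule; this is NOT an Oak class.
public class IndexRule {
    private final Set<String> propertyNames;
    private final Set<String> declaringNodeTypes;

    public IndexRule(Set<String> propertyNames, Set<String> declaringNodeTypes) {
        this.propertyNames = propertyNames;
        this.declaringNodeTypes = declaringNodeTypes;
    }

    // A node is included when its type is allowed (assumption: an empty set
    // means "any type") and it carries at least one of the indexed properties.
    public boolean includes(String nodeType, Map<String, Object> properties) {
        boolean typeMatches = declaringNodeTypes.isEmpty()
                || declaringNodeTypes.contains(nodeType);
        boolean hasProperty = properties.keySet().stream()
                .anyMatch(propertyNames::contains);
        return typeMatches && hasProperty;
    }

    public static void main(String[] args) {
        IndexRule rule = new IndexRule(Set.of("myLastModified"), Set.of("my:comments"));
        // Matching type and property.
        System.out.println(rule.includes("my:comments", Map.of("myLastModified", "2015-01-01")));
        // Right property, wrong node type.
        System.out.println(rule.includes("my:news", Map.of("myLastModified", "2015-01-01")));
        // Right node type, property missing.
        System.out.println(rule.includes("my:comments", Map.of("title", "hello")));
    }
}
```

Only the first call in `main` satisfies both halves of the condition; the other two nodes would stay out of the index.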

caveats

indexes on more than one property are not supported (yet)

an index cannot serve conditions where you ask something like WHERE property IS NULL.

This takes us to the very topic of this post: be careful about how you use your properties and how you structure your queries.

Remember the rule: the smaller the index the more efficient the query.

Let’s see with an example how much the choice of property and node type matters.

If you have a custom application in which you want to extract nodes modified on or after a specific date, one way of doing so would be

SELECT * FROM [nt:base]
WHERE [jcr:lastModified] >= CAST('...' AS DATE)

This query is very bad: it can’t really make use of any index.

Let’s say you create an index on jcr:lastModified. The index itself will be almost as big as the repository, because by default in AEM (almost?) every node has mix:lastModified.

A better way would be

SELECT * FROM [nt:base]
WHERE [myLastModified] >= CAST('...' AS DATE)

This allows you to define an index on the property myLastModified, which you know will contain only your application data. But we can do even better.

Let’s assume you have a very sparse and large content structure, so you can’t apply path filters, and that on the other hand you don’t want to create tons of myLastModified-like properties to address different aspects of your information.

Let’s assume then, for the sake of example, that you categorise your data into:

comments

news

articles.

What you could do is create three different node types:

my:comments

my:news

my:articles

now you can define three different, very dedicated indexes

declaringNodeTypes = my:comments AND propertyNames = myLastModified

declaringNodeTypes = my:news AND propertyNames = myLastModified

declaringNodeTypes = my:articles AND propertyNames = myLastModified
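As a rough illustration of why the split helps, the following Java sketch counts how many entries a single myLastModified index would hold compared with three per-type indexes. The `IndexSizeDemo` class and the sample data are made up for this example; real index sizes depend on your content.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; nothing here talks to a real repository.
public class IndexSizeDemo {

    // Counts how many nodes would land in each per-type index, given the
    // primary type of every node that carries myLastModified.
    public static Map<String, Integer> entriesPerType(List<String> nodeTypes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String type : nodeTypes) {
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample content: one entry per node with myLastModified.
        List<String> nodes = List.of(
                "my:comments", "my:comments", "my:comments",
                "my:news",
                "my:articles", "my:articles");

        // One index on myLastModified alone holds every node.
        System.out.println("single index entries: " + nodes.size());
        // Three dedicated indexes each hold only their own slice.
        System.out.println("per-type entries: " + entriesPerType(nodes));
    }
}
```

A query against [my:news] then only has to look at the my:news slice instead of every node that carries myLastModified.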

One of the resulting queries will look like

SELECT * FROM [my:comments]
WHERE [myLastModified] >= CAST('...' AS DATE)
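If you build such queries from application code, a small helper keeps the node type and the property together. The sketch below is a hypothetical helper (`LastModifiedQuery.build` is not part of any JCR or Oak API), and the date value in `main` is a made-up sample. It assumes that node type and property names come from trusted code, since it does no escaping.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that only builds the query string; it does not run it.
public class LastModifiedQuery {

    // ISO 8601 timestamp with milliseconds, rendered in UTC.
    private static final DateTimeFormatter ISO_8601 =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
                    .withZone(ZoneOffset.UTC);

    // Assumption: nodeType and property are trusted names, so no escaping is done.
    public static String build(String nodeType, String property, Instant since) {
        return "SELECT * FROM [" + nodeType + "] "
                + "WHERE [" + property + "] >= CAST('" + ISO_8601.format(since) + "' AS DATE)";
    }

    public static void main(String[] args) {
        // Sample date chosen for illustration only.
        Instant since = Instant.parse("2015-01-01T00:00:00Z");
        System.out.println(build("my:comments", "myLastModified", since));
    }
}
```

The resulting string can be handed to the JCR query manager as a JCR-SQL2 statement. Prefixing the statement with `explain` in Oak should show the query plan, which is a handy way to check that the dedicated index is actually picked.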

Actually, in the example above, assuming your nodes come with mix:lastModified, once you create a custom node type you could simply have used jcr:lastModified, as the resulting indexes will be (I expect) the same size. You can repeat the exercise above with any property name: colours, size, tags, etc.