Improve JcrResourceResolver#resolve performance when big number of vanityPath are present

Details

Description

At the moment the performance of JcrResourceResolver#resolve is tight with the number of sling:vanityPath present in the repository.
Large number of vanityPath means large response time specially in the worse case scenario (namely huge number of vanityPath and request that doesn't match any vanityPath) but also in the average cases.
Sling currently employs generic regexps also for vanityPath, but since the regex behind a vanityPath is well know there is room for optimization.
I'll attach a graphs that shows the situation and a potential patch.

Thanks Antonio for all the hard work! I've committed a slightly different version which is based on your ideas in revision 1300543

The vanity paths are added under their path to a map - when the map entries are queried from the resource resolver during resolving, an iterator is returned which internally iterates over the entries from the map. The iterator ensures that the sort order is respected without creating temporary arrays.

The tests and integration tests run fine with these changes, but I'll leave this issue open for further comments and tests.

Carsten Ziegeler
added a comment - 14/Mar/12 13:46 Thanks Antonio for all the hard work! I've committed a slightly different version which is based on your ideas in revision 1300543
The vanity paths are added under their path to a map - when the map entries are queried from the resource resolver during resolving, an iterator is returned which internally iterates over the entries from the map. The iterator ensures that the sort order is respected without creating temporary arrays.
The tests and integration tests run fine with these changes, but I'll leave this issue open for further comments and tests.

avoiding the loop.
Overall I do agree with your comments. The best approach would be a complete redesign in order to have a more clean approach.
About the two different maps I am not sure if you have noticed that the semantic for the #getResolveMaps method is the same since it merges and sorts the two maps.

As you said though this is really in the heart of Sling and any change here should be done really carefully. A more extensive test coverage might help here (but I must say the existing one for this area is pretty good).

Antonio Sanso
added a comment - 21/Nov/11 15:16 Thanks a lot for your comment Carsten.
Yes this
if (this.factory.getMapEntries().getVanityMaps().containsKey(getPathVanityKey(absPath))){
is supposed to be by "design" in order to match with
..
vanityMaps.put(pVanityPath,entries);
..
I have spot a bug in the #getPathVanityKey method though but can be easily corrected...
Moreover since I have created the patch I have noticed that this could be further optimized
This
if (this.factory.getMapEntries().getVanityMaps().containsKey(getPathVanityKey(absPath))){
for (MapEntry mapEntry : this.factory.getMapEntries().getResolveVanityMaps()) {
could be replaced with
if (this.factory.getMapEntries().getVanityMaps().containsKey(getPathVanityKey(absPath))){
MapEntry mapEntry= this.factory.getMapEntries().getResolveVanityMaps().get(getPathVanityKey(absPath))
avoiding the loop.
Overall I do agree with your comments. The best approach would be a complete redesign in order to have a more clean approach.
About the two different maps I am not sure if you have noticed that the semantic for the #getResolveMaps method is the same since it merges and sorts the two maps.
As you said though this is really in the heart of Sling and any change here should be done really carefully. A more extensive test coverage might help here (but I must say the existing one for this area is pretty good).

Thanks for your patch, Antonio! It definitely goes into the right direction, however I have some remarks

Is it by intention, that you do

if (this.factory.getMapEntries().getVanityMaps().containsKey(getPathVanityKey(absPath))){

in the JcrResourceResolver? Shouldn't it be requestPath instead of absPath?

For the whole checking of the mapping, we currently favour simple implementation over performance: everything is checked in the same way by using regexps. With your patch I think we're just going half of the way as still the old code is in place (more or less) but with additional checks to get a better performance.
I'm not 100% sure and have to further check, but I think the better approach would be to change the implementation completely and only use regexps where it really makes sense - so we would distinguish between vanity mappings where we really don't need regexps (at least not for each and every vanity path) and really regexp based matchings. Not sure if this is going to fly though....
The other potential problem I see atm is that the current implementation loads configurations and vanity paths into a single map and then sorts them - with your change, you don't mix these two anymore which might be a change in behaviour.

This mapping is at the heart of Sling and right now I feel not very confident in changing this in the described way. But regardless of my comments, really great work so far, Antonio! I'll further investigate to see what we can do.

Carsten Ziegeler
added a comment - 16/Nov/11 07:57 Thanks for your patch, Antonio! It definitely goes into the right direction, however I have some remarks
Is it by intention, that you do
if (this.factory.getMapEntries().getVanityMaps().containsKey(getPathVanityKey(absPath))){
in the JcrResourceResolver? Shouldn't it be requestPath instead of absPath?
For the whole checking of the mapping, we currently favour simple implementation over performance: everything is checked in the same way by using regexps. With your patch I think we're just going half of the way as still the old code is in place (more or less) but with additional checks to get a better performance.
I'm not 100% sure and have to further check, but I think the better approach would be to change the implementation completely and only use regexps where it really makes sense - so we would distinguish between vanity mappings where we really don't need regexps (at least not for each and every vanity path) and really regexp based matchings. Not sure if this is going to fly though....
The other potential problem I see atm is that the current implementation loads configurations and vanity paths into a single map and then sorts them - with your change, you don't mix these two anymore which might be a change in behaviour.
This mapping is at the heart of Sling and right now I feel not very confident in changing this in the described way. But regardless of my comments, really great work so far, Antonio! I'll further investigate to see what we can do.

Antonio Sanso
added a comment - 25/Oct/11 15:33 attaching potential patch.
The patch is an attempt to avoid to loop through the vanityPath when a vanityPath doesn't exist (leveraging the predictability of the vanityPath regex).

Antonio Sanso
added a comment - 25/Oct/11 15:21 the graph shows the response in ms vs # of nodes (with and without vanity path) of the worse case scenario described in the ticker for three different situation:
normal resolve
resolve having vanitypath (one vanityPath per node)
resolve having vanitypath (one vanityPath per node) with the patch applied