Resource lookup is slow

Description

When doing, e.g., a PROPFIND on a large collection, performance is poor. This is because the children of the targeted resource are found using twisted.web2.server.locateResource. That method traverses from site.resource down to the child for each lookup. Given that we know the parent resource and its URI, there is no need to do the whole traversal when all we want are the child resources.

The attached patch adds a new locateChildResource method that starts with the known resource and works down; it's much faster. The patch also includes a new URL->resource map, saved in the same way that the resource->URL map is. This helps avoid doing a full lookup each time the same resource is looked up within the scope of one request.
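
To make the difference concrete, here is a rough, self-contained sketch of the two lookup strategies. It is not the patch and not web2's actual code: the Resource and Request classes, get_child, and the snake_case method names are all hypothetical stand-ins, and the lookups counter exists only to show how much traversal each approach does.

```python
from urllib.parse import quote, unquote


class Resource:
    """Hypothetical stand-in for a web2 resource with named children."""

    def __init__(self):
        self.children = {}
        self.lookups = 0  # how many times this resource resolved a child

    def get_child(self, name):
        self.lookups += 1
        return self.children.get(name)


class Request:
    """Hypothetical stand-in for a request holding per-request caches."""

    def __init__(self, root):
        self.root = root
        self._url_to_resource = {}  # URL -> resource
        self._resource_to_url = {}  # resource -> URL

    def _remember(self, url, resource):
        self._url_to_resource[url] = resource
        self._resource_to_url[resource] = url

    def locate_resource(self, url):
        """Full traversal from the root, skipped if the URL is cached."""
        cached = self._url_to_resource.get(url)
        if cached is not None:
            return cached
        resource = self.root
        for segment in url.split("/"):
            if not segment:
                continue
            resource = resource.get_child(unquote(segment))
            if resource is None:
                return None
        self._remember(url, resource)
        return resource

    def locate_child_resource(self, parent, child_name):
        """One step down from a parent that was already located."""
        parent_url = self._resource_to_url.get(parent)
        if parent_url is None:
            return None  # parent was never located in this request
        child = parent.get_child(child_name)
        if child is None:
            return None
        child_url = parent_url.rstrip("/") + "/" + quote(child_name, safe="")
        self._remember(child_url, child)
        return child
```

In this toy model, listing N children of an already-located collection costs N single-step lookups on the parent, rather than N traversals from the root.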

The change to cache lookups in locateResource probably needs to be thought through a little more. While it does make some sense that, in the context of a single request, looking up the same thing twice should produce the same result, that isn't going to be the behavior currently. Of course, usually we only look up resources once per request...and that doesn't even go through the locateResource function.

Are there other times when resource lookups may be done? If so, do they all want caching behavior? Maybe so, I'm not sure.

I can clearly see the reason for the locateChildResource addition, that is solving an obvious deficiency. I'm less clear on the reason for the caching url->resource. What calls that repeatedly?

This change is clearly wrong:

- segments = path.split("/")
+ segments = unquote(path).split("/")

resourcesByURL and urlsByResource are named backwards. xByY means to me you get an x when looking up by a y.

url->resource caching is new here; the inverse cache already exists in trunk. One case in which it is looked up frequently is in implementing WebDAV ACLs, where we need to know a resource's URL in order to evaluate ACL rules. One example is access control entries which are inherited from another (typically a parent) resource. As a result, in a PROPFIND request, the same resources are looked up many times in a single request.

url->resource caching is new here; the inverse cache already exists in trunk

Right. The only part of my comment aimed at the inverse was that it's named wrong. And yes, it's named wrong in trunk too. :)

But, as to the rest, I still don't understand. Maybe if you point me at the code in question it'd be clearer.

I get the need for looking up child resources given a resource in hand, but I don't get the need to look up resources lots of times from the top (except that right now, without locateChildResource, that's the only way it can be done).

It is possible for a "/" character (0x2F) to be quoted in the path, e.g. '/test/left%2Fright/child'. Doing unquote before split would result in the segments 'test', 'left', 'right', 'child', which is wrong. The unquote must be done on the segments AFTER the split.
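
A quick way to see the difference, as a minimal sketch: the two helper names below are made up for illustration, it uses Python 3's urllib.parse.unquote rather than whatever the patch imports, and it drops empty segments for readability (the patched line does not).

```python
from urllib.parse import unquote


def unquote_then_split(path):
    # Wrong order: a quoted "/" (%2F) becomes a real separator.
    return [s for s in unquote(path).split("/") if s]


def split_then_unquote(path):
    # Right order: split on literal "/" first, then decode each segment.
    return [unquote(s) for s in path.split("/") if s]


path = "/test/left%2Fright/child"
print(unquote_then_split(path))  # ['test', 'left', 'right', 'child']
print(split_then_unquote(path))  # ['test', 'left/right', 'child']
```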

First off, if twisted/web2/server.py needs something in twisted/web2/dav/util.py then that something needs to go somewhere else. In this case it's joinURL. Which goes back to #1569. We should have a standard way to do this, preferably sooner rather than later.

I'm a little wary of the name locateChildResource and the fact that it's placed on the request. The name is a completely accurate description; it just seems like a potential source of confusion with locateChild. As for the placement, it is nothing if not consistent, and moving it to the resource would just cause confusion.

The test coverage is a big improvement. The branch is currently building on all the buildbots that were green when I started this review. I'll update when they finish.

Also, should urlForResource raise an exception instead of returning None? It should pretty much only return None if someone has instantiated a resource and called urlForResource on it without the resource having been "located".

As a side note, the idea that a resource must always be "located" is starting to bother me more and more.

A resource must be located in order for us to know a URL for it. I'd love to have other options there, but web2's architecture doesn't otherwise give resources any knowledge of what their URL(s) are. And, it only makes sense to know a URL in the context of a request, because locateChild() can present a whole different URL hierarchy based on the request.

locateChildResource is limited to immediate children because I don't need anything deeper. As soon as someone does need it, they are welcome to change childName to *children and do the recursion. Nothing here makes that impossible to do, but I don't think adding API before you need the functionality is a requirement.
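
For what it's worth, the *children variant could look roughly like this. It is only a sketch: locate_descendant is a hypothetical name, nested dicts stand in for resources, and it uses a loop where the comment above says recursion.

```python
def locate_descendant(parent, *children):
    """Walk down from parent one child name at a time.

    Nested dicts stand in for resources here; returns None as soon as
    any name along the way is missing.
    """
    resource = parent
    for name in children:
        resource = resource.get(name)
        if resource is None:
            return None
    return resource


# Hypothetical hierarchy: calendars/ contains home/, which contains inbox/.
tree = {"calendars": {"home": {"inbox": {}}}}
inbox = locate_descendant(tree, "calendars", "home", "inbox")
```

With a single name this behaves like the immediate-child lookup, so existing callers would not need to change.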