After upgrading from 7.0.52 of Tomcat to 7.0.54 we found that our application was now returning 404 resource not found errors when the request uri starts with //.
eg. We have an embedded server created and started something like:
org.apache.catalina.startup.Embedded embedded = new Embedded();
org.apache.catalina.Engine engine engine = embedded.createEngine();
engine.setName("");
embedded.setContainer(engine);
embedded.addEngine(engine);
...
String startPathContextRoot = "c:\website\data\startPath";
org.apache.catalina.Context startPathContext = embedded.createContext("/startPath",startPathContextRoot);
embedded.start()
Then a request to http://host:port//startPath returns 404.
Whereas at Tomcat 7.0.52 it returns what we would expect from a request to
http://host:port/startPath.
The same behaviour is seen with requests to extended URLs eg:
http://host:port//startPath/anotherPath.
where they end up at the servlet as expected with 7.0.53 and not with 7.0.54
Debugging this a bit I found that the problem was introduced at 7.0.53 and by the changes under
https://issues.apache.org/bugzilla/show_bug.cgi?id=56501
which for Tomcat 7 were revision
http://svn.apache.org/viewvc?view=revision&revision=1594028
If I run our app without these changes in at 7.0.54 then it works fine.
Looking at the changes in the revision I saw some tests were added and so I tried adding some new tests to tomcat/trunk/test/org/apache/catalina/connector/TestRequest.java which I think example the problem:
@Test
public void testBug56501p() throws Exception {
doBug56501("/path", "//path", "/path");
}
@Test
public void testBug56501q() throws Exception {
doBug56501("/path", "//path/", "/path");
}
@Test
public void testBug56501r() throws Exception {
doBug56501("/path", "//path/bob", "/path");
}
@Test
public void testBug56501s() throws Exception {
doBug56501("/path", "//path/bob/", "/path");
If I run these at 7.0.53 they pass.
and running at 7.0.54 they fail with:
Testcase: testBug56501p took 0.307 sec
FAILED
expected:</[path]> but was:</[]>
Testcase: testBug56501q took 0.275 sec
FAILED
expected:</[path]> but was:</[]>
Testcase: testBug56501r took 0.246 sec
FAILED
expected:</[path]> but was:</[]>
Testcase: testBug56501s took 0.32 sec
FAILED
expected:</[path]> but was:</[]>
I can try and create this with a simple servlet/setup if required if the test additions are not enough.
David

This is going to get messy.
The Javadoc for HttpServletRequest.getContextPath() says the container should not decode the returned value.
Where this gets 'interesting' is when the URI is not normalized and is encoded. For example, what gets returned for a request to "%2Ffoo%2F%2E%2E%2Fpath"?
Is it:
"%2Fpath" ?
"%2Ffoo%2F%2E%2E%2Fpath" ?
Something else?
We know (from the mapper) how many '/' characters to include in the context path. The current approach of simply searching that many '/' characters down the request URI ignores issues of normalization and encoding. Doing that counting in a normalization and encoding aware manner is probably the answer but that is non-trivial to say the least.
Fixing this bug might not solve the problem you are seeing - particularly since the unit tests you provided are using the incorrect value for the expected context path. You should probably be using ServletContext.getContextPath().

(In reply to Mark Thomas from comment #1)
> For example, what gets returned for a request to
> "%2Ffoo%2F%2E%2E%2Fpath"?
RFC7230 2.7.3. "http and https URI Normalization and Comparison" says about http and https URIs:
...
such URIs are normalized and compared according to the
algorithm defined in Section 6 of [RFC3986]
http://tools.ietf.org/html/rfc3986
RFC3986 2.3. Unreserved Characters [1]
Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
URIs that differ in the replacement of an unreserved character with
its corresponding percent-encoded US-ASCII octet are equivalent: they
identify the same resource. However, URI comparison implementations
do not always perform normalization prior to comparison (see Section
6). For consistency, percent-encoded octets in the ranges of ALPHA
(%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
underscore (%5F), or tilde (%7E) should not be created by URI
producers and, when found in a URI, should be decoded to their
corresponding unreserved characters by URI normalizers.
RFC3986 6.2.2.2. Percent-Encoding Normalization
The percent-encoding mechanism (Section 2.1) is a frequent source of
variance among otherwise identical URIs. In addition to the case
normalization issue noted above, some URI producers percent-encode
octets that do not require percent-encoding, resulting in URIs that
are equivalent to their non-encoded counterparts. These URIs should
be normalized by decoding any percent-encoded octet that corresponds
to an unreserved character, as described in Section 2.3.
So it looks that RFC3986 says to url-decode the above listed "unreserved" characters before performing normalization, but only them.
"%2Ffoo%2F%2E%2E%2Fpath" becomes "%2Ffoo%2F..%2Fpath" but nothing more as %2F is not decoded.
In regards to r1640083 the "canonicalContextPath.equals(candidate)" comparison looks fragile.

Worth noting here that we have the system property org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH
Regarding the fragility of canonicalContextPath.equals(candidate), better suggestions welcome.

(In reply to Mark Thomas from comment #4)
>
> Regarding the fragility of canonicalContextPath.equals(candidate), better
> suggestions welcome.
The code that was added to Request class is located far from the code that performs decoding and mapping (CoyoteAdapter) and one that performs URL-decoding (UDecoder) and it is hard to compare those and keep in sync.
Comparing the code highlighted an issue -> 1.
1. Using UDecoder.URLDecode(candidate) + canonicalContextPath.equals(candidate) is broken, as URLDecode() without second argument uses ISO-8859-1 charset. The equals() may return false.
2. Move the code to CoyoteAdapter.postParseRequest(). Evaluate the value there only once.
3. In unexpected situations, error out (400) instead of falling through.
4. Maybe add an utility methods to UDecoder to search for next decoded '/' in a ByteChunk?
5. In CoyoteAdapter.postParseRequest() when decodedURI.getType() is not bytes (e.g. when requestURI is changed by RewriteValve), normalization is skipped. I think that it should not be skipped.

I've fixed the issue identified in 1.
Regarding 2, that would cause the code to be executed for every request when it is only likely to be used for a small percentage.
3 makes sense if we do 2 but I don't think 2 is the way to go.
4 I'm neutral on.
5 I believe that was a deliberate implementaion decision. I don'tthink we need to revisit it as part of this unless suggestion 2 is followed.
The key issue is whether or not to follow suggestion 2. I'm currently leaning towards not because of performance but am prepared to be convinced otherwise.

Any reason not to have code available that does (2) but in a lazy way? That is, a utility method that can do the work to produce the result and also cache the value in the request in case it's requested again? Then, only call that utility method when the value is actually needed?

(In reply to Konstantin Kolinko from comment #5)
> 5. In CoyoteAdapter.postParseRequest() when decodedURI.getType() is not
> bytes (e.g. when requestURI is changed by RewriteValve), normalization is
> skipped. I think that it should not be skipped.
Skipping url-decoding step is also wrong. If RewriteValve provides a non-encoded requestUri, it means that there is a bug in RewriteValve.
Web Application should assume that requestURI needs url-decoding. It cannot find out that url-decoding shall be skipped. Implementation of Request.getContextPath() in r1640083/r1642766 is an example of a victim of this bug. It always performs url-decoding.
>> 3. In unexpected situations, error out (400) instead of falling through.
>
> 3 makes sense if we do 2 but I don't think 2 is the way to go.
I do not like that Request.getContextPath() falls through to returning requestUri. It may result in security issues.

This is ASF Bugzilla: the Apache Software Foundation bug system. In case
of problems with the functioning of ASF Bugzilla, please contact
bugzilla-admin@apache.org.
Please Note: this e-mail address is only for reporting problems
with ASF Bugzilla. Mail about any other subject will be silently
ignored.