Using the protocol="HTTP/1.1" connector (Coyote)
After upgrading a site to Tomcat 7.0.73 from 7.0.72 or anything earlier, a URL with an unencoded { or } (e.g. http://my.com?filter={"search":"isvalid"} ) now returns a 400 error code and logs the following error message:
"INFO: Error parsing HTTP request header
Note: further occurrences of HTTP header parsing errors will be logged at DEBUG level.
java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986"
Resolution:
Since this is a breaking change (aka regression failure), there should be an option to override and turn this off (still reporting the first occurrence as shown above), so that any existing site which experiences this can choose to ignore this failure and continue as before, so they can deal with changing their application at a later date, if they deem the security risk is appropriate.
Defaulting the option to true (to enable the check) is perfectly fine, as long as there is an option in a server and/or application config file to disable it, and proper documentation on it.
Either this, or you clearly state in the release notes of 7.0.73, exactly what will break, and recommend that users do not perform the Tomcat update, if they are not ready to change their applications to comply, but I think this would open up an even bigger can of worms.
Instead of just saying:
"Add additional checks for valid characters to the HTTP request line parsing so invalid request lines are rejected sooner. (markt)" - this tells us nothing about the impending doom we may face.
But, I would recommend just giving us the option to decide for ourselves.

Given that using an unencoded '{' or '}' in a URL is contrary to the RFCs and that the fix that tightened the validation rules was in response to a security vulnerability (CVE-2016-6816) I think it is unlikely that an option will be introduced to make this validation optional.
It is quite likely that some sites could safely tolerate some characters. However, it is also likely that the 'safe' set of invalid characters will vary from site to site. That would therefore require a more complex configuration option than simply allowing or disallowing a fixed set of characters.
Those interested in proposing a patch should look at lines 74-78 of org.apache.tomcat.util.http.parser.HttpParser although I'll repeat I think it is unlikely such a patch would be accepted.
All that code is static which means configuration via system properties - something I'd prefer to see less of rather than more of in Tomcat.
For completeness, '|' seems to be another character that is fairly widely used in unencoded form when it should be encoded.
Finally, changes related to conformance with the relevant RFCs and Java EE specifications are not treated as regressions. Therefore, I have moved this to an enhancement request.
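For clients that can be changed, the compliant fix is to percent-encode the query value before sending it. The following is a minimal sketch using the standard java.net.URLEncoder; the class and method names are hypothetical, and the URL is the one from the original report.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Tomcat: encodes a query value on the client side.
public class QueryEncodingExample {

    // Percent-encodes a single query parameter value before it is placed in a URL.
    static String encodeQueryValue(String value) {
        try {
            // URLEncoder implements HTML form encoding, so a space becomes '+',
            // which is acceptable inside a query string.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Builds base?name=encodedValue; assumes base has no existing query string.
    static String buildUrl(String base, String name, String value) {
        return base + "?" + name + "=" + encodeQueryValue(value);
    }

    public static void main(String[] args) {
        // prints http://my.com?filter=%7B%22search%22%3A%22isvalid%22%7D
        System.out.println(buildUrl("http://my.com", "filter", "{\"search\":\"isvalid\"}"));
    }
}
```

A request built this way contains no raw '{', '}' or '"' and so is not affected by the stricter request line parsing.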

Created attachment 34684
patch proposal
In response to the numerous complaints on the users list I decided to give this a shot. I added a system property which contains a blacklist that's used for validation of request targets rather than the long if statement that was there. If a user needs to allow unencoded | characters then they can just remove it from the blacklist defined in the tomcat.util.http.parser.HttpParser.blacklist property.
If this looks good to everyone I can push it to whichever versions of tomcat we want to allow an option for.

Allowing some of those (e.g. space) is extremely dangerous and should not be allowed under any circumstances.
I generally dislike configuration via system property. That said, making this per Connector will be significantly more invasive.
Any proposed patch needs to include documentation. That documentation needs to include a very large, very clear warning that deviating from the default is a security risk.
If this feature is implemented, I'd prefer to see the option to allow illegal characters limited to a much smaller sub-set.

(In reply to Mark Thomas from comment #4)
> I generally dislike configuration via system property. That said, making
> this per Connector will be significantly more invasive.
I agree on both points. The system property seemed to be the least invasive way to achieve the desired result.
> Any proposed patch needs to include documentation. That documentation needs
> to include a very large, very clear warning the deviating from the default
> is a security risk.
Also agreed. Where would that documentation go?
> If this feature is implemented, I'd prefer to see the option to allow
> illegal characters limited to a much smaller sub-set.
Other than space, which characters should absolutely be excluded in all cases? I can create a secondary list containing those and programmatically add them if a user tries to remove them from the blacklist.
Also, my initial patch used a whitelist instead of a blacklist, so that the system property was either commented out by default or contained a few characters that were the exception to the rule. I inverted it to a blacklist to remove some logic and make it perform better; do you think that a whitelist would work better here? I can provide that patch also.

I think I prefer the whitelist option but I'd like to see it limited to - at this point - '{', '}' and '|'. Other characters can be considered on a case by case basis.
Documentation should go in the system properties section of the config docs although I'm still mulling over what a Connector config option might look like.

Created attachment 34694
whitelist proposal limiting characters with docs
OK, here's an updated whitelist patch restricting the characters that are accepted to '{', '}', and '|'. I also included documentation for the property.
Let me know if that works better for you :)

Thanks for the updated patch. I like the overall design. Some detail comments:
- I think a different name is required. We might want to override other restrictions in the future. Maybe requestTargetAllow
- The docs need to state which characters are valid in the allowed list
- What to do if some other invalid character is placed on the allowed list. Log a warning?
- I'm still undecided on whether this should be per connector configuration
We also need to decide which versions to add this to. I'm currently thinking:
- 7.0.x - yes
- 8.0.x - yes
- 8.5.x - maybe
- 9.0.x - no

Created attachment 34698
Updated patch proposal including a warning message for characters that aren't allowed
(In reply to Mark Thomas from comment #10)
> Thanks for the updated patch. I like the overall design. Some detail
> comments:
No problem.
> - I think a different name is required. We might want to override other
> restrictions in the future. Maybe requestTargetAllow
That makes sense.
> - The docs need to state which characters are valid in the allowed list
Agreed.
> - What to do if some other invalid character is placed on the allowed list.
> Log a warning?
I thought about that but since there isn't any logging there at the moment I let it go. I think it's a good idea to log a warning though, so I'll add that.
> - I'm still undecided on whether this should be per connector configuration
That would be nice, but I haven't dug into the code enough to be able to quickly provide a patch for it.
> We also need to decide which versions to add this to. I currently thinking:
> - 7.0.x - yes
> - 8.0.x - yes
+1
> - 8.5.x - maybe
I'd vote yes on adding the option to 8.5.x because the stable version is already out and the behavior has changed. We obviously don't want to continue allowing broken clients to work, but I don't think we can change this behavior in a stable version, as evidenced by the users list complaints :)
> - 9.0.x - no
+1
I also noticed that the property being parsed was including the quotes, so I changed the commented out example accordingly.
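The whitelist behaviour discussed above (accept only '{', '}' and '|', and log a warning for anything else placed on the list) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the attached patch: the class and method names are hypothetical, warnings are collected in a list rather than sent to a real logger, and the property name is the one quoted later in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the allow-list parsing; not Tomcat's HttpParser.
public class RequestTargetAllowSketch {

    // The only otherwise-invalid characters the thread agrees may be re-enabled.
    private static final String PERMITTED = "{}|";

    // Returns the characters from 'configured' that may be allowed and
    // records a warning for every character that is not permitted.
    static Set<Character> parseAllowList(String configured, List<String> warnings) {
        Set<Character> allowed = new LinkedHashSet<Character>();
        if (configured == null) {
            return allowed; // property not set: nothing extra is allowed
        }
        for (char c : configured.toCharArray()) {
            if (PERMITTED.indexOf(c) >= 0) {
                allowed.add(c);
            } else {
                warnings.add("Character [" + c + "] is not permitted in the allow list and was ignored");
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Property name as quoted later in this thread.
        String configured = System.getProperty(
                "tomcat.util.http.parser.HttpParser.requestTargetAllow");
        List<String> warnings = new ArrayList<String>();
        Set<Character> allowed = parseAllowList(configured, warnings);
        System.out.println("Additionally allowed: " + allowed);
        for (String w : warnings) {
            System.out.println("WARNING: " + w);
        }
    }
}
```

Ignoring unknown characters with a warning, rather than failing, keeps a mistyped value from silently widening what the parser accepts.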

Coty, the patch looks good to me, can you please add the following chars to the list of allowed characters?
'\"' (double quote)
'#' (hash)
'<' (left angle bracket)
'>' (right angle bracket)
'\\' (backslash)
'^' (caret)
'`' (grave accent)
I think in some cases I would need the "space" too, but Mark remarked that it would be very dangerous

OK Mark, at the moment I'm running a patch in production that makes all the characters allowed.
I have evidence of trouble only with curly braces and pipe characters, so the patch looks good to me.
I will wait for the release of this patch in an official 8.5.x Tomcat version and deploy it to production.
In case I need further characters I will create a new issue
Thank you

Hi
We have found that we have problems with some characters that are not allowed in the request URI and would like to know if any filter or valve can be applied to encode them until clients get updated, instead of responding with 400 Bad Request.
We have millions of clients (both Android and iOS) that need to follow the RFC, but updating them will take time, and until then there must be some workaround that can be used.
BR
Lulseged

(In reply to Lulseged Zerfu from comment #19)
> Hi
>
> We have found that we have problems with some characters that are not
> allowed in request URI and would like to know if any filter or valve can be
> applied to encode until clients get updated instead of responding with 400
> Bad Request.
No. The invalid request is rejected long before the execution reaches a Valve or Filter.
As described above, Tomcat 8.5.x and earlier have a configuration option to allow '{', '}' and '|'. If you want to add other characters to the possible whitelist values, you need to make a case for them.
> We have millions of clients (both android and ios) that needs to follow the
> RFC but it will take time but until then there must be some work around that
> can be used.
Running Tomcat behind a more lenient reverse proxy that encodes the invalid characters before the request is passed to Tomcat is another solution. You should be aware that generally, and for the same reasons Tomcat tightened request target parsing, other web servers will head in the same direction as Tomcat over time and start rejecting these requests.
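The rewriting step such a proxy would perform can be sketched as below. This is only an illustration of the encoding logic, not a proxy implementation: the class and method names are hypothetical, and the set of characters left untouched is an assumption based on one reading of RFC 3986 (unreserved and reserved characters plus '%', with '#' excluded because a request target carries no fragment).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: percent-encodes characters that a strict parser would reject.
public class RequestTargetEncoderSketch {

    // Assumption: characters passed through unchanged. '%' is kept so that
    // sequences that are already percent-encoded are not encoded twice.
    private static final String PASS_THROUGH =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~:/?[]@!$&'()*+,;=%";

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encodeInvalid(String rawTarget) {
        StringBuilder out = new StringBuilder();
        // Work on UTF-8 bytes so non-ASCII characters are encoded byte by byte.
        for (byte b : rawTarget.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 128 && PASS_THROUGH.indexOf((char) v) >= 0) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints /?filter=%7B%22search%22:%22isvalid%22%7D
        System.out.println(encodeInvalid("/?filter={\"search\":\"isvalid\"}"));
    }
}
```

Because '%' is passed through, a stray literal '%' in the input is not repaired; a real deployment would need to decide how to treat that case.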

Can anyone see any adverse effects to adding angle brackets to the whitelist? I have a customer that is using unencoded angle brackets around their session IDs in the URL which they can't change at this point and the CVE fix broke their application. If there aren't any adverse effects I'll add them to the list for my distribution, and to tomcat if anyone else needs them.

You mean '<' and '>' ?
There is always the risk that unexpected reverse proxy behaviour will trigger a CVE-2016-6816-like issue, but that risk exists for any white-listed character that should really be encoded.
I don't see it affecting the URL parsing in Tomcat.
If the undecoded URL is used in any XML-like output it is likely to break it. But any user that is using '<' and '>' will be facing that problem already.
They look to be higher risk in terms of breaking stuff, but not in a security sense.
+1 to your approach.

(In reply to Mark Thomas from comment #22)
> You mean '<' and '>' ?
Yes.
> There is always the risk that unexpected reverse proxy behaviour will
> trigger a CVE-2016-6816 like issue but that risks exists for any
> white-listed character that should really be encoded.
>
> I don't see it affecting the URL parsing in Tomcat.
>
> If the undecoded URL is used in any XML like output it is likely to break
> it. But any user that is using '<' and '>' will be facing that problem
> already.
>
> They look to be higher risk in terms of breaking stuff, but not in a
> security sense.
>
> +1 to your approach.
OK, cool. Would we want to add them to tomcat then? It's a small code change, so I have no problems with Fedora/RHEL diverging a bit here if we don't want them.

Hi
A reverse proxy is not an option, and I would like to make a case for allowing double quotes in request URLs, just as '{', '}' and '|' are allowed today, by configuring:
tomcat.util.http.parser.HttpParser.requestTargetAllow="
How can I make that case?
BR
Lulseged Zerfu

I'm neutral on adding '<' and '>' as allowed options.
I think '"' is in the same category. i.e. there is the risk that unexpected reverse proxy behaviour will trigger a CVE-2016-6816 like issue, no parsing issues and likelihood of breakage if the URL is used in HTML or similar without escaping.

Hi
We don't see any way out when millions of terminals are not working because tomcat has restricted '"' from being part of a request URL.
Terminals will not comply overnight but are starting to comply slowly. Therefore we need to allow '"' for some transitional period before totally disallowing the '"' char in a request URL.
Staying on tomcat version 8.0.36 is still risky because CVE-2016-6816 can be triggered.
BR
Lulseged Zerfu

I would like to ask for the ^ character. I'm not sure how to make a case for this. It's kind of important for us because we have been using this to denote financial indexes (similar to Yahoo Finance) and we have a large number of client installs that would all have to change to enforce URI encoding.
This is basically holding up our migration to Tomcat.
I think it would be preferable if we could select whatever characters we want to override. It's our site and we are the ones responsible for the security and functionality. Every entity that uses Tomcat might need different characters for different reasons. It would be easier to transition if they had access to an override. Clearly the default should be to override nothing but some sites are going to need this or that character to transition.
I could ask to have our clients url encode everything but realistically that could take years to complete.
I would prefer that this exemption be extended rather than having to hack the code base on our own, as security updates would be more timely.

URLs can contain sensitive information.
Access logs are expected to contain URLs and, if sensitive information is expected, those files can be cleansed.
It may be surprising to find a URL in a log file, and it might not be protected in the way that an access log file would be.

The point regarding log files is a valid one but if the parsing of the request target fails, the access log will contain null rather than the request target.
Generally, we do allow potentially security sensitive information to appear in the logs if debug logging is enabled. I'll look to see what we can do in that case.
