Created attachment 28251
JSP file to reproduce the matter
(1) Overview
With Tomcat 5.5.35 + jdk1.5.0_22 installed, when I run the JSP (please see the attached file),
I cannot get the proper value of a request parameter.
When I enter multibyte characters (e.g. 10 or aa) into the text box of the JSP,
it runs correctly and I get the input value (e.g. 10 or aa).
But when I enter a 1-byte character (e.g. "1" or "a"),
it runs incorrectly and I get nothing.
Please advise.
(Our customers are also waiting for the reason.)
Thank you.
(2) Steps to Reproduce
[2-1] Install Tomcat 5.5.35 + jdk1.5.0_22
[2-2] Deploy the JSP file in the following directory.
/apache-tomcat-5.5.35/webapps/jsp-examples
[2-3] Enter a 1-byte character (e.g. "1" or "0") into the text box and press the OK button.
(3) Actual Results
The "message" shows nothing.
(4) Expected Results
The "message" shows the input character.
(5) Build Date & Platform
Build 2012-02-02 on Windows7
(I suppose it does not depend on the Platform.)
(6) Additional Information
Tomcat 5.5.34 + jdk1.5.0_22 runs correctly.
So the following code may be the cause:
---
org.apache.tomcat.util.buf.ByteChunk.toStringInternal()
# Line514
CharBuffer cb;
cb = charset.decode(ByteBuffer.wrap(buff, start, end-start));
return new String(cb.array(), cb.arrayOffset(), cb.length());
---

Created attachment 28252
Test.java - Test Charset.decode()
I am attaching a test class that I wrote based on the reproduction scenario in bug 6196991 plus the charset enumeration code from r1140904.
This test prints the names of charsets that cannot perform an encoding + decoding round trip for a single "A" character.
Here is the list of charsets that are affected by this issue,
tested with 1.5.0_20-b02, on Windows:
---
Big5
Big5-HKSCS
EUC-JP
EUC-KR
GB2312
GBK
ISO-2022-JP
JIS_X0212-1990
Shift_JIS
windows-31j
+ two dozen non-standard charsets whose names start with "x-"
---
With 1.4.2_19-b04 on Windows the list is the same, except that GB2312 is absent.
With 1.6.0_30-b12 on Windows the list contains only this charset:
----
JIS_X0212-1990
+ 4 non-standard charsets whose names start with "x-"
----
So:
1. The issue is indeed a bug in the JRE.
It is present in the latest public versions of 1.4 and 1.5 that I have. I do not know anything about the later "Java for Business" versions.
2. The issue is absent in Oracle/Sun JDK 1.6.0_30.
3. The issue affects only certain encodings.
If you can update your configuration and applications to use UTF-8, you will avoid this issue.
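For readers without access to the attachment, a round-trip check of the kind described above could look roughly like the sketch below. This is not the attached Test.java; the class and method names are made up for illustration, and the "Broken charset" output text is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the attached Test.java.
final class CharsetRoundTrip {

    /** Returns true if the text survives charset.encode() followed by charset.decode(). */
    static boolean roundTrips(Charset charset, String text) {
        if (!charset.canEncode()) {
            return true; // decode-only charset: no round trip to test
        }
        try {
            ByteBuffer bytes = charset.encode(text);
            CharBuffer chars = charset.decode(bytes);
            return text.equals(chars.toString());
        } catch (RuntimeException e) {
            return false; // treat any failure as a broken round trip
        }
    }

    /** Collects the names of all installed charsets that fail the round trip. */
    static List<String> brokenCharsets(String text) {
        List<String> broken = new ArrayList<String>();
        for (Charset cs : Charset.availableCharsets().values()) {
            if (!roundTrips(cs, text)) {
                broken.add(cs.name());
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        for (String name : brokenCharsets("A")) {
            System.out.println("Broken charset: " + name);
        }
    }
}
```

Note that a charset which simply cannot represent "A" will also be reported by a check like this, independently of the JRE bug.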

Created attachment 28257
new implementation of ByteChunk.toStringInternal()
Hi All.
I am using a charset affected by this issue.
Although I know this is an issue in Java,
I propose a new implementation of ByteChunk.toStringInternal().
I will propose it in STATUS.txt (for both 5.5.x and 6.0.x).

Thank you very much for the answer, Mr. Kolinko.
And thank you for the patch for this issue, Mr. Fujino.
I ran the program from Mr. Kolinko
and got "Broken charset" results such as Shift_JIS.
I understand that the issue is a bug in the JRE,
and that the support period for Java 5 has ended.
Thank you, sir.
On the other hand, the following page carries the message "Tomcat5.5.x requires 5.0 or later":
http://tomcat.apache.org/tomcat-5.5-doc/building.html#Download_and_install_a_Java_Development_Kit_1.4.x_or_later
So we hope to get the patch for the program.
Thank you very much.

(In reply to comment #3)
> Created attachment 28257
> new implementation of ByteChunk.toStringInternal()
>
-1. There are two errors:
1) "return new String(buff, start, end-start);" is just wrong. It converts bytes to String using OS default encoding.
As far as I understand the "result.isUnderflow()" condition means that all input data has been processed. This "return new String" code just handles an unexpected state.
I suggest replacing that code with "cr.throwException();".
2) "charset.newDecoder()" is expected to be an expensive operation. In the scenario of CVE-2012-0022 I expect it to have a notable impact on performance.
Charset.decode() uses a ThreadLocal-based cache of decoders. Maybe we can implement something like that cache, or just use a simple ThreadLocal (or other way) to pass a Decoder instance around while processing the same request.
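To make point 1) concrete, here is a minimal sketch of decoding with an explicit CharsetDecoder and turning any unexpected CoderResult into an exception instead of falling back to "new String(buff, ...)". It is not the attached patch: the class name is hypothetical, the REPLACE error actions are chosen on the assumption that they mirror what Charset.decode() does, and wrapping the checked exception in IllegalStateException is only one possible choice.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical sketch, not the code from the attachment.
final class ExplicitDecode {

    static String decode(Charset charset, byte[] buff, int start, int end) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer in = ByteBuffer.wrap(buff, start, end - start);
        // Worst-case output size, so a non-empty input never gets a zero-length buffer.
        int capacity = (int) Math.ceil(in.remaining() * decoder.maxCharsPerByte());
        CharBuffer out = CharBuffer.allocate(capacity);
        try {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isUnderflow()) {
                result = decoder.flush(out);
            }
            if (!result.isUnderflow()) {
                // Unexpected state: report it rather than guessing an encoding.
                result.throwException();
            }
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
        out.flip();
        return out.toString();
    }
}
```

This sketch still calls newDecoder() on every invocation, so it does not address point 2).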

(In reply to comment #5)
> Maybe we can
> implement (...) just use a simple ThreadLocal
> to pass a Decoder instance around while processing the same request.
If a Decoder instance is obtained from a ThreadLocal, a quick way to check it against the required charset is to compare that charset with decoder.charset().
3) For large input data the current implementation that calls Charset.decode() is better than the proposed one, because it allocates less memory. The difference is between (size * averageCharsPerByte()) and (size * maxCharsPerByte()).
I think the threshold can be around 10 bytes.
The Java bug #6196991 occurs when the value of (input size * decoder.averageCharsPerByte()) coerced to integer is 0. In this case in Java 5 the CharsetDecoder#decode(ByteBuffer) method erroneously treats it as if no input data were available. If input is > 10 bytes it should not trigger the bug #6196991.
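The arithmetic behind that condition can be illustrated with a tiny sketch. The helper name is hypothetical, and the value 0.5 for averageCharsPerByte() is only an assumed example of a decoder that averages fewer than one char per byte.

```java
// Hypothetical illustration of the size estimate described above.
final class DecodeBufferEstimate {

    /** Same arithmetic as described above: the product is truncated to an int. */
    static int estimate(int inputBytes, float averageCharsPerByte) {
        return (int) (inputBytes * averageCharsPerByte);
    }

    public static void main(String[] args) {
        float avg = 0.5f; // assumed value, for illustration only
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + " byte(s) -> estimate " + estimate(n, avg));
        }
    }
}
```

With the assumed 0.5, a 1-byte input gives an estimate of 0, which is the case that Java 5 mishandles, while 2 or more bytes give a non-zero estimate.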

Created attachment 28274
patch v2
Many thanks for the comments.
I reimplemented ByteChunk.toStringInternal().
> I suggest to replace that code by "cr.throwException();".
The code was replaced by result.throwException().
CharacterCodingException is thrown as a RuntimeException.
> Charset.decode() uses a ThreadLocal-based cache of decoders. Maybe we can
> implement something like that cache, or just use a simple ThreadLocal (or other
> way) to pass a Decoder instance around while processing the same request.
A cache for the Decoder was created using a simple ThreadLocal.
This cache is very simple for now.
Only one Decoder instance is cached at a time.
If you would like to cache two or more Decoder instances, refactoring is necessary.
In that case, the code will become slightly more complicated.
> 3) For large input data the current implementation that calls Charset.decode()
> is better than the proposed one, because it allocates less memory. The
> difference is between (size * averageCharsPerByte()) and (size *
> maxCharsPerByte()).
>
> I think threshold can be around 10 bytes.
The threshold value was added.
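For readers who do not open the attachment, the overall shape described in this comment (a single-entry ThreadLocal decoder cache plus a size threshold) could be sketched as follows. This is not the patch itself: the class name, the constant name, the threshold value of 10 and the REPLACE error actions are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical sketch, not the attached patch v2.
final class CachedDecoderSketch {

    static final int THRESHOLD = 10; // assumed value, in bytes

    // Holds at most one decoder per thread.
    private static final ThreadLocal<CharsetDecoder> CACHE = new ThreadLocal<CharsetDecoder>();

    /** Reuses the cached decoder if it matches the charset, else replaces it. */
    static CharsetDecoder decoderFor(Charset charset) {
        CharsetDecoder decoder = CACHE.get();
        if (decoder == null || !decoder.charset().equals(charset)) {
            decoder = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
            CACHE.set(decoder);
        }
        return decoder.reset();
    }

    static String toString(Charset charset, byte[] buff, int start, int end) {
        ByteBuffer in = ByteBuffer.wrap(buff, start, end - start);
        if (in.remaining() > THRESHOLD) {
            // Large input: keep the existing, more memory-friendly path.
            return charset.decode(in).toString();
        }
        CharsetDecoder decoder = decoderFor(charset);
        int capacity = (int) Math.ceil(in.remaining() * decoder.maxCharsPerByte());
        CharBuffer out = CharBuffer.allocate(capacity);
        try {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isUnderflow()) {
                result = decoder.flush(out);
            }
            if (!result.isUnderflow()) {
                result.throwException();
            }
        } catch (CharacterCodingException e) {
            throw new RuntimeException(e);
        }
        out.flip();
        return out.toString();
    }
}
```

A request that alternates between two charsets on the same thread would create a new decoder on every switch with this single-entry cache, which is the limitation mentioned above.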

I am still leaning heavily towards WONTFIX for this.
This issue affects a version of the JVM where fixes are no longer provided for free by Oracle. Users of such a JVM have two options:
1. Upgrade to a JVM release (minimum 1.6) where this is fixed and Oracle continue to make fixes freely available.
2. Pay for Oracle support.
I am extremely reluctant to start adding significant chunks of code into what is a very old Tomcat release in order to work around a bug in a JVM that no-one should be using unless they are paying for support.
