[ https://issues.apache.org/jira/browse/COUCHDB-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749058#action_12749058 ]
Paul Joseph Davis commented on COUCHDB-345:
-------------------------------------------
Thanks for tracking this down to find a repeatable test. A couple of notes:
I actually got caught by this just the other day, but the JSON RFC says that strings must be Unicode, not specifically UTF-8. I.e., UTF-8, UTF-16 and UTF-32 are all allowed, in both BE and LE for 16 and 32. I don't think I've ever seen a completely compliant parser.
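To illustrate the point about encodings: RFC 4627 (section 3) notes that, because the first two characters of a JSON text are always ASCII, the encoding can be guessed from the pattern of null bytes in the first four octets. Below is a minimal Python sketch of that rule. The function name detect_json_encoding is hypothetical, and this is not how CouchDB or any particular parser does it.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first four octets.

    Follows the null-byte patterns listed in RFC 4627, section 3.
    The function name is hypothetical, chosen for this illustration.
    """
    if len(data) < 4:
        # Too short to apply the four-octet rule; fall back to the RFC default.
        return "utf-8"
    b0, b1, b2, b3 = data[0], data[1], data[2], data[3]
    if b0 == 0 and b1 == 0 and b2 == 0 and b3 != 0:
        return "utf-32-be"   # 00 00 00 xx
    if b0 != 0 and b1 == 0 and b2 == 0 and b3 == 0:
        return "utf-32-le"   # xx 00 00 00
    if b0 == 0 and b1 != 0 and b2 == 0 and b3 != 0:
        return "utf-16-be"   # 00 xx 00 xx
    if b0 != 0 and b1 == 0 and b2 != 0 and b3 == 0:
        return "utf-16-le"   # xx 00 xx 00
    return "utf-8"           # xx xx xx xx


# Usage: encode the same document several ways and guess each encoding back.
doc = '{"a": 1}'
guesses = {enc: detect_json_encoding(doc.encode(enc))
           for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")}
```

A parser that only ever assumes UTF-8 skips this step, which is presumably why fully compliant parsers are rare.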
Does anyone know if this byte range is special? I'm no encoding expert so I don't have a good feel for what might be the issue.
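For what it's worth, a quick check outside CouchDB suggests one possible explanation: in UTF-8, 0xD8 is the lead byte of a two-byte sequence, so a lone 0xD8 (which is what a Latin-1 encoded Ø looks like) is malformed. The Python sketch below only demonstrates that general encoding behaviour. It does not exercise CouchDB itself, and the variable names are made up for the illustration.

```python
import json

raw = b"\xd8"  # the single byte from the report

# Read as ISO-8859-1 (Latin-1), 0xD8 is the character U+00D8 (Ø).
as_latin1 = raw.decode("latin-1")

# Read as UTF-8, 0xD8 (binary 11011000) announces a two-byte sequence,
# so without a continuation byte the input is malformed.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Valid UTF-8 for the same character takes two bytes.
as_utf8 = as_latin1.encode("utf-8")

# Escaping to \uXXXX, as suggested in the report, keeps the JSON pure ASCII.
# json.dumps does this by default (ensure_ascii=True).
escaped = json.dumps(as_latin1)

print(repr(as_latin1), utf8_ok, as_utf8, escaped)
```

If that is what is happening here, the client is sending Latin-1 bytes that get stored without validation and only fail later, when something tries to decode them as UTF-8.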
While I appreciate your tracking this down, in the future could you submit test cases as JavaScript, Erlang, or a bash script? Not many of us have a Java dev environment set up to run JUnit tests.
> "High ASCII" can be inserted into db but not retrieved
> ------------------------------------------------------
>
> Key: COUCHDB-345
> URL: https://issues.apache.org/jira/browse/COUCHDB-345
> Project: CouchDB
> Issue Type: Bug
> Affects Versions: 0.9
> Environment: OSX 10.5.6
> Reporter: Joan Touzet
> Attachments: badtext.tar.gz, enctest.zip
>
>
> It is possible to PUT/POST a document into CouchDB with a "high ASCII" value that cannot be retrieved. This results from not escaping a non-ASCII value into \u#### when PUT/POSTing the document.
> The attached sample code will recreate the problem using the hex value D8 (Ø) in a possibly unsavoury test string.
> Sample output against 0.9.0 is as follows:
> ================================================
> {
> "ok": true
> }
> {
> "id": "fail",
> "ok": true,
> "rev": "1-76726372"
> }
> {
> "error": "ucs",
> "reason": "{bad_utf8_character_code}"
> }
> ================================================
> Please note this defect turned up another problem, namely that the bad_utf8_character_code exception thrown by a design document attempting to map() the bad document caused Futon to fail silently in building the view, with no indication (except via debug log) that there was a failure. The log indicated two attempts to build the view, both failing, followed by an uncaught exception error for Futon.
> Based on this, there are likely other areas in the codebase that do not handle the bad_utf8_character_code exception correctly.
> My belief is that CouchDB shouldn't accept this input and should have rejected the PUT/POST, or should have escaped the input itself before the insertion.