Thomas E Enebo
added a comment - 31/Aug/11 1:17 PM I have not looked at the actual test failures yet, but are the failures simplified at all away from having to wade through various dependent libraries? If it is possible to reduce the test cases it will be much easier for us to fix the issues. If not, we still appreciate the report...

Diego Plentz
added a comment - 31/Aug/11 1:59 PM Yup. It depends only on activemodel and rspec-rails (used to create the test). The README shows that it works well using MRI 1.9.2p290. I'm updating the project with one more failing case (and Rails 3.1), but this one is related to ActiveRecord 3.1.0 and the reload method. For the encoding problems, you could check out this commit: https://github.com/plentz/jruby_report/commit/7f8256d14b838cb1733f9841411b86360ccc4d30
(I renamed the project to jruby_report since it no longer contains only encoding tests.)

Charles Oliver Nutter
added a comment - 19/Sep/11 10:02 PM Ok, I think there's a number of issues at play in the first non-JDBC failure.
My Psych impl was missing various bits of encoding logic. I'm adding those now, but they alone don't seem to fix this. Also, it appears that what comes out of the YAML file is encoded as UTF-8 regardless of my changes, so I don't think that's the source of problems. I'm going to put the missing logic in anyway.
The actual issue may be that the string produced from the array of error messages is not encoded properly. I'm not sure how that list is subbed into the localized string yet, but if it's using something like Array#to_s that could be the problem...it does not appear that Array#to_s (or Hash#to_s) consider encoding at all right now. I'm going to work on that next.

Charles Oliver Nutter
added a comment - 20/Sep/11 2:19 AM Ok, the first encoding problem seems to have boiled down to Array#to_s not preserving/negotiating encodings properly. I modified the logic to match how MRI does it, via rb_str methods rather than through direct updates (as we did before).
I've pushed that along with related tweaks in 7eaef8b5ef61d562c46d15f00610d1d9a876cca0 to my JRUBY-6033 topic branch available on http://github.com/headius/jruby .
Also pushed to that branch is the parser-side encoding logic for Psych. I'm not sure if it actually plays a role in this bug or not.
I'm going to call it a night on this one for now and hope that Diego is able to come up with a better reproduction of the second issue. It will probably be a similarly stupid lack-of-encoding-support somewhere.
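
For readers following along, here is a minimal Ruby sketch of the encoding negotiation that the comment above refers to. It is not taken from the JRuby patch; the variable names and sample strings are made up for illustration, and it assumes MRI 1.9+ semantics with the default external encoding set to UTF-8.

```ruby
# encoding: utf-8
# Illustration only: names and sample strings are hypothetical.
Encoding.default_external = Encoding::UTF_8

label   = "errors: ".encode(Encoding::US_ASCII) # ASCII-only content
message = "n\u00E3o pode ficar em branco"       # UTF-8 with a non-ASCII char

# Concatenation negotiates a compatible encoding: an ASCII-only
# string combined with a UTF-8 string yields a UTF-8 result.
combined = label + message
puts combined.encoding

# Array#to_s (an alias of Array#inspect in 1.9+) should likewise
# hand back a string whose encoding can represent its elements.
rendered = [message].to_s
puts rendered.encoding
puts rendered.valid_encoding?

# Two strings that both hold non-ASCII bytes under incompatible
# encodings cannot be negotiated and raise instead.
binary = "\xC3\xA3".b # same bytes as "ã", tagged ASCII-8BIT
begin
  binary + message
  mixed_ok = true
rescue Encoding::CompatibilityError => e
  mixed_ok = false
  puts "incompatible: #{e.message}"
end
```

If an implementation builds the result by appending raw bytes without this negotiation, the output can carry the wrong encoding tag even though its bytes are correct, which is consistent with the symptom reported here.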

Diego Plentz
added a comment - 20/Sep/11 1:22 PM Charles,
I've isolated the tests a bit more (removed the AS::JSON call), and it looks like the problem isn't related to JSON parsing but to JSON generation. Switching JSON engines via multi_json doesn't change the test's behavior; I tried OkJson and yajl.
You could take a look at https://github.com/plentz/jruby_report/blob/8c22492579ba33e995c1c7d7c185b0d8251b724f/spec/models/json_spec.rb#L17

Diego Plentz
added a comment - 20/Sep/11 10:10 PM Charles, the test with different codepoints that I mentioned hadn't been pushed at that time. It is here now: https://github.com/plentz/jruby_report/blob/72bd8b86b2/spec/models/json_spec.rb#L17
I'm now working on isolating the tests further.

Diego Plentz
added a comment - 08/Feb/12 12:05 PM Charles/Thomas,
I isolated the tests further and I think it's now easier to find the problems. We're down to 2 problems:
OkJson.engine should generate the same char codepoint in both implementations (currently it generates \u00C3\u00A1 on JRuby and \xC3\xA1 on MRI)
Yajl::Encoder should encode the YAML in both implementations, even with special chars (currently it only works on MRI)
The updated code is here: https://github.com/plentz/jruby_report
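
The two outputs quoted for OkJson differ in a telling way. The sketch below is an illustration only, not a confirmed diagnosis of OkJson; it assumes the character under test is "á" (U+00E1) and uses hypothetical variable names. It shows how treating each UTF-8 byte as its own character produces exactly the \u00C3\u00A1 escapes.

```ruby
# encoding: utf-8
# Illustration only: assumes the character under test is "á" (U+00E1).
char = "\u00E1"

# In UTF-8 this single character is stored as two bytes.
utf8_bytes = char.bytes.map { |b| format("\\x%02X", b) }.join
puts utf8_bytes # the raw byte form, as seen on MRI

# If those bytes are read one at a time as separate characters
# (here simulated by reinterpreting them as ISO-8859-1), each byte
# becomes its own codepoint: U+00C3 and U+00A1.
misread = char.b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
escaped = misread.codepoints.map { |cp| format("\\u%04X", cp) }.join
puts escaped # the escaped form, as seen on JRuby

# Read correctly as UTF-8, the string has one codepoint.
puts char.codepoints.length
```

Both outputs therefore describe the same two bytes; the difference is whether the encoder sees one UTF-8 character or two single-byte characters.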

Thomas E Enebo
added a comment - 08/Feb/12 1:57 PM Started examining this for a quick fix. When I explicitly use json/pure in money rspecs all the encoding issues go away. json-java fails and I suspect the native extension might need some additional love for string creation in 1.9 mode. Currently, it seems to make all strings it creates as ASCII-8BIT when the source string starts as UTF-8.
This particular issue should probably be replaced by multiple individual issues (I suspect for each native extension), but I will leave it open until I get some traction on json-java.
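
To illustrate why an ASCII-8BIT tag on otherwise correct bytes is enough to fail a spec, here is a small sketch. It does not use json-java; it simulates the mis-tagging with String#b, the variable names are hypothetical, and it assumes Ruby 2.0 or later (String#b is not available in 1.9).

```ruby
# encoding: utf-8
# Illustration only: simulates a string that was created with the
# right bytes but tagged ASCII-8BIT instead of UTF-8.
original   = "\u00E1"   # "á", UTF-8
mis_tagged = original.b # same bytes, tagged ASCII-8BIT

same_bytes   = original.bytes == mis_tagged.bytes
equal_before = original == mis_tagged

puts same_bytes   # true: the byte content is identical
puts equal_before # false: non-ASCII strings with different encodings differ

# Re-tagging the bytes as UTF-8 (no transcoding) restores equality.
repaired    = mis_tagged.dup.force_encoding(Encoding::UTF_8)
equal_after = original == repaired
puts equal_after  # true
```

ASCII-only strings would compare equal regardless of the tag, which may explain why such a bug only shows up in tests that use accented characters.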

Thomas E Enebo
added a comment - 08/Feb/12 3:29 PM As I mentioned in my last comment, json_pure works right now. json-java will be fixed once this is merged:
https://github.com/flori/json/pull/119
For the ambitious: with OkJson, you should be able to examine my commit and figure out what needs to change in that gem.
Diego, can you update your example to use json-java (you will need to build it yourself [jrake jruby_gem] ) and make sure this gem is at least functioning as you expect?

Diego Plentz
added a comment - 21/Feb/12 11:53 PM Thomas,
Thanks for that fix in the flori/json lib. I think that everything is fine now. OkJson has also been updated, and I'm just waiting for a reply here: https://github.com/brianmario/yajl-ruby/issues/97 . Either way, it looks like none of the problems ( https://github.com/intridea/multi_json/issues/25 ) are with JRuby itself, so I think it's better to close this issue and open another if a new problem appears. Thanks!