Broken UTF-8 handling in the newest RubyGems when environment locales are not set

Paweł Wilk

27 Feb, 2011 01:05 AM

Hi,

I'm encountering some problems after upgrading to the newest
RubyGems.

It happens when the locale settings in the environment are empty
(not even the "C" locale) and someone tries to install a gem that
was built with the newest RubyGems. The main precondition for the
error is a UTF-8 character in some descriptive field (e.g. the
developer's name). The locale settings used when building the gem
are irrelevant.
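This minimal Ruby sketch shows why an empty locale could matter. It assumes something the report does not confirm: that the manifest bytes end up tagged with the locale's default encoding, which is typically US-ASCII on a Linux system with no locale set. Under that tagging the same bytes that are valid UTF-8 become invalid characters, and any attempt to transcode them fails. The variable names are illustrative only.

```ruby
# Raw bytes of "Paweł" as they appear in a UTF-8 gem manifest
# ("ł" is U+0142, encoded in UTF-8 as 0xC5 0x82).
raw = "Pawe\xC5\x82".b  # binary (ASCII-8BIT) copy of those bytes

# Tagged as US-ASCII (assumed here to be what an empty locale yields),
# the two high bytes are not valid characters.
as_locale = raw.dup.force_encoding(Encoding::US_ASCII)

# Tagged as UTF-8, the very same bytes form a valid string.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)

puts as_locale.valid_encoding?  # false
puts as_utf8.valid_encoding?    # true

# Transcoding the mis-tagged string to UTF-8 raises an encoding error.
transcode_error =
  begin
    as_locale.encode(Encoding::UTF_8)
    nil
  rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError => e
    e
  end

puts "transcoding failed: #{transcode_error.class}" if transcode_error
```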

RubyGems 1.5.2 on Ruby 1.8 seems unaffected (but since Ruby 1.8 is
largely encoding-unaware, that may just be one bug masking another).

Note that this error occurs when installing packages that contain
UTF-8 characters in some fields and were built with RubyGems 1.5.2.
Packages built with previous versions of RubyGems install
successfully.

There is a difference in the spec files that might help in tracking
down the cause of this problem. Package i18n-inflector-2.5.0 was built
with the old RG, and package i18n-inflector-2.5.1 with the new RG. Both
were installed using the new RG, but installation of
i18n-inflector-2.5.1 failed with the error quoted in my previous post.

(I used Pastie because the markup here would interpret backslashes
followed by character codes.)

See the authors line: the new RubyGems produces an unescaped
version. Maybe that's intended, but the new RG client has problems
installing such a gem when environment locales are not set.
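This is a minimal sketch of how the authors line can end up unescaped. It uses Psych on a current Ruby, not the RubyGems 1.5.2 code path itself. Psych writes non-ASCII characters as raw UTF-8 instead of backslash escapes, so the emitted manifest is no longer pure ASCII. The author string is just an example value.

```ruby
require 'yaml'  # on current Rubies the YAML constant is Psych

authors = ["Pawe\u0142 Wilk"]

# Psych emits printable non-ASCII characters as-is (raw UTF-8 bytes)
# rather than escaping them.
yaml = authors.to_yaml

puts yaml
puts yaml.ascii_only?  # false: the raw "ł" is in the output

# Loading it back in a UTF-8-aware process gives the original value.
puts YAML.load(yaml) == authors  # true
```

An escaped manifest, like the one produced with Syck, would contain only ASCII bytes. That ASCII-only form is presumably why its installation does not depend on the locale.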

Since I don't control the systems that use my gem, I can't apply
this workaround myself.

Yesterday I discovered that it may be related to the Psych parser.
When the Syck parser is used while building the gem, the problem
disappears during installation, since UTF-8 characters are then
escaped in the gem manifest file.
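A developer could check which YAML engine a build actually uses before packaging, for example from a Rakefile. The sketch below does this, with caveats. `YAML::ENGINE` exists only on Ruby 1.9.x, where both Syck and Psych ship with the interpreter, so the check is guarded. `active_yaml_engine` is a hypothetical helper name, not part of RubyGems.

```ruby
require 'yaml'

# Hypothetical helper: report which YAML engine is active.
# On Ruby 1.9.x, YAML::ENGINE.yamler names the engine in use ("syck" or
# "psych"); later Rubies dropped Syck and define YAML as Psych itself.
def active_yaml_engine
  if defined?(YAML::ENGINE) && YAML::ENGINE.respond_to?(:yamler)
    YAML::ENGINE.yamler
  elsif defined?(Psych) && YAML.equal?(Psych)
    'psych'
  else
    'unknown'
  end
end

puts "YAML engine: #{active_yaml_engine}"
```

Printing this just before the gem is built would confirm which engine wrote the manifest.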

To apply the workaround, the developer has to set the YAML engine
to Syck in the Rakefile and/or gemspec: