Hi:
While using sax_builder to write a DOM parser, I found that my
Chinese text is not displayed correctly. I traced through the
code in core.py, and the problem seems to lie in the class Text.
Here is the skeleton:
    class Text(CharacterData):
        childNodeTypes = []
        nodeName = "#text"

        # Methods
        def __repr__(self):
            if len(self._node.value) < 20:
                s = self._node.value
            else:
                s = self._node.value[:17] + '...'
            return '<Text node %s>' % (repr(s),)
The built-in function repr() converts its argument into a form that
eval() can read back, escaping non-ASCII bytes as \x.. sequences along
the way, which mangles text in the encodings used by other locales. For
now it is better to use str() instead of repr(). When the value passed
is already a string, str() returns it unchanged, without any conversion:
            return '<Text node %s>' % (str(s),)
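To make the difference concrete, here is a minimal sketch of the two variants. It is written for Python 3, where bytes stands in for the byte string held by the node and decode() plays the role of str(). The helper names and the UTF-8 encoding are my assumptions for illustration; they are not part of core.py.

```python
# Python 3 sketch: bytes stands in for the byte string stored in the node.
# The helper names below are hypothetical, not part of core.py.

def text_repr_escaped(value: bytes) -> str:
    """Mimic the current __repr__: repr() escapes every non-ASCII byte."""
    return '<Text node %s>' % (repr(value),)


def text_repr_plain(value: bytes, encoding: str = "utf-8") -> str:
    """Mimic the proposed change: show the text itself, unescaped.

    Assumption: the node value is UTF-8. decode() plays the role that
    str() plays on a plain byte string in the original code.
    """
    return '<Text node %s>' % (value.decode(encoding),)


# Two Chinese characters, six bytes once encoded as UTF-8.
sample = "\u4e2d\u6587".encode("utf-8")
print(text_repr_escaped(sample))
print(text_repr_plain(sample))
```

The first call prints the escaped bytes, the second prints the two characters as written.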
I also think the truncation done by the if-else block above needs to
be refined, or multi-byte characters will still come out garbled
whenever the cut falls in the middle of a character. It shouldn't do
any harm to print the full contents of these nodes.
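If truncation is kept, one possible refinement is to cut by characters rather than by bytes. The sketch below is Python 3 and assumes the node value is UTF-8 encoded bytes. The function name truncate_text and its parameters are hypothetical, chosen only to mirror the 20/17 limits in the skeleton above.

```python
def truncate_text(value: bytes, limit: int = 20, encoding: str = "utf-8") -> str:
    """Shorten a node value without splitting a multi-byte character.

    Decoding first means the slice below counts characters, not bytes,
    so the cut can never land inside one character's byte sequence.
    """
    text = value.decode(encoding)
    if len(text) < limit:
        return text
    # Keep limit - 3 characters and mark the cut, as the original code does.
    return text[:limit - 3] + "..."


# 20 characters, 60 bytes in UTF-8.
raw = ("\u4e2d\u6587" * 10).encode("utf-8")
print(truncate_text(raw))

# Slicing the raw bytes at 17 leaves two bytes of an unfinished character.
try:
    raw[:17].decode(encoding="utf-8")
except UnicodeDecodeError as exc:
    print("byte slice broke a character:", exc.reason)
```

Printing everything, as suggested above, remains the simpler option. This only shows that a character-safe cut is possible if the short form is still wanted.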
~Frank Chen