Thanks for the report. lxml needs to byte-encode filenames in order to pass them to libxml2, and it uses the encoding given by sys.getfilesystemencoding() for that ("mbcs" on Windows, as per spec). It looks like libxml2 has a heuristic on Windows that converts UTF-8 encoded file names back to UCS-2, so it might be enough to always set the file system encoding to UTF-8 on that platform. I don't have Windows available. Could you try this override hack on your side? (You need Cython 0.14.1 for the source build.)
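To illustrate the difference, here is a rough sketch of the encoding step in plain Python. This is not lxml's actual code: encode_filename() is a made-up helper, and "cp1252" only stands in for whatever ANSI code page "mbcs" maps to on a given Windows machine.

```python
import sys


def encode_filename(filename, encoding=None):
    """Hypothetical helper: byte-encode a filename for a C library.

    Not lxml's implementation. With no explicit encoding it falls back to
    sys.getfilesystemencoding(), which is what the reply above describes.
    """
    if isinstance(filename, bytes):
        return filename  # already encoded, pass through unchanged
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return filename.encode(encoding)


name = "caf\u00e9.xml"

# UTF-8 can represent any filename; this is the proposed override.
utf8_bytes = encode_filename(name, "utf-8")

# "cp1252" is an assumed stand-in for the Windows ANSI code page.
ansi_bytes = encode_filename(name, "cp1252")

# A name outside the ANSI code page cannot be encoded at all (strict mode).
try:
    encode_filename("\u65e5\u672c\u8a9e.xml", "cp1252")
    ansi_ok = True
except UnicodeEncodeError:
    ansi_ok = False
```

The two byte strings differ for the same name, and the second name only survives the UTF-8 route, which is why forcing UTF-8 might help if libxml2 really does decode it again on Windows.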

If this doesn't work for you, you can look at _encodeFilename() in apihelpers.pxi. That's where the filename encoding happens. There's also a heuristic in there that tries to recognise file system paths (as opposed to URLs). Maybe you can experiment a bit with that to see if it actually works as expected in your case.
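For experimenting outside of lxml, a path-vs-URL check could look roughly like the sketch below. It is only an assumption about what such a heuristic might do, not a copy of what _encodeFilename() does; looks_like_file_path() and the regular expression are invented for this example.

```python
import re

# Assumed rule: a URL starts with a scheme of two or more characters
# followed by "://". Single letters are left to the drive-letter check.
_URL_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.\-]+://")


def looks_like_file_path(name):
    """Hypothetical heuristic: True if `name` looks like a file system path."""
    if not name:
        return False
    if name[0] in "/\\":
        return True  # absolute POSIX path or Windows UNC path
    if len(name) >= 3 and name[0].isalpha() and name[1] == ":" and name[2] in "/\\":
        return True  # Windows drive letter, e.g. C:\dir\file.xml
    if _URL_SCHEME.match(name):
        return False  # http://, ftp://, file:// and similar
    return True  # anything else is treated as a relative path
```

Running your failing filename through a check like this, and then through the real function, should show whether the path is being misclassified as a URL or whether the problem is in the encoding step itself.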