Scratch pad for things I learn as I go along

Python on Windows – Unicode environment variables

Say you want to open a file picker dialog in the user’s profile root, or log to a file under AppData, or do anything else involving environment variables in Windows relating to file paths. You could use os.environ or os.getenv() for this, but in Python 2 both return byte strings, not Unicode. If your user happens to have characters above codepoint 127 in their name (using some system codepage), these methods will likely return a mangled approximation of the path. If the name has codepoints above 255, you’ll mostly just get question marks. Hence these paths:

"C:\Users\Rosnička""C:\Users\发涩"

Are returned as:

"C:\Users\Rosnicka""C:\Users\??"

Which is clearly unacceptable.

The function os.path.expanduser() suffers from the same problem since it uses environment variables internally.

Given that these paths have already been mangled by the lossy conversion, you can’t recover them by decoding with the system encoding (as you can for some other file paths on Windows, by passing the result of sys.getfilesystemencoding() as the second argument to unicode()).
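To see why decoding afterwards cannot help, here is a small sketch of the round trip. It assumes the code page is cp1252 and uses Python's "replace" error handler as a stand-in for the Windows conversion, so it reproduces the question marks but not the best-fit mapping that turns "č" into "c".

```python
# -*- coding: utf-8 -*-
original = u"C:\\Users\\发涩"

# Stand-in for the lossy conversion: encode to an assumed ANSI code page,
# replacing every character it cannot represent with "?".
mangled = original.encode("cp1252", "replace")

# Decoding only turns the bytes back into text; the characters that
# were replaced are gone, so the original path cannot be rebuilt.
recovered = mangled.decode("cp1252")

print(repr(mangled))
print(recovered == original)
```

Whatever encoding you pass to the decode step, the two question marks stay question marks.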

The solution is to use ctypes to query the Win32 API and get the actual Unicode values of the environment variables. This function (cribbed from here) allows you to do this and returns a Python-native unicode string.
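A minimal sketch of how such a lookup might be written with ctypes and GetEnvironmentVariableW. The name get_env_unicode is made up for illustration, and the error handling is one possible choice rather than a definitive one. Outside Windows it simply falls back to os.environ so that the module still imports and runs.

```python
import ctypes
import os
import sys

ERROR_ENVVAR_NOT_FOUND = 203  # Win32 error code for a missing variable


def get_env_unicode(name):
    """Return environment variable `name` as Unicode text, or None if unset.

    Hypothetical helper, written for illustration only.
    """
    if sys.platform != "win32":
        # No wide-character API to call here; use the standard lookup.
        return os.environ.get(name)

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_env = kernel32.GetEnvironmentVariableW
    get_env.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_ulong]
    get_env.restype = ctypes.c_ulong

    # With a zero-length buffer the call reports the size it needs,
    # in characters, including the terminating NUL.
    ctypes.set_last_error(0)
    size = get_env(name, None, 0)
    if size == 0:
        err = ctypes.get_last_error()
        if err == ERROR_ENVVAR_NOT_FOUND:
            return None
        raise ctypes.WinError(err)

    buf = ctypes.create_unicode_buffer(size)
    ctypes.set_last_error(0)
    written = get_env(name, buf, size)
    if written == 0 and ctypes.get_last_error() != 0:
        # Zero can also mean an empty value, so consult the error code.
        raise ctypes.WinError(ctypes.get_last_error())
    if written >= size:
        # The value grew between the two calls and no longer fits.
        raise RuntimeError("environment variable changed between calls")
    return buf.value
```

Usage would look like get_env_unicode(u"APPDATA") or get_env_unicode(u"USERPROFILE"). On Python 3 this is unnecessary, since os.environ already returns Unicode strings on Windows.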

Also, it should be checking the return value from the second call to GetEnvironmentVariableW, and maybe doing more robust error handling to distinguish between a missing environment variable and other errors.

daira
08:15 on August 22, 2014

Sigh, it didn’t survive the sanitisation. The Unicode string should have one character (it doesn’t actually matter which one, unless the second GetEnvironmentVariableW call fails); it should not be the empty string.