mido has asked for the
wisdom of the Perl Monks concerning the following question:

Hello Perl monks,

a Perl noob is seeking your wisdom.

I've tried to write a script that traverses a directory tree (using File::Find), gets the mtime of each file, and moves the files.
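For reference, the traversal part of such a script could look roughly like the sketch below. It is not the original poster's code: the helper name `collect_mtimes` is made up for illustration, and the move step (for example with File::Copy's `move`) is left out.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect the mtime of every regular file below $top.
# Returns a hash ref mapping path => mtime in epoch seconds.
sub collect_mtimes {
    my ($top) = @_;
    my %mtime;
    find({
        no_chdir => 1,    # do not chdir, so $_ holds the full path
        wanted   => sub {
            return unless -f $_;         # skip directories and other non-files
            $mtime{$_} = (stat _)[9];    # field 9 of stat is mtime; "_" reuses the -f lookup
        },
    }, $top);
    return \%mtime;
}
```

Calling `collect_mtimes('.')` returns the hash for the current directory. Note that the keys are whatever octets File::Find hands back, which is exactly where the encoding trouble described in this thread begins.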

But I've run into the problem that some of the filenames, which are stored as UTF-16LE, contain wide characters that cannot be interpreted correctly as UTF-8.
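The mismatch between the two encodings can be shown in isolation with the core Encode module. The sketch below only illustrates the encoding problem; it does not touch the filesystem, and the snowman character is just a convenient example of a wide character.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# U+2603 SNOWMAN as stored in UTF-16LE: two octets, low byte first.
my $utf16le_octets = "\x03\x26";

# Decoding with the right encoding yields a single character.
my $char = decode('UTF-16LE', $utf16le_octets);
printf "decoded: U+%04X, length %d\n", ord($char), length($char);

# The same character encoded as UTF-8 is three different octets (E2 98 83).
my $utf8_octets = encode('UTF-8', $char);
printf "UTF-8 octets: %s\n",
    join ' ', map { sprintf '%02X', ord($_) } split //, $utf8_octets;

# Reading the UTF-16LE octets as if they were UTF-8 does not give the
# snowman back: both octets happen to be valid ASCII, so they decode
# to two unrelated characters, U+0003 and U+0026.
my $wrong = decode('UTF-8', $utf16le_octets);
printf "misread as UTF-8: %s\n",
    join ' ', map { sprintf 'U+%04X', ord($_) } split //, $wrong;
```

The point is that a string of octets carries no label saying which encoding it is in; the code has to know, or be told, before it can turn octets into characters.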

I've found some interesting discussions on this problem but no workaround.

I've read that there was a -C switch on perl < 5.8.1 (afair) which told perl to use the Windows wide-char syscalls for filesystem stuff (like FindNextFileW or CreateFileW). The switch no longer does this; since 5.8.1 -C controls other Unicode features instead.

Alas, it's still a ridiculous situation. Not counting the various work-arounds using modules from the Win32/Win32API namespace (tye and ikegami usually give answers involving Windows-specific code, see Super Search), you can use Path::Class::Unicode and PerlIO::fse for reasonably portable code.

Looking at the code of Path::Class::Unicode, it is broken at least on some non-Windows operating systems (well, actually on some filesystems). It seems to assume that all file systems will export their entries as UTF-8, which is a fairly broad assumption given VFAT and NFS, the latter of which has only made claims about the encoding since v4 in 2009.

Hmm, it's just that Win32::GetLongPathName() returns the Perl string I'd most expect in Win32-land. By "expect", I mean "jibes with what I see in Windows Explorer".

Using Explorer, I created a file "snowman ☃" in a new folder "my_dir". That file was created by renaming an empty text file with "snowman " first, and then copy+pasting the snowman character. Then I ran the following:

The results tell me that the string returned by Win32::GetLongPathName() is fit for Unicode semantics in Perl. Never mind the underlying filesystem encoding of NTFS (UTF-16LE? I don't know), I can treat the path as characters from then on.

Sure, long path names are the opposite of short path names. What I'm saying is that Win32::GetLongPathName() is handy to get at the characters instead of the octets given by File::Find.
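A minimal sketch of that idea follows. It is an illustration rather than the code that was run above: `long_path` is a hypothetical helper name, the Win32 module is only loaded when the script actually runs on Windows, and the fallback to the original path is an assumption about what is convenient when the lookup returns nothing.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a path as seen by File::Find into its long
# path name on Windows; on any other system return it unchanged.
sub long_path {
    my ($path) = @_;
    return $path unless $^O eq 'MSWin32';
    require Win32;                              # loaded only on Windows
    my $long = Win32::GetLongPathName($path);
    # Assumption: fall back to the input if no long name comes back.
    return (defined $long && length $long) ? $long : $path;
}
```

Inside a File::Find `wanted` callback this would be called as `long_path($File::Find::name)`. When printing the result, an output layer such as `binmode STDOUT, ':encoding(UTF-8)'` is needed so that the characters are encoded on the way out.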