IB2017 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I've been having issues with Unicode since I started programming in Perl on Windows. Things work, but they mostly require so much adaptation, at least for me. Today I have a new problem I wasn't able to solve: connecting to a SQLite database saved in a directory whose name contains Unicode characters. The strange thing (in my eyes) is that I am able to create the database without any problem, but I fail to open/access it. In the following (nonsense) script I create two databases in two directories (one with and one without Unicode characters in its name) and try to access them. Creation works for both. Access works only for the database in the directory without Unicode characters. What am I not understanding?

Unixish filesystems (and the APIs) usually expose the filename as a binary blob, which matches well with using UTF-8 encoded filenames.

Windows filesystems (and the APIs) usually expose the filename as wide characters (UTF-16), so if you have the filename as UTF-8, you need to translate it to wide characters and you also need to use the wide APIs (CreateFileW etc.) to access such files.
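As a rough sketch of that translation step, assuming the path arrives as UTF-8 bytes, the core Encode module can produce the NUL-terminated UTF-16LE buffer that wide APIs such as CreateFileW expect. The helper name `utf8_to_wide` and the example path are made up for illustration, and actually calling the wide API is not shown.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn UTF-8 encoded bytes into a NUL-terminated
# UTF-16LE buffer, the layout that Windows wide (W) APIs expect.
sub utf8_to_wide {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);    # bytes -> Perl characters
    return encode('UTF-16LE', $chars . "\0");    # characters -> wide buffer
}

# Illustrative path only: "ü" is the two UTF-8 bytes C3 BC.
my $wide = utf8_to_wide("C:\\data\\\xC3\xBC\\test.db");
printf "%d bytes in the wide buffer\n", length($wide);
```

The buffer is only useful when handed to a wide API; passing it to Perl's built-in open or to a module that calls the non-wide API would not help.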

I'm not convinced that the non-ASCII character in your code is actually UTF-8 (at least not as represented here on this site). Replacing it with its named character (slightly different from Corion's example), I get this test script, which passes fine on Linux (having first run mkdir a ü, of course). YMMV with other OSes.
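To illustrate the named-character idea on its own (this is a minimal sketch, not the test script mentioned above), the character can be spelled by name and then encoded explicitly, which makes it easy to check which bytes would reach a Unixish filesystem. The variable names are assumptions for this example.

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw(encode);

# The directory name as a character string, spelled by name so the
# encoding of the source file cannot alter it.
my $dir_chars = "\N{LATIN SMALL LETTER U WITH DIAERESIS}";

# Unixish file APIs take bytes, so encode explicitly before using the name.
my $dir_bytes = encode('UTF-8', $dir_chars);

# Show the character count, the byte count, and the bytes in hex.
printf "characters: %d, UTF-8 bytes: %d (%s)\n",
    length($dir_chars), length($dir_bytes),
    join(' ', map { sprintf('%02X', ord($_)) } split(//, $dir_bytes));
```

If the bytes printed here differ from the bytes in the real directory name, the literal in the original script was probably not UTF-8.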

I'm not sure if you're trying to work with Windows and/or a non-Windows OS. You mention Windows in your first sentence, but your code appears to be more for Linux (based on the first line and the file paths used). The comments below would only apply to Windows.

I'm probably not going to be able to explain this fully and might not use the correct terminology. In Windows, there's an attempt to maintain backwards compatibility. As a result, the default filesystem API has some limits (such as no Unicode support and a maximum path length of about 260 characters). Most programs (including File Explorer and the command prompt) use this API, and it is also the API used by most Perl modules.

There is a second filesystem API available that will allow for Unicode characters and a significantly larger max path length. The one Perl module that I've had success in dealing with longer path names is Win32::LongPath and it does support Unicode characters in paths.

I don't have experience using the DBI module, so I took a quick look to see if it will take a file handle instead of a file path. Unless I missed it, I don't think that it does. I would suggest taking a look at the shortpathL function from the Win32::LongPath module. This function will attempt to return the "short path" (which I'm assuming is the path in DOS 8.3 format). You might have better luck using the "short path" of the file that has Unicode characters in its path.
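Here is an untested sketch of that suggestion, assuming Win32::LongPath is installed and that the volume has 8.3 short names enabled. The helper name `short_or_same` and the database path are hypothetical, and the helper falls back to the original path when no short path can be obtained.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical database location with a non-ASCII directory name.
my $db_path = "C:\\data\\\N{LATIN SMALL LETTER U WITH DIAERESIS}\\test.db";

# Return the short path on Windows when Win32::LongPath is available and
# reports one; otherwise hand back the original path unchanged.
sub short_or_same {
    my ($path) = @_;
    return $path unless $^O eq 'MSWin32';
    return $path unless eval { require Win32::LongPath; 1 };
    my $short = Win32::LongPath::shortpathL($path);
    return (defined $short && length $short) ? $short : $path;
}

my $dsn = 'dbi:SQLite:dbname=' . short_or_same($db_path);

# Connecting is left commented out because it needs DBI and DBD::SQLite:
# my $dbh = DBI->connect($dsn, '', '', { RaiseError => 1 });
print "DSN built\n";
```

Whether the short path actually avoids the problem depends on how the database driver opens the file, so treat this as something to try rather than a confirmed fix.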