All the Perl that's Practical to Extract and Report

It'd be nice to do, if it weren't so difficult. I've already somewhat documented the annoyances of trying to get UTF-8 support in MP3::Info, and additionally in getting that to play nicely with Apache::MP3 (what if your MP3s are in UTF-8 and your directory names, also printed to the browser, are in Latin-1?). It is not an easy thing to do, and you need to weigh the cost versus the benefit.

Consider that charsets are difficult to understand for those who don't already understand them, which is a truism, but relevant since most American computer programmers don't need to understand them. Consider that, similarly, most American computer programmers don't have a use for them, so adding support for them not only has no direct benefit, but additionally doesn't scratch that developer's itches. Blah blah blah. Is magical handling of I18N the next killer app?

What do you mean by "Apache::MP3 still needs to know how to encode the specific characters."? What encoding to declare the HTML as being in? It doesn't matter, if everything outside of 00-7F is turned into &#number; (or %xx in a URL -- which you do to the bytes, not the characters, incidentally).
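
To make the two escapes above concrete, here is a rough sketch in Python (not Perl, and not anything Apache::MP3 actually does). The function names and the sample file name are made up for illustration. It assumes the string has already been decoded to characters. Note that the HTML helper only handles characters outside 00-7F; it does not escape &, < or >.

```python
from urllib.parse import quote


def to_ascii_html(text: str) -> str:
    """Turn every character outside 00-7F into a decimal &#number; reference.

    Hypothetical helper: it does NOT escape &, < or >, only non-ASCII.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def to_url_path(text: str, encoding: str = "utf-8") -> str:
    """Percent-encode the *bytes* of text in the given encoding.

    The %xx escapes are applied to bytes, so the result depends on
    which encoding the characters are first serialized to.
    """
    return quote(text.encode(encoding), safe="/")


# Hypothetical name containing i-diaeresis (U+00EF, decimal 239).
name = "na\u00efve/track.mp3"

html_name = to_ascii_html(name)             # character reference, encoding-independent
url_utf8 = to_url_path(name, "utf-8")       # two bytes for the character: C3 AF
url_latin1 = to_url_path(name, "latin-1")   # one byte for the character: EF

print(html_name)
print(url_utf8)
print(url_latin1)
```

The character reference comes out the same whatever charset the page is declared in, which is the point being made above. The URL form differs between UTF-8 and Latin-1 because the escaping happens per byte, so whoever decodes the URL still has to know which encoding was used.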

If Latin-1 is a subset of Unicode, then why do Latin-1 characters get munged when read as part of a UTF-8 document? I changed one letter of a directory to be ï (i with an umlaut) in Latin-1, and when read as UTF-8, it was messed up. When read as Latin-1, it was fine. In Latin-1, it has a value of decimal 239. Does it have the same value in UTF-8? If so, then what good would it be to print ï, since it's already known to be byte 239... wouldn't it still need to be specially encoded somehow so