Tuesday, September 20, 2011

Update: changed new to path - with new it would be reasonable to require that the uri fed to the parser is already an ASCI string containing the already URI encoded url.
Consider this code:
use 5.010;
use Encode 'encode';
use URI;

If your page is encoded in UTF8 - then the first one is correct: %C2%B4 is the URI encoded UTF8 encoding of Unicode Character 'ACUTE ACCENT' (U+00B4). If your page encoding is Latin1 - then the second one would be correct - but this is only by accident - in that case you should still use encode("iso-8859-1", ...).

There are probably many other string manipulating libs that should document if their input should be binary encoded data or decoded character strings.