So I was handing over some CSV export functionality to a client who opened the file directly in Excel, without using the import wizard. As a result, Excel interpreted the UTF-8 as Windows-1252. I quickly wrote a little function that adds a BOM by piping the data through uconv with proc_open (error handling omitted for brevity).
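The original function isn't reproduced here. For comparison, here is a minimal sketch that reaches the same goal without an external process. It swaps the uconv pipe for a plain string prepend, assumes the CSV is already valid UTF-8, and uses a hypothetical function name, add_utf8_bom(). Because nothing is spawned, LANG and the locale play no part.

```php
<?php
// Sketch, not the original function: prepend the UTF-8 byte order mark
// (bytes EF BB BF) directly in PHP. Assumes $csv is already valid UTF-8.
// add_utf8_bom() is a hypothetical name.
const UTF8_BOM = "\xEF\xBB\xBF";

function add_utf8_bom(string $csv): string
{
    // Don't stack a second BOM if the data already starts with one.
    if (strncmp($csv, UTF8_BOM, 3) === 0) {
        return $csv;
    }
    return UTF8_BOM . $csv;
}

// Typical use when sending the export as a download:
// header('Content-Type: text/csv; charset=utf-8');
// echo add_utf8_bom($csv);
```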

A quick test showed the function working, so I patched the CSV export to call it, deployed it on the dev server and... it died on the first accented character. I checked it from the command line on the dev server, and there it worked. W.T.F. I compared the mbstring ini values: all the same. W.T.F. No, really, this can't be.

Well, there must be something different, right? What could it be? The locale? But what determines the locale? Environment variables. Hrm, proc_open passes environment variables to the child process too. Well then, let's see whether my shell feeds something into this script that makes it work: env -i php x.php, which runs PHP with an empty environment. It breaks! Yay! It's always such a relief when I can reproduce a bug that refuses to be reproduced. The solution is always easy after that: the LANG environment variable is en_US.utf8 in the shell and C in Apache, and the process started by proc_open inherits whichever one PHP was running with.
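The inheritance part is easy to check from PHP itself. The sketch below is an illustration, not code from the original setup: the helper name child_lang() is hypothetical. It relies on proc_open()'s fifth argument, where null means "inherit the caller's environment" and an array replaces the environment entirely, which is roughly what env -i does on the command line.

```php
<?php
// Sketch: report the LANG a child process started via proc_open() sees.
// child_lang() is a hypothetical helper name.
function child_lang(?array $env): string
{
    $descriptors = [1 => ['pipe', 'w']]; // capture the child's stdout only
    // A command string runs through /bin/sh -c; echo is a shell builtin,
    // so it works even when the environment has no PATH.
    $proc = proc_open('echo "$LANG"', $descriptors, $pipes, null, $env);
    if (!is_resource($proc)) {
        throw new RuntimeException('proc_open failed');
    }
    $out = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);
    return trim($out);
}

// Inherited: whatever PHP itself got (en_US.utf8 from the shell and C
// under Apache in the setup described above).
var_dump(child_lang(null));
// Replaced by a minimal environment without LANG, much like env -i.
var_dump(child_lang(['PATH' => '/usr/bin:/bin']));
// Pinned explicitly, independent of how PHP was started.
var_dump(child_lang(['LANG' => 'en_US.utf8']));
```

Passing an explicit environment array like the last call is one way to make the child independent of the web server's locale; naming the encodings on the command line, as described next, is another.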

P.S. Curiously enough, passing -f utf-8 to uconv didn't help, but -f utf-8 -t utf-8 did. Moral of the story: uconv takes both the input and the output encoding from LANG by default. This is not documented and it's very hard to discover.
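To make the fix concrete, here is a minimal sketch (not the original code) of the proc_open call with both encodings named. It assumes uconv is installed and on PATH, and uses ICU uconv's --add-signature option to emit the BOM. The function name add_bom_via_uconv() is hypothetical. The optional $env argument lets you run it with a LANG-free environment, as under the Apache above, to check that the result no longer depends on the locale.

```php
<?php
// Sketch: add a BOM by piping UTF-8 data through ICU's uconv, naming both
// encodings so the result does not depend on LANG. Assumes uconv is on
// PATH; add_bom_via_uconv() is a hypothetical name.
function add_bom_via_uconv(string $data, ?array $env = null): string
{
    // Feed stdin from a temporary file rather than a pipe: writing a large
    // CSV into a pipe while uconv's stdout pipe fills up could deadlock.
    $tmp = tempnam(sys_get_temp_dir(), 'csv');
    file_put_contents($tmp, $data);
    try {
        $descriptors = [
            0 => ['file', $tmp, 'r'], // stdin: the UTF-8 input
            1 => ['pipe', 'w'],       // stdout: BOM + converted data
            2 => ['pipe', 'w'],       // stderr: conversion errors
        ];
        // -t utf-8 is the part that matters: per the observation above,
        // without it the output encoding follows LANG, and under LANG=C
        // the first accented character cannot be converted.
        $cmd = 'uconv --add-signature -f utf-8 -t utf-8';
        $proc = proc_open($cmd, $descriptors, $pipes, null, $env);
        if (!is_resource($proc)) {
            throw new RuntimeException('could not start uconv');
        }
        $out = stream_get_contents($pipes[1]);
        $err = stream_get_contents($pipes[2]);
        fclose($pipes[1]);
        fclose($pipes[2]);
        if (proc_close($proc) !== 0) {
            throw new RuntimeException('uconv failed: ' . $err);
        }
        return $out;
    } finally {
        unlink($tmp);
    }
}
```

Calling it as add_bom_via_uconv($csv, ['PATH' => getenv('PATH')]) simulates an environment with no LANG at all.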