On Wed, May 11, 2011 at 8:13 PM, Eric U <ericu@google.com> wrote:
> So it's not locale-sensitive unless it is, but nobody does that
> anyway, so don't worry about it? I'm a bit uneasy about that in
> general, but Windows not supporting it is a good point.
It's not locale-sensitive at all, unless the one special case, Turkish, is
enabled explicitly. I think the norm is to ignore Turkish entirely for
purposes of case folding. (I wasn't even able to find a way to do a
Turkish-enabled case folding with libicu, though the header constant
"U_FOLD_CASE_EXCLUDE_SPECIAL_I" suggests it's in there somewhere.)
> Anyone know about Mac or Linux systems?
>
Native Linux filesystems are case-sensitive, so I'm not sure there's
anything to compare against there. (glibc itself doesn't have direct
support for case folding, as far as I know; you use a library like libicu
for that sort of thing, and libicu does consider "i" == "I" when case
folding, including in Turkish locales.)
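As a rough illustration of that default behavior, here is a minimal sketch using Python's built-in str.casefold() rather than libicu. As far as I know, it applies the default (non-Turkic) Unicode folding and takes no locale argument. The helper name same_name is made up for this example.

```python
# Sketch: default, locale-independent Unicode case folding.
# same_name() is a hypothetical helper, not part of any filesystem API.

def same_name(a: str, b: str) -> bool:
    """Compare two names under default Unicode case folding."""
    return a.casefold() == b.casefold()

# Plain ASCII "I" folds to "i" regardless of locale.
ascii_match = same_name("FILE.TXT", "file.txt")

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE folds to "i" + U+0307.
dotted_fold = "\u0130".casefold()

# U+0131 LATIN SMALL LETTER DOTLESS I is left unchanged by default folding.
dotless_fold = "\u0131".casefold()

# "I" and dotless "ı" would only match under the Turkic special case.
turkish_match = same_name("I", "\u0131")
```

Under the default rules "I" and "ı" stay distinct, which is the Turkish special case being ignored.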
> > I'm not liking the backslash exception. It's the only thing that prevents
> > this API from being a complete superset, as far as I can see, of all
> > production filesystems. Can we drop that rule? It might be a little
> > surprising to developers who have only worked in Windows, but they'll be
> > surprised anyway, and it shouldn't lead to latent bugs.
>
> It can't be a complete superset of all filesystems in that it doesn't
> allow forward slash in filenames either.
> However, I see your point. You could certainly have a filename with a
> backslash in it on a Linux/ext2 system. Does anyone else have an
> opinion on whether it's worth the confusion potential?
>
Of all production end-user filesystems, that is. On any systems where they're
allowed, users are going to be used to this being incompatible with the rest
of the world already.
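To make the backslash point concrete, here is a small sketch using Python's pathlib pure-path classes, which model POSIX and Windows path rules without touching a real filesystem. The name "notes\draft.txt" is just an invented example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A single name containing a backslash (hypothetical example).
name = "notes\\draft.txt"

# Under POSIX rules the backslash is an ordinary character in the name.
posix_parts = PurePosixPath(name).parts

# Under Windows rules the same string is split into two path components.
windows_parts = PureWindowsPath(name).parts
```

The same string is one component under POSIX rules and two under Windows rules, so a developer used to Windows will be surprised whether or not the API forbids the character.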
I guess there's one other case where it's not necessarily a superset:
filenames containing invalid byte sequences which can't be represented in
UTF-16. I do end up with these from time to time, e.g. when extracting a ZIP
containing non-UTF-8 filenames. I think I'm not very worried about this (at
least for the sandbox case)--this is an error recovery case, whereas
backslashes in filenames are legitimate, if uncommon.
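For what it's worth, here is a minimal sketch of the kind of name I mean, assuming a POSIX-style filesystem that stores names as raw bytes. The byte string and both helper names are invented for illustration.

```python
# Sketch: a filename stored as raw bytes that is not valid UTF-8.
# raw_name, is_valid_utf8 and fits_in_utf16 are hypothetical names.

raw_name = b"caf\xe9.txt"  # Latin-1 bytes, e.g. from a ZIP with legacy names

def is_valid_utf8(name: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def fits_in_utf16(name: bytes) -> bool:
    """True if the name can be turned into well-formed UTF-16."""
    # surrogateescape maps each undecodable byte to a lone surrogate,
    # which the strict UTF-16 encoder then refuses to emit.
    text = name.decode("utf-8", errors="surrogateescape")
    try:
        text.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False
```

A name with a backslash passes both checks. The Latin-1 name fails both, which is why I'd treat it as error recovery and not as something the API has to represent.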
--
Glenn Maynard