> On 2006-07-13 14:25, Ted Dennison said:
>
>>> 3) utf16 files should be autodetected as text, not binary
>>
>> I'm not a Subversion dev, but I'm curious how would one go about doing
>> this? Normal text files can be detected by the fact that no byte in
>> them is larger than 7F.
>
> That's only true if you're talking about plain ASCII. Other encodings
> like MacRoman, UTF-8, ISO 8859, use the upper 128 chars for things like
> accented characters, see:
> <http://en.wikipedia.org/wiki/Extended_ASCII>
>
> I don't know, but I figured svn would be looking for patterns of CR/LF
> chars to guess if something is text. There is no sure-fire way to know
> of course.
>
> UTF-16 could also be detected by the BOM:
> <http://en.wikipedia.org/wiki/Byte_Order_Mark>

It discusses how Subversion should store the encoding of files either
within the svn:mime-type property (e.g. svn:mime-type = "text/html;
charset=utf-16") or in a new svn:encoding property (the latter of
which sounds better to me). If Subversion did (or allowed) this, then
svn diff and svn merge and friends could make use of this information
to properly handle UTF-16 and other encodings. Files in UTF-16, UTF-32,
etc. that begin with a byte-order mark could automatically be assigned
the correct encoding property, and files without a BOM could still have
it set manually by the user.
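
For what it's worth, the BOM check itself is simple. Here is a rough
sketch in Python of how such autodetection might look. This is not
Subversion's actual code, and the function name guess_encoding_from_bom
is made up for illustration; only the BOM constants come from Python's
standard codecs module.

```python
import codecs

# Longer BOMs must be tried first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding_from_bom(data):
    """Return an encoding name if data starts with a known BOM, else None.

    Hypothetical helper: 'data' is the first few bytes of a file.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # No BOM found: the encoding would have to be set by hand.
    return None
```

One caveat: a UTF-16 LE file whose first character after the BOM is
U+0000 would be taken for UTF-32 LE by this check. That should be rare
in real text files, but it is another reason to let the user override
the guess.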