--
-- |
-- Module      :  Codec.Binary.UTF8.String
-- Copyright   :  (c) Eric Mertens 2007
-- License     :  BSD3-style (see LICENSE)
--
-- Maintainer  :  emertens@galois.com
-- Stability   :  experimental
-- Portability :  portable
--
-- Support for encoding UTF8 Strings to and from @[Word8]@
--

module Codec.Binary.UTF8.String (
      encode
    , decode
    , encodeString
    , decodeString
  ) where

import Data.Word        (Word8)
import Data.Bits        ((.|.), (.&.), shiftL, shiftR)
import Data.Char        (chr, ord)

default(Int)

-- | Encode a string using 'encode' and store the result in a 'String'.
encodeString :: String -> String
encodeString xs = map (toEnum . fromEnum) (encode xs)

-- | Decode a string using 'decode' using a 'String' as input.
-- This is not safe but it is necessary if UTF-8 encoded text
-- has been loaded into a 'String' prior to being decoded.
decodeString :: String -> String
decodeString xs = decode (map (toEnum . fromEnum) xs)

replacement_character :: Char
replacement_character = '\xfffd'

-- | Encode a Haskell String to a list of Word8 values, in UTF8 format.
encode :: String -> [Word8]
encode = concatMap (map fromIntegral . go . ord)
 where
  go oc
   | oc <= 0x7f       = [oc]

   | oc <= 0x7ff      = [ 0xc0 + (oc `shiftR` 6)
                        , 0x80 + oc .&. 0x3f
                        ]

   | oc <= 0xffff     = [ 0xe0 + (oc `shiftR` 12)
                        , 0x80 + ((oc `shiftR` 6) .&. 0x3f)
                        , 0x80 + oc .&. 0x3f
                        ]

   | otherwise        = [ 0xf0 + (oc `shiftR` 18)
                        , 0x80 + ((oc `shiftR` 12) .&. 0x3f)
                        , 0x80 + ((oc `shiftR` 6) .&. 0x3f)
                        , 0x80 + oc .&. 0x3f
                        ]

--
-- | Decode a UTF8 string packed into a list of Word8 values, directly to String
--
decode :: [Word8] -> String
decode [    ] = ""
decode (c:cs)
  | c < 0x80  = chr (fromEnum c) : decode cs
  | c < 0xc0  = replacement_character : decode cs
  | c < 0xe0  = multi1
  | c < 0xf0  = multi_byte 2 0xf  0x800
  | c < 0xf8  = multi_byte 3 0x7  0x10000
  | c < 0xfc  = multi_byte 4 0x3  0x200000
  | c < 0xfe  = multi_byte 5 0x1  0x4000000
  | otherwise = replacement_character : decode cs
  where
    multi1 = case cs of
      c1 : ds | c1 .&. 0xc0 == 0x80 ->
        let d = ((fromEnum c .&. 0x1f) `shiftL` 6) .|. fromEnum (c1 .&. 0x3f)
        in if d >= 0x000080 then toEnum d : decode ds
                            else replacement_character : decode ds
      _ -> replacement_character : decode cs

    multi_byte :: Int -> Word8 -> Int -> [Char]
    multi_byte i mask overlong = aux i cs (fromEnum (c .&. mask))
      where
        aux 0 rs acc
          | overlong <= acc && acc <= 0x10ffff &&
            (acc < 0xd800 || 0xdfff < acc)     &&
            (acc < 0xfffe || 0xffff < acc)      = chr acc : decode rs
          | otherwise = replacement_character : decode rs

        aux n (r:rs) acc
          | r .&. 0xc0 == 0x80 = aux (n-1) rs
                               $ shiftL acc 6 .|. fromEnum (r .&. 0x3f)

        aux _ rs     _ = replacement_character : decode rs