-- | This module provides fast, validated encoding and decoding functions
-- between 'ByteString's and 'String's. Its output does not exactly match
-- that of "Codec.Binary.UTF8.String" for invalid encodings, because the
-- run of replacement characters it produces is sometimes longer.
module Data.ByteString.UTF8
  ( B.ByteString
  , decode
  , replacement_char
  , uncons
  , splitAt
  , take
  , drop
  , span
  , break
  , fromString
  , toString
  , foldl
  , foldr
  , length
  , lines
  , lines'
  ) where

import Data.Bits
import Data.Word
import qualified Data.ByteString as B
import Prelude hiding (take,drop,splitAt,span,break,foldr,foldl,length,lines)

import Codec.Binary.UTF8.String (encode)

-- | Converts a Haskell string into a UTF8 encoded bytestring.
fromString :: String -> B.ByteString
fromString xs = B.pack (encode xs)

-- | Convert a UTF8 encoded bytestring into a Haskell string.
-- Invalid characters are replaced with '\xFFFD'.
toString :: B.ByteString -> String
toString bs = foldr (:) [] bs

-- | This character is used to mark errors in a UTF8 encoded string.
replacement_char :: Char
replacement_char = '\xfffd'

-- | Try to extract a character from a byte string.
-- Returns 'Nothing' if there are no more bytes in the byte string.
-- Otherwise, it returns a decoded character and the number of
-- bytes used in its representation.
-- Errors are replaced by character '\xFFFD'.

-- XXX: Should we combine sequences of errors into a single replacement
-- character?
decode :: B.ByteString -> Maybe (Char,Int)
decode bs = do (c,cs) <- B.uncons bs
               return (choose (fromEnum c) cs)
  where
  -- Dispatch on the lead byte to find the expected sequence length.
  choose :: Int -> B.ByteString -> (Char, Int)
  choose c cs
    | c < 0x80  = (toEnum $ fromEnum c, 1)   -- ASCII
    | c < 0xc0  = (replacement_char, 1)      -- stray continuation byte
    | c < 0xe0  = bytes2 (mask c 0x1f) cs
    | c < 0xf0  = bytes3 (mask c 0x0f) cs
    | c < 0xf8  = bytes4 (mask c 0x07) cs
    | otherwise = (replacement_char, 1)      -- not a valid lead byte

  mask :: Int -> Int -> Int
  mask c m = fromEnum (c .&. m)

  -- Append the low six bits of a continuation byte to the accumulator.
  combine :: Int -> Word8 -> Int
  combine acc r = shiftL acc 6 .|. fromEnum (r .&. 0x3f)

  -- Accept a byte only if it has the continuation form 10xxxxxx.
  follower :: Int -> Word8 -> Maybe Int
  follower acc r | r .&. 0xc0 == 0x80 = Just (combine acc r)
  follower _ _                        = Nothing

  {-# INLINE get_follower #-}
  get_follower :: Int -> B.ByteString -> Maybe (Int, B.ByteString)
  get_follower acc cs = do (x,xs) <- B.uncons cs
                           acc1 <- follower acc x
                           return (acc1,xs)

  bytes2 :: Int -> B.ByteString -> (Char, Int)
  bytes2 c cs = case get_follower c cs of
                  Just (d, _) | d >= 0x80  -> (toEnum d, 2)
                              | otherwise  -> (replacement_char, 1)
                  _ -> (replacement_char, 1)

  bytes3 :: Int -> B.ByteString -> (Char, Int)
  bytes3 c cs =
    case get_follower c cs of
      Just (d1, cs1) ->
        case get_follower d1 cs1 of
          Just (d, _) | (d >= 0x800 && d < 0xd800) ||
                        (d > 0xdfff && d < 0xfffe) -> (toEnum d, 3)
                      | otherwise -> (replacement_char, 3)
          _ -> (replacement_char, 2)
      _ -> (replacement_char, 1)

  bytes4 :: Int -> B.ByteString -> (Char, Int)
  bytes4 c cs =
    case get_follower c cs of
      Just (d1, cs1) ->
        case get_follower d1 cs1 of
          Just (d2, cs2) ->
            case get_follower d2 cs2 of
              -- The upper bound keeps 'toEnum' within the range of 'Char';
              -- a four-byte sequence can encode values up to 0x1FFFFF.
              Just (d, _) | d >= 0x10000 && d <= 0x10ffff -> (toEnum d, 4)
                          | otherwise                     -> (replacement_char, 4)
              _ -> (replacement_char, 3)
          _ -> (replacement_char, 2)
      _ -> (replacement_char, 1)

-- | Split after a given number of characters.
-- Negative values are treated as if they are 0.
splitAt :: Int -> B.ByteString -> (B.ByteString,B.ByteString)
splitAt x bs = loop 0 x bs
  where loop a n _ | n <= 0 = B.splitAt a bs
        loop a n bs1 = case decode bs1 of
                         Just (_,y) -> loop (a+y) (n-1) (B.drop y bs1)
                         Nothing    -> (bs, B.empty)

-- | @take n s@ returns the first @n@ characters of @s@.
-- If @s@ has fewer than @n@ characters, then we return the whole of @s@.
take :: Int -> B.ByteString -> B.ByteString
take n bs = fst (splitAt n bs)

-- | @drop n s@ returns @s@ without its first @n@ characters.
-- If @s@ has fewer than @n@ characters, then we return an empty string.
drop :: Int -> B.ByteString -> B.ByteString
drop n bs = snd (splitAt n bs)

-- | Split a string into two parts: the first is the longest prefix
-- that contains only characters that satisfy the predicate; the second
-- part is the rest of the string.
-- Invalid characters are passed as '\xFFFD' to the predicate.
span :: (Char -> Bool) -> B.ByteString -> (B.ByteString, B.ByteString)
span p bs = loop 0 bs
  where loop a cs = case decode cs of
                      Just (c,n) | p c -> loop (a+n) (B.drop n cs)
                      _ -> B.splitAt a bs

-- | Split a string into two parts: the first is the longest prefix
-- that contains only characters that do not satisfy the predicate; the second
-- part is the rest of the string.
-- Invalid characters are passed as '\xFFFD' to the predicate.
break :: (Char -> Bool) -> B.ByteString -> (B.ByteString, B.ByteString)
break p bs = span (not . p) bs

-- | Get the first character of a byte string, if any.
-- Malformed characters are replaced by '\xFFFD'.
uncons :: B.ByteString -> Maybe (Char,B.ByteString)
uncons bs = do (c,n) <- decode bs
               return (c, B.drop n bs)

-- | Traverse a bytestring (right biased).
foldr :: (Char -> a -> a) -> a -> B.ByteString -> a
foldr cons nil cs = case uncons cs of
                      Just (a,as) -> cons a (foldr cons nil as)
                      Nothing     -> nil

-- | Traverse a bytestring (left biased).
-- This function is strict in the accumulator.
foldl :: (a -> Char -> a) -> a -> B.ByteString -> a
foldl add acc cs  = case uncons cs of
                      Just (a,as) -> let v = add acc a
                                     in seq v (foldl add v as)
                      Nothing     -> acc

-- | Counts the number of characters encoded in the bytestring.
-- Note that this includes replacement characters.
length :: B.ByteString -> Int
length b = loop 0 b
  where loop n xs = case decode xs of
                      Just (_,m) -> loop (n+1) (B.drop m xs)
                      Nothing    -> n

-- | Split a string into a list of lines.
-- Lines are terminated by '\n' or the end of the string.
-- An empty line may not be terminated by the end of the string.
-- See also 'lines\''.
lines :: B.ByteString -> [B.ByteString]
lines bs | B.null bs  = []
lines bs = case B.elemIndex 10 bs of
             Just x -> let (xs,ys) = B.splitAt x bs
                       in xs : lines (B.tail ys)
             Nothing -> [bs]

-- | Split a string into a list of lines.
-- Lines are terminated by '\n' or the end of the string.
-- An empty line may not be terminated by the end of the string.
-- This function preserves the terminators.
-- See also 'lines'.
lines' :: B.ByteString -> [B.ByteString]
lines' bs | B.null bs  = []
lines' bs = case B.elemIndex 10 bs of
              Just x -> let (xs,ys) = B.splitAt (x+1) bs
                        in xs : lines' ys
              Nothing -> [bs]
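
-- Usage sketch (illustrative only, not part of the module's API).
-- It assumes a separate client program that imports this module
-- qualified as UTF8 and "Data.ByteString" as B; the names 'Main',
-- 'main' and 'bs' are hypothetical.
--
-- > import qualified Data.ByteString      as B
-- > import qualified Data.ByteString.UTF8 as UTF8
-- >
-- > main :: IO ()
-- > main = do
-- >   let bs = UTF8.fromString "h\233llo"        -- U+00E9 takes two bytes
-- >   print (B.length bs)                        -- 6 (bytes)
-- >   print (UTF8.length bs)                     -- 5 (characters)
-- >   print (UTF8.toString (UTF8.take 2 bs))     -- "h\233"
-- >   print (UTF8.decode (B.pack [0xff]))        -- Just ('\65533',1)
--
-- The last line shows the error behaviour described above: a byte that
-- cannot start a UTF8 sequence decodes to 'replacement_char' and
-- consumes one byte.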