{-# LANGUAGE TypeSynonymInstances, FlexibleInstances, PatternGuards #-}

{-|
This module is for working with HTML/XML. It deals with both well-formed XML and
malformed HTML from the web. It features:

* A lazy parser, based on the HTML 5 specification - see 'parseTags'.

* A renderer that can write out HTML/XML - see 'renderTags'.

* Utilities for extracting information from a document - see '~==', 'sections' and 'partitions'.

The standard practice is to parse a 'String' to @[@'Tag' 'String'@]@ using 'parseTags',
then operate upon it to extract the necessary information.
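
A minimal sketch of that workflow, assuming a small inline document. The names
@example@ and @headings@ are illustrative only and are not part of this module:

> import Text.HTML.TagSoup
>
> -- A hypothetical input document.
> example :: String
> example = "<html><h1>First</h1><p>body</p><h1>Second</h1></html>"
>
> -- Take every suffix that starts at an <h1> open tag, keep the tags up to
> -- the matching close tag, and collect the text between them.
> headings :: [String]
> headings =
>     [ innerText (takeWhile (~/= "</h1>") ts)
>     | ts <- sections (~== "<h1>") (parseTags example) ]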
-}

module Text.HTML.TagSoup(
    -- * Data structures and parsing
    Tag(..), Row, Column, Attribute,
    module Text.HTML.TagSoup.Parser,
    module Text.HTML.TagSoup.Render,
    canonicalizeTags,

    -- * Tag identification
    isTagOpen, isTagClose, isTagText, isTagWarning, isTagPosition,
    isTagOpenName, isTagCloseName,

    -- * Extraction
    fromTagText, fromAttrib,
    maybeTagText, maybeTagWarning,
    innerText,

    -- * Utility
    sections, partitions,

    -- * Combinators
    TagRep(..), (~==), (~/=)
    ) where

import Text.HTML.TagSoup.Type
import Text.HTML.TagSoup.Parser
import Text.HTML.TagSoup.Render
import Data.Char
import Data.List
import Text.StringLike

-- | Turns all tag names and attributes to lower case and
--   converts DOCTYPE to upper case.
canonicalizeTags :: StringLike str => [Tag str] -> [Tag str]
canonicalizeTags = map f
    where
        f (TagOpen tag attrs) | Just ('!',name) <- uncons tag = TagOpen ('!' `cons` ucase name) attrs
        f (TagOpen name attrs) = TagOpen (lcase name) [(lcase k, v) | (k,v) <- attrs]
        f (TagClose name) = TagClose (lcase name)
        f a = a

        ucase = fromString . map toUpper . toString
        lcase = fromString . map toLower . toString

-- | Define a class to allow String's or Tag str's to be used as matches
class TagRep a where
    toTagRep :: StringLike str => a -> Tag str

instance StringLike str => TagRep (Tag str) where
    toTagRep = fmap castString

instance TagRep String where
    toTagRep x = case parseTags x of
                     [a] -> toTagRep a
                     _ -> error $ "When using a TagRep it must be exactly one tag, you gave: " ++ x

-- | Performs an inexact match, the first item should be the thing to match.
-- If the second item is a blank string, that is considered to match anything.
-- For example:
--
-- > (TagText "test" ~== TagText ""    ) == True
-- > (TagText "test" ~== TagText "test") == True
-- > (TagText "test" ~== TagText "soup") == False
--
-- For 'TagOpen' missing attributes on the right are allowed.
(~==) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
(~==) a b = f a (toTagRep b)
    where
        f (TagText y) (TagText x) = strNull x || x == y
        f (TagClose y) (TagClose x) = strNull x || x == y
        f (TagOpen y ys) (TagOpen x xs) = (strNull x || x == y) && all g xs
            where
                g (name,val) | strNull name = val  `elem` map snd ys
                             | strNull val  = name `elem` map fst ys
                g nameval = nameval `elem` ys
        f (TagComment x) (TagComment y) = strNull x || x == y
        f (TagWarning x) (TagWarning y) = strNull x || x == y
        f (TagPosition x1 x2) (TagPosition y1 y2) = x1 == y1 && x2 == y2
        f _ _ = False

-- | Negation of '~=='
(~/=) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
(~/=) a b = not (a ~== b)

-- | This function takes a list, and returns all suffixes whose
--   first item matches the predicate.
sections :: (a -> Bool) -> [a] -> [[a]]
sections p = filter (p . head) . init . tails

-- | This function is similar to 'sections', but splits the list
--   so no element appears in any two partitions.
partitions :: (a -> Bool) -> [a] -> [[a]]
partitions p =
    let notp = not . p
    in groupBy (const notp) . dropWhile notp