---- Stores Infernal Covariance Model data. Built for ease of use, not speed but-- should work reasonably well.---- TODO Data.Vector?---- TODO use generics?---- TODO functions to change to probabilities!---- TODO cmCanonize function!---- TODO put functions into their own modules, a bit of cleanup---- TODO add functions to insert a new node between two already existing nodes; think about how to handle BIF---- TODO add ability to remove node; think about how to handle BIF---- NOTE maybe BIF should not be insertable/removable right now?moduleBiobase.Infernal.CMwhereimportData.Array.IArrayimportData.List(genericLength)importBiobase.RNAhiding(nucE)-- "E type" nucleotides do not happen in CMs!-- * Data types for Covariance Models-- {{{ Data types-- | A complete covariance model. Each node and each state can be tagged with-- additional data. Typically, say after parsing, the tag will be ().dataCMns=CM{nodes::ArrayInt(Noden),states::ArrayInt(States),header::[(String,String)]-- keeps the list of header entries sorted!,localBegin::ArrayIntDouble,localEnd::ArrayIntDouble,cmType::CMType,nullModel::ArrayNucleotideDouble}deriving(Show)-- | Describes one nodedataNoden=Node{nid::Int,ntype::NodeType,nparents::[Int]-- TODO can there be more than one?,nchildren::[Int],nstates::[Int],ntag::n}deriving(Show)-- | One statedataStates=State{sid::Int,stype::StateType,snode::Int,sparents::[Int],schildren::[Transition],semission::[Emission],stag::s}deriving(Show)-- | CMType is important if we want to set localBegin / localEnd!dataCMType=CMProb|CMScorederiving(Show,Eq)-- | can emit either one nucleotide or a pairdataEmission=EmitS{eNuc::Nucleotide,escore::Double}|EmitP{eNucL::Nucleotide,eNucR::Nucleotide,escore::Double}deriving(Show)-- | branches are transition without attached probability becaue both branches are always takendataTransition=Branch{tchild::Int}|Transition{tchild::Int,tscore::Double}deriving(Show)-- | the different node typesdataNodeType=MATP|MATL|MATR|BIF|ROOT|BEGL|BEGR|ENDderiving(Read,Show,Eq,Ord,Enum,Bounded)-- | the different state typesdataStateType=MP|IL|IR|D|ML|MR|B|S|Ederiving(Read,Show,Eq,Ord,Enum,Bounded)-- }}}-- * make a local model out of a global one-- | generate a local model with local begin prob and local end probcmMakeLocal::Double->Double->CMns->CMnscmMakeLocalpbeginpendcm=cmMakeLocalBeginpbegin$cmMakeLocalEndpendcmcmMakeLocalBegin::Double->CMns->CMnscmMakeLocalBeginpbegincm=cm{localBegin=localBegincm//changes}wherechanges=rootS:(start:intern)rootS=(0,prob2Score01.0)-- root disabled!start=(head.nstates$nodescm!1,prob2Score(1-pbegin)1.0)-- the first state after "root 0"intern=map(\k->(sid$nodeMainStatecmk,prob2Score(pbegin/l)1.0))ndsnds=filter(localBeginPossiblecm).elems$nodescml=genericLengthnds-- TODO have to change the transition score, too!cmMakeLocalEnd::Double->CMns->CMnscmMakeLocalEndpendcm=cm{localEnd=localEndcm//changes}wherechanges=map(\k->(sid$nodeMainStatecmk,prob2Score(pend/l)1.0))ndsnds=filter(localBeginPossiblecm).elems$nodescml=genericLengthnds-- * Transform between score and probability mode-- | given a CM in score mode, change it to probability modecmScore2Prob::CMns->CMnscmScore2Probcm'=ifcmTypecm'==CMProbthencm'elseCM(nodescm)(statesScore2Probcm$statescm)(headercm)(localBeginScore2Prob$localBegincm)(localEndScore2Prob$localEndcm)CMProbnmwherenm=amap(flipscore2Prob0.25)$nullModelcm'cm=cm'{nullModel=nm}-- | Given a CM in prob mode, change to score modecmProb2Score::CMns->CMnscmProb2Scorecm'=ifcmTypecm'==CMScorethencm'elseCM(nodescm)(statesProb2Scorecm$statescm)(headercm)(localBeginProb2Score$localBegincm)(localEndProb2Score$localEndcm)CMScorenmwherenm=amap(flipprob2Score0.25)$nullModelcm'cm=cm'{nullModel=nm}-- | normalize all PROBabilities in a CMcmNormalizeProbabilities::CMns->CMnscmNormalizeProbabilitiescm|cmTypecm==CMScore=error"cannot normalize score-type CM"|otherwise=cm-- TODO have to map normalization over all scores!-- {{{ CM score/prob conversion helpersstatesScore2Prob::CMns->ArrayInt(States)->ArrayInt(States)statesScore2ProbcmsA=amapfsAwherefs=s{schildren=mapfT$schildrens,semission=mapfE$semissions}fTb@(Branch_)=bfT(Transitionkv)=Transitionk(score2Probv1.0)fE(EmitSkv)=EmitSk(score2Probv$nullModelcm!k)fE(EmitPk1k2v)=EmitPk1k2(score2Probv$(nullModelcm!k1)*(nullModelcm!k2))localBeginScore2Prob::ArrayIntDouble->ArrayIntDoublelocalBeginScore2ProbsA=amapfsAwherefs=score2Probs1.0localEndScore2Prob::ArrayIntDouble->ArrayIntDoublelocalEndScore2ProbsA=amapfsAwherefs=score2Probs1.0statesProb2Score::CMns->ArrayInt(States)->ArrayInt(States)statesProb2ScorecmsA=amapfsAwherefs=s{schildren=mapfT$schildrens,semission=mapfE$semissions}fTb@(Branch_)=bfT(Transitionkv)=Transitionk(prob2Scorev1.0)fE(EmitSkv)=EmitSk(prob2Scorev$nullModelcm!k)fE(EmitPk1k2v)=EmitPk1k2(prob2Scorev$(nullModelcm!k1)*(nullModelcm!k2))localBeginProb2Score::ArrayIntDouble->ArrayIntDoublelocalBeginProb2ScoresA=amapfsAwherefs=prob2Scores1.0localEndProb2Score::ArrayIntDouble->ArrayIntDoublelocalEndProb2ScoresA=amapfsAwherefs=score2Probs1.0-- }}}-- * Helper Functions-- {{{ helper functions-- | extract the main state for each node (eg MP state for MATP node)-- TODO shouldn't this just be "head $ nstates n"?nodeMainState::CMns->Noden->StatesnodeMainStatecmn=head$filter((==st).stype)sswhere(Justst)=(ntypen)`lookup`nodeMainStateAssocsss=map(statescm!)$nstatesn-- | Checks for each node, if it can be target of a local begin.localBeginPossible::CMns->Noden->BoollocalBeginPossiblecmn=ifntypen`elem`okNodes&&(not.any(==0)$nparentsn)-- nodes reachable from "root" (that is: node 1) have handled speciallythenTrueelseFalsewhereokNodes=[MATP,MATL,MATR,BIF]-- | Checks for each node if it can lead to a local end.localEndPossible::CMns->Noden->BoollocalEndPossiblecmn=ifntypen`elem`okNodes&&(END/=(ntype$nodescm!(nidn+1)))thenTrueelseFalsewhereokNodes=[MATP,MATL,MATR,BEGL,BEGR]-- | transform scores into probabilities, given a nullmodel for x-- TODO quickcheck!score2Probxnull|x==(-1/0)=0|otherwise=exp(x*log2)*null-- | back into scoresprob2Scorexnull|x==0=(-1/0)|otherwise=log(x/null)/log2-- | Transform a state, setting probabilities instead of scores. Requires CM-- knowledge for background model.-- TODO actually use the background modelstateScore2Prob::CMns->States->StatesstateScore2Probcms=error"implement me"-- | Transform a state, setting scores instead of probabilities.stateProb2Score::CMns->States->StatesstateProb2Scorecms=error"implement me"transitionTargets::[Transition]->[Int]transitionTargetsxs=mapfxswheref(Branchx)=xf(Transitionx_)=xnodeMainStateAssocs::[(NodeType,StateType)]nodeMainStateAssocs=[(MATP,MP),(MATL,ML),(MATR,MR),(BIF,B),(ROOT,S),(BEGL,S),(BEGR,S),(END,E)]-- }}}-- TODO score -> prob : exp (x / log 2) -- CHECK THIS!!!-- score -> prob : exp (x * log 2)-- prob -> score : (log x) / log 2-- TODO mkGlobal (needed?)-- TODO use logsum : log (exp x + exp y) = x + log (1 + exp (y-x)), where x>y