3
What are Motifs?
Motifs are biologically significant elements that are responsible for common structures or functions. Motifs are statistically significant substrings in bio-sequences. Assumption: if two entities share the same function or structure, common over-represented elements might be responsible for the observed similarity. C.Pizzi, DEI – Univ. Of Padova (Italy)

38
LA over a Suffix Array
In terms of suffix trees, skp[i] is the lexicographically next leaf that does not occur in the subtree below the branching node corresponding to the longest common prefix of Ssuf[i-1] and Ssuf[i]:
$$\mathit{skp}[i] = \min\bigl(\{n+1\} \cup \{\, j \in [i+1, n] \mid \mathit{lcp}[i] > \mathit{lcp}[j] \,\}\bigr)$$
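As a sketch (not the original implementation), the skp definition can be computed in O(n) with a monotonic stack, since skp[i] is simply the nearest j > i with lcp[j] < lcp[i]. The sketch assumes 0-based indexing, so the sentinel n + 1 becomes n, and takes lcp[i] as the longest common prefix of the suffixes at ranks i-1 and i, with lcp[0] = 0. The helper names suffix_array, lcp_array and skip_array are hypothetical, and the suffix array is built by naive sorting, which only suits toy inputs.

```python
def suffix_array(s):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """lcp[i] = longest common prefix of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b = s[sa[i - 1]:], s[sa[i]:]
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        lcp[i] = k
    return lcp


def skip_array(lcp):
    """0-based form of skp[i] = min({n} U {j > i : lcp[i] > lcp[j]})."""
    n = len(lcp)
    skp = [n] * n  # default: no smaller lcp to the right, jump past the end
    stack = []     # indices j > i; their lcp values strictly increase bottom to top
    for i in range(n - 1, -1, -1):
        # Indices with lcp >= lcp[i] can never be the answer for i or anything left of i.
        while stack and lcp[stack[-1]] >= lcp[i]:
            stack.pop()
        if stack:
            skp[i] = stack[-1]
        stack.append(i)
    return skp


# Usage on a small text terminated by a sentinel character.
s = "acaaacatat$"
sa = suffix_array(s)
skp = skip_array(lcp_array(s, sa))
```

Popping an index whose lcp is not smaller than lcp[i] is safe because i itself is a closer candidate for every position further left, so each index is pushed and popped at most once.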

39
LA over Truncated ST
Build a TST with truncation factor h. L = maximum length of a matrix in the DB. If h = L, simply work as with an ST. If h
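The h = L case can be sketched in Python under several assumptions: an uncompacted suffix trie cut at depth h stands in for the compacted TST, every node stores the start positions of the substrings spelling its path, the PSSM is a list of per-position score dicts over ACGT, and a match means a window score of at least the threshold. Only matrices with m ≤ h are covered (always true when h = L); the names build_truncated_trie and la_search are hypothetical.

```python
def build_truncated_trie(text, h):
    """Uncompacted suffix trie cut at depth h; every node records the start
    positions of the substrings that spell the path from the root to it."""
    root = {"children": {}, "pos": []}
    for start in range(len(text)):
        node = root
        for c in text[start:start + h]:
            node = node["children"].setdefault(c, {"children": {}, "pos": []})
            node["pos"].append(start)
    return root


def la_search(root, h, pssm, threshold):
    """Depth-first lookahead (LA) search of a PSSM over the truncated trie.
    Assumes len(pssm) <= h, so every window of length m is spelled by a trie path."""
    m = len(pssm)
    if m > h:
        raise ValueError("this sketch only covers matrices no longer than h")
    # max_rest[d] = best score achievable from matrix positions d..m-1
    max_rest = [0] * (m + 1)
    for d in range(m - 1, -1, -1):
        max_rest[d] = max_rest[d + 1] + max(pssm[d].values())
    hits = []

    def dfs(node, depth, score):
        if depth == m:
            hits.extend(node["pos"])  # every occurrence of this path is a match
            return
        for c, child in node["children"].items():
            s = score + pssm[depth][c]
            # LA: descend only if the best completion can still reach the threshold;
            # a failed check prunes all text positions sharing this prefix at once.
            if s + max_rest[depth + 1] >= threshold:
                dfs(child, depth + 1, s)

    dfs(root, 0, 0)
    return sorted(hits)
```

Keeping position lists in every node costs O(n·h) memory, which is acceptable for a sketch but not for a real index.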

47
AC and profile matching
Build an AC automaton for all the words that match the matrix; the LA partial threshold limits the number of words to those that actually match. Pre-processing: $O(|D||\Sigma| m + m|\Sigma|)$, where $|D| \le |\Sigma|^m$ depends on the matrix and the threshold. Search the text with the AC automaton: O(n) search.
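The pipeline above can be sketched in Python, assuming a PSSM given as a list of per-position score dicts over ACGT, a match defined as a total score of at least the threshold, and an LA bound equal to the best score still obtainable from the remaining positions. matching_words enumerates the word set with LA pruning, build_ac builds a textbook Aho-Corasick automaton (goto transitions, failure links, output sets) and ac_search scans the text once. All names are illustrative, not taken from the original work.

```python
from collections import deque

ALPHABET = "ACGT"


def matching_words(pssm, threshold):
    """Enumerate every word of length m scoring >= threshold, pruning with LA."""
    m = len(pssm)
    # best_rest[i] = maximum score obtainable from matrix positions i..m-1
    best_rest = [0] * (m + 1)
    for i in range(m - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(pssm[i].values())
    words = []

    def extend(prefix, score):
        i = len(prefix)
        if i == m:
            words.append(prefix)
            return
        for c in ALPHABET:
            s = score + pssm[i][c]
            # LA partial threshold: drop prefixes whose best completion misses Th
            if s + best_rest[i + 1] >= threshold:
                extend(prefix + c, s)

    extend("", 0)
    return words


def build_ac(words):
    """Aho-Corasick automaton: goto transitions, failure links, output sets."""
    goto, fail, out = [{}], [0], [[]]
    for w in words:
        node = 0
        for c in w:
            if c not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][c] = len(goto) - 1
            node = goto[node][c]
        out[node].append(w)
    # Breadth-first order guarantees failure targets (shallower nodes) are final first.
    queue = deque(goto[0].values())
    while queue:
        u = queue.popleft()
        for c, v in goto[u].items():
            queue.append(v)
            f = fail[u]
            while f and c not in goto[f]:
                f = fail[f]
            fail[v] = goto[f].get(c, 0)
            out[v] = out[v] + out[fail[v]]
    return goto, fail, out


def ac_search(text, goto, fail, out):
    """Single left-to-right scan; returns (start, word) for every occurrence."""
    node, hits = 0, []
    for pos, c in enumerate(text):
        while node and c not in goto[node]:
            node = fail[node]
        node = goto[node].get(c, 0)
        for w in out[node]:
            hits.append((pos - len(w) + 1, w))
    return hits


pssm = [
    {"A": 2, "C": -1, "G": 0, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": 0},
    {"A": 0, "C": -1, "G": 2, "T": -1},
]
words = matching_words(pssm, 4)  # ['ACA', 'ACG', 'ATG', 'GCG']
print(ac_search("TACGCGATG", *build_ac(words)))  # [(1, 'ACG'), (3, 'GCG'), (6, 'ATG')]
```

Because all words have the same length m, at most one word can end at each text position, so the scan stays linear in the text length.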

58
Minimum Gain for ACE
Dual concept of look-ahead: compute for every prefix the minimum contribution of the remaining positions in the pattern. If current_score(i) + min_gain(i) > Th, report a match. Advantage: the automaton does not need to store the full subtree of height m-i below such a prefix.
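The saving can be sketched in Python by extending a lookahead word enumeration with a second precomputed bound, min_rest (the minimum gain of the remaining positions): as soon as the current score plus min_rest reaches Th, every completion is a match, so only the prefix is stored and its subtree of height m-i is skipped. Assumptions: a PSSM as a list of per-position score dicts over ACGT, a match defined as score ≥ Th (the slide writes >), and a hypothetical function name accepting_prefixes. Reporting a match from a short prefix also presumes the text still has m-i characters after it.

```python
ALPHABET = "ACGT"


def accepting_prefixes(pssm, threshold):
    """Strings to store in the automaton when both LA and minimum gain are used.

    A stored prefix of length i stands for all len(ALPHABET) ** (m - i)
    completions, each of which is guaranteed to score >= threshold."""
    m = len(pssm)
    max_rest = [0] * (m + 1)  # best score obtainable from positions i..m-1 (LA)
    min_rest = [0] * (m + 1)  # worst score obtainable from positions i..m-1 (min gain)
    for i in range(m - 1, -1, -1):
        max_rest[i] = max_rest[i + 1] + max(pssm[i].values())
        min_rest[i] = min_rest[i + 1] + min(pssm[i].values())
    stored = []

    def extend(prefix, score):
        i = len(prefix)
        # Minimum gain: even the worst completion reaches the threshold,
        # so stop here and skip the whole subtree of height m - i.
        if score + min_rest[i] >= threshold:
            stored.append(prefix)
            return
        for c in ALPHABET:
            s = score + pssm[i][c]
            # Look-ahead: keep the child only if the best completion can still match.
            if s + max_rest[i + 1] >= threshold:
                extend(prefix + c, s)

    extend("", 0)
    return stored
```

Comparing len(accepting_prefixes(pssm, th)) with the number of full-length matching words shows how much the dictionary shrinks for a given matrix and threshold.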

78
Conclusions
Searching matrices is a core step in many bioinformatics applications (searching, discovery, classification…). Several approaches have been developed in recent years. Online methods based on filtering are currently the most efficient.