Description: Prof. Gifford talks about library complexity as it relates to genome sequencing. He explains how to create a full-text minute-size (FM) index, which involves a Burrows-Wheeler transform (BWT). He ends with how to deal with the problem of mismatching.

It's easy to tell now, right? Because you can actually analyze the distribution of the reads that they got and you can go back and you could estimate the marginal value of additional sequencing. And the way you do that is you go back to the distribution that you fit, this negative binomial, and ask: if you have r more reads, how many more unique molecules are you going to get? And the answer is that you can see that, if you imagine that this is artificial data, but if you imagine that you had a complexity of 10 to the 6 molecules, the number of sequencing reads is on the x-axis, the number of observed distinct molecules is on the y-axis, and as you increase the sequencing depth, you get back more and more of the library.
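The marginal-value question above can be sketched numerically. This is a minimal sketch, not the fit used in the lecture: it assumes reads are spread over a library of `complexity` molecules with a mean of `n_reads / complexity` reads per molecule, and that the per-molecule read count is either Poisson or a negative binomial written as a gamma-Poisson mixture with dispersion `k`. The function names and the parameter `k` are assumptions introduced for illustration.

```python
import math


def expected_distinct(n_reads, complexity, k=None):
    """Expected number of distinct molecules observed after n_reads reads.

    Assumption: the read count per molecule has mean lam = n_reads / complexity
    and is Poisson (k=None) or negative binomial with dispersion k
    (gamma-Poisson form, where P(count = 0) = (1 + lam / k) ** -k).
    """
    lam = n_reads / complexity
    if k is None:
        p_unseen = math.exp(-lam)            # Poisson: chance a molecule is never read
    else:
        p_unseen = (1.0 + lam / k) ** (-k)   # negative binomial: heavier zero class
    return complexity * (1.0 - p_unseen)


def marginal_gain(n_reads, r_more, complexity, k=None):
    """Expected number of new distinct molecules gained from r_more extra reads."""
    return (expected_distinct(n_reads + r_more, complexity, k)
            - expected_distinct(n_reads, complexity, k))


COMPLEXITY = 10 ** 6  # library complexity used in the lecture's artificial example

# Diminishing returns: the same 100,000 extra reads are worth less at higher depth.
gain_shallow = marginal_gain(1_000_000, 100_000, COMPLEXITY, k=1.0)
gain_deep = marginal_gain(5_000_000, 100_000, COMPLEXITY, k=1.0)
print(round(gain_shallow), round(gain_deep))
```

The curve flattens as depth grows, which is the shape described for the plot: the number of distinct molecules approaches the library complexity but never exceeds it.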

So that means that when we do this rotation, that this textual occurrence of a will have the same rank in the first column and in the last column. And you can see I've annotated the various bases here with their ranks. This is the first g, the first c, the first end of line, end of string character. First a, second a, third a, second c. And correspondingly I have the same annotations over here, and thus the third a here is the same lexical occurrence as the third a on the left in the string, same text occurrence.
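The rank-preservation property can be checked directly. The sketch below is illustrative only: the slide's string is not given in this passage, so the text `acaacg$` is an assumption, chosen because its last column reads g, c, $, a, a, a, c, matching the order of the annotations read out above. The helper names are hypothetical, and the sort relies on `$` ordering before the letters, as it does in ASCII.

```python
def sorted_rotations(text):
    """Return the start offsets of all rotations of text, in sorted order."""
    n = len(text)
    return sorted(range(n), key=lambda i: text[i:] + text[:i])


def ranked_columns(text):
    """Annotate the first and last columns as (char, rank, text_position).

    The rank is the 1-based count of that character seen so far going down
    the column; text_position is where that character sits in the text.
    """
    n = len(text)
    first, last = [], []
    seen_first, seen_last = {}, {}
    for start in sorted_rotations(text):
        f_pos = start              # text position of the rotation's first character
        l_pos = (start - 1) % n    # text position of the rotation's last character
        f_ch, l_ch = text[f_pos], text[l_pos]
        seen_first[f_ch] = seen_first.get(f_ch, 0) + 1
        seen_last[l_ch] = seen_last.get(l_ch, 0) + 1
        first.append((f_ch, seen_first[f_ch], f_pos))
        last.append((l_ch, seen_last[l_ch], l_pos))
    return first, last


TEXT = "acaacg$"  # assumed example text, not taken from the transcript
first, last = ranked_columns(TEXT)
bwt = "".join(ch for ch, _, _ in last)

# The k-th occurrence of a character in the first column and the k-th
# occurrence of it in the last column are the same position in the text.
rank_preserved = ({(c, r): p for c, r, p in first}
                  == {(c, r): p for c, r, p in last})
print(bwt, rank_preserved)
```

Each `(char, rank)` pair maps to one text position in both columns, which is what "same text occurrence" means here.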

Or I can look right over here and see that in fact it's 6, right? Because this occurrence of g1 is right here. So this LF value is, it's 6, 4, 0; a1 is in 1, a2 is in 2, a3 is in 3, c2 is in 5. So this is the LF function: 6, 4, 0, 1, 2, 3, 5. And I don't need any of this to compute it. Because it simply is equal to, going back one slide, it's equal to Occ of c plus count. So it's going to be equal to where that particular character starts on the left hand side and its rank minus 1.
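The computation described above, the character's start row in the first column plus its rank minus 1, can be sketched as follows. This is a minimal sketch under stated assumptions: the last column `gc$aaac` is reconstructed from the annotations read out in the lecture, the names `occ` and `count` follow the spoken terms, and `$` is assumed to sort before the bases.

```python
def lf_mapping(bwt):
    """Compute LF for every row of the last column (the BWT).

    occ[c]  : row where character c starts in the first column, i.e. the
              number of characters in the text that sort before c.
    count   : number of earlier occurrences of c in the last column, which
              is that occurrence's rank minus 1.
    LF(row) = occ[c] + count.
    """
    totals = {}
    for ch in bwt:
        totals[ch] = totals.get(ch, 0) + 1

    occ, running = {}, 0
    for ch in sorted(totals):      # first column is the sorted characters
        occ[ch] = running
        running += totals[ch]

    seen = {}
    lf = []
    for ch in bwt:
        count = seen.get(ch, 0)    # rank minus 1 of this occurrence
        lf.append(occ[ch] + count)
        seen[ch] = count + 1
    return lf


BWT = "gc$aaac"  # last column as annotated: g1 c1 $ a1 a2 a3 c2
lf = lf_mapping(BWT)
print(lf)
```

Only the last column is needed as input: the first-column start rows come from character totals, so no rotation matrix has to be stored to evaluate LF.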

PROFESSOR: What about gaps? BWA, I believe, processes gaps. But gaps are much, much less likely than missed bases. The other thing is that if you're doing a sequencing library, and you have a read that actually has a gap in it, it's probably the case you have another read that doesn't. For the same sequence. So it is less important to process gaps than it is to process differences.
