
Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data PDF

182 Pages·2013·15.12 MB·English
by Hai-Son Phuoc Le

Preview Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data

Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data
Hai-Son Phuoc Le
May 2013
CMU-ML-13-101

Machine Learning Department
School of Computer Science
Carnegie Mellon University

Thesis Committee
Ziv Bar-Joseph, Chair
Christopher Langmead
Roni Rosenfeld
Quaid Morris

Submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy.

Copyright © 2013 Hai-Son Le

This research was sponsored by the National Institutes of Health under grant numbers 5U01HL108642 and 1R01GM085022, the National Science Foundation under grant numbers DBI0448453 and DBI0965316, and the Pittsburgh Life Sciences Greenhouse. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: genomics, gene expression, gene regulation, microarray, RNA-Seq, transcriptomics, error correction, comparative genomics, regulatory networks, cross-species, expression database, Gene Expression Omnibus, GEO, orthologs, microRNA, target prediction, Dirichlet Process, Indian Buffet Process, hidden Markov model, immune response, cancer.

To Mom and Dad.

Abstract

Advances in genomics allow researchers to measure the complete set of transcripts in cells. These transcripts include messenger RNAs (which encode for proteins) and microRNAs, short RNAs that play an important regulatory role in cellular networks. While this data is a great resource for reconstructing the activity of networks in cells, it also presents several computational challenges. These challenges include the data collection stage which often results in incomplete and noisy measurement, developing methods to integrate several experiments within and across species, and designing methods that can use this data to map the interactions and networks that are activated in specific conditions. Novel and efficient algorithms are required to successfully address these challenges.
In this thesis, we present probabilistic models to address the set of challenges associated with expression data. First, we present a novel probabilistic error correction method for RNA-Seq reads. RNA-Seq generates large and comprehensive datasets that have revolutionized our ability to accurately recover the set of transcripts in cells. However, sequencing reads inevitably contain errors, which affect all downstream analyses. To address these problems, we develop an efficient hidden Markov model-based error correction method for RNA-Seq data. Second, for the analysis of expression data across species, we develop clustering and distance function learning methods for querying large expression databases. The methods use a Dirichlet Process Mixture Model with latent matchings and infer soft assignments between genes in two species to allow comparison and clustering across species. Third, we introduce new probabilistic models to integrate expression and interaction data in order to predict targets and networks regulated by microRNAs.

Combined, the methods developed in this thesis provide a solution to the pipeline of expression analysis used by experimentalists when performing expression experiments.

Acknowledgements

A Ph.D. may be the highest personal academic reward which many wish to achieve, but the road leading to a Ph.D. is certainly not a work of a single person. I would like to express my deepest gratitude to the multitude of people who have taught, helped, and supported me during the joyful but also adventurous and challenging time at Carnegie Mellon University. Certainly for me, writing this acknowledgements is one of the most wonderful exercises in graduate school.

I am indebted to the generous support of my advisor, Ziv Bar-Joseph, who is truly a source of inspiration and ideas. Not too long after I started school, it immediately became clear to me that his academic success is a product of a remarkable balance of work, family, and life-long passions. He is not only an academic father but also a life role model. Ziv is always persistent and patient with answering a myriad of my questions. Every week, I look forward to our meeting with a list of questions and always leave with more ideas to work on.
His instinct and fast thinking ability cut through conceptual layers of many problems so quickly and lead to questions, for which usually take me weeks to find good answers. Not only did I learn the technical and research methodology, but I also developed an appreciation for high-impact research, which must be well-motivated and driven by deliberate applications and substantial findings. He is so dedicated to the research and detail-oriented to the results. On one occasion, Ziv showed up at my office late in the evening to my surprise. It turned out that he went home earlier forgetting to send materials needed for our paper submission due at midnight. He walked back to school and gave it to me in person.

I appreciate the committee, Roni Rosenfeld, Chris Langmead, and Quaid Morris for their advice, comments, and suggestions to improve the work in this thesis and my oral presentation. In particular, Quaid meticulously read the draft and suggested ways to make the draft more readable. Roni insisted on making the presentation more accessible to the audience.

I want to thank past and current members of the Systems Biology group at CMU. Marcel Schulz is remarkable at selling new ideas and dedicated to new research collaboration. His openness to share knowledge led to the first part of this thesis. Saket Navlakha is instrumental in helping me improve my presentation skills. Our discussion about research, philosophical aspects of life, and religion is always entertaining and makes lunch more enjoyable. I enjoy small chats with Anthony Gitter, Shan Zhong, Aaron Wise, and Guy Zinman, with whom I shared room and explored new cities during conferences. I am grateful for administrative help of Diane Stidle and Michelle Martin in scheduling meetings, talks and paperwork.

I would like to thank my parents, Hai Le and Lanh Chau, for their unconditional love and support. My dad, who taught me maths in first grade, introduced me to the world of logical thinking. My humble and warm-hearted mom taught me how to listen and treat people with care and respect. Although both were not physically with me during my undergraduate and graduate study, their presence was always in my heart. My sister, Tram Le, and her family is a source of comfort and encouragement in difficult time.
I cherish my time spending with many new friends in Pittsburgh. Thao Pham, Hoang Tran and Ha Nguyen cook delicious food and always welcome me to share their culinary delights. I enjoy listening to Hang Nguyen discussing, debating and ranting about politics, history or horoscope. Hoan Ho always reminds me of a determined and strong-willed person when I face a difficult task. Phuong Pham and Thang Ho motivated me to run and trained with me. I am thankful for Suze Ninh’s diligent care and effort in revising my writing and listening to my practice talks. Her sincere love comforts me in stressful time. Tuan Nguyen’s sense of humor puts away worries and troubles. They are always available to listen to my problems and cheerfully enjoys sips of whiskey or a bottle of beer.

I also thank other friends that I have exchanged ideas and interacted with: Lucia Castellanos, Rob Hall, Tzu-Kuo Huang, Wooyoung Lee, Ankur Parikh, Liang Xiong, Min Xu and Yang Xu. I will miss playing tennis and going to the gym with Hoan, Marcel, Saket, Yang, Chao Shen, and Hua Shan.

Pittsburgh, PA
May 2013

Description:
Keywords include microRNA, target prediction, Dirichlet Process, Indian Buffet Process, and hidden Markov model. Contents include "2 SEECER: a probabilistic method for error correction of RNA-Seq" and "A Supplementary materials for Chapter 2".