
Computational text analysis for functional genomics and bioinformatics PDF

313 Pages·2006·2.188 MB·English

Preview Computational text analysis for functional genomics and bioinformatics

Computational Text Analysis for Functional Genomics and Bioinformatics

Soumya Raychaudhuri

Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in

Oxford, New York

Auckland, Cape Town, Dar es Salaam, Hong Kong, Karachi, Kuala Lumpur, Madrid, Melbourne, Mexico City, Nairobi, New Delhi, Shanghai, Taipei, Toronto

With offices in

Argentina, Austria, Brazil, Chile, Czech Republic, France, Greece, Guatemala, Hungary, Italy, Japan, Poland, Portugal, Singapore, South Korea, Switzerland, Thailand, Turkey, Ukraine, Vietnam

Oxford is a registered trademark of Oxford University Press in the UK and in certain other countries

Published in the United States by Oxford University Press Inc., New York

© Oxford University Press, 2006

The moral rights of the author have been asserted

Database right Oxford University Press (maker)

First published 2006

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer

British Library Cataloguing in Publication Data
Data available

Library of Congress Cataloging in Publication Data
Data available

Typeset by SPI Publisher Services, Pondicherry, India
Printed in Great Britain on acid-free paper by Biddles Ltd., King's Lynn Norfolk

ISBN 0-19-856740-5 978-0-19-8567400
ISBN 0-19-856741-3 (Pbk.) 978-0-19-8567417 (Pbk.)
Dedicated to my grandfather and role model Professor Sahadeb Banerjee (4/1/914–4/20/2005)

Preface

This book is an introduction to the newly emerging field of textual analysis in genomics. It presents some of the newest methods, and demonstrates applications to proteomics, sequence analysis, and gene expression data.

My personal interest in this field began early during my graduate school years as these methods were rapidly emerging. My colleagues were excitedly utilizing new high throughput technologies in biology with which they could collect data at unprecedented rates. Gene expression arrays, for example, offered the opportunity to simultaneously explore expression of all genes in a cell. However, many were hitting the same roadblocks; making sense of all of that data was tedious and frustrating. Even differentiating signal from noise was a challenge; certainly finding subtle patterns in the data proved to be much more difficult than anyone expected. A host of statistical methods were emerging to analyze the numerical data, but yet they lacked the necessary context to fully harness the power of these complex experimental results. The difficulty is that complete interpretation requires understanding all of the large number of genes, their complex functions, and interactions. But, just keeping up with the literature on a single gene can be a challenge itself, and for thousands of genes it is simply impossible! At that time I became interested in the promise of statistical natural language processing algorithms, and their potential in biology. These methods often are the only reasonable way to include the literature on thousands of genes in genomics data analysis and to give context to the data.

We describe analytical methods that utilize the scientific literature in the context of specific experimental modalities in this book. But much of what is discussed here can easily be generalized to most large-scale experimental methods. For example, the expression array methods can be generalized to any numerical data set, and the protein interaction methods can be generalized to any type of interaction.
In addition to devising the theory behind the methods, we emphasize real world examples and evaluations in this book to demonstrate how methods can be applied practically and what performance benefit they offer.

This book can be used as a primary text in a graduate course in a genomics or computational biology curriculum, or as an adjunct text in an advanced computational biology course. The book has been written with sufficient background material and the prerequisites for this book are few. A basic understanding of probability and statistics is helpful at the level of an introductory undergraduate course. Basic biological and bioinformatics concepts are reviewed to the extent that is necessary. No background in computational text analysis is necessary, but is certainly helpful. We are hopeful that this text will encourage the reader to develop and utilize these methods in their own work, and to maximize the potential of large-scale biology.

Acknowledgements

This book was to a large extent the product of work that I started under the guidance of Russ Altman, who has mentored me through the years. In addition, Jeffrey Chang and Hinrich Schutze have been great influences in these pursuits. Patrick Sutphin, Farhad Imam, Joshua Stuart, Nipun Mehra, Amato Giaccia, Peter Small, and David Botstein are all colleagues that have influenced and shaped the content of this book. It has been a pleasure working with Alison Jones and her associates at Oxford University Press. Sourobh Raychaudhuri, my brother, has given me feedback on specific sections. Finally, I thank Meenakshy Chakravorty, my wife, whose critical suggestions on this manuscript have been invaluable.

Soumya Raychaudhuri
Boston, USA, 2005
