Table of Contents

Mourad Elloumi
Editor

Algorithms for Next-Generation Sequencing Data
Techniques, Approaches, and Applications

Editor
Mourad Elloumi
LaTICE
Tunis, Tunisia
University of Tunis-El Manar
Tunis, Tunisia

ISBN 978-3-319-59824-6
ISBN 978-3-319-59826-0 (eBook)
DOI 10.1007/978-3-319-59826-0
Library of Congress Control Number: 2017950216

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To my parents and my children.
Preface

A deoxyribonucleic acid (DNA) macromolecule can be coded by a sequence over a four-letter alphabet. These letters are A, C, G, and T, and they code respectively the bases Adenine, Cytosine, Guanine and Thymine. DNA sequencing consists then in determining the exact order of these bases in a DNA macromolecule. As a matter of fact, DNA sequencing technology is playing a key role in the advancement of molecular biology. Compared to previous sequencing machines, Next-Generation Sequencing (NGS) machines function much faster, with significantly lower production costs and much higher throughput in the form of short reads, i.e., short sequences coding portions of DNA macromolecules.

As a result of the extended spread of NGS machines, we are witnessing an exponential growth in the number of newly available short reads. Hence, we are facing the challenge of storing them to analyze huge numbers of reads representing sets of portions of genomes, or even whole genomes. The analysis of this huge number of reads will help, among others, to decode life’s mysteries, detect pathogens, make better crops, and improve quality of life. This is a difficult task, and it is made even more difficult not only by the short lengths of the reads and the huge number of these reads but also by the presence of high similarity between the concerned portions of genomes, or whole genomes, and by the presence of many repetitive structures in these genomes, or whole genomes. Such a task requires the development of fast algorithms with low memory requirements and high performance.

This book surveys the most recent developments on algorithms for NGS data, offering enough fundamental and technical information on these algorithms and the related problems, without overcrowding the reader’s head. It presents the results of the latest investigations in the field of NGS data analysis. The algorithms presented in this book deal with the most important and/or the newest topics encountered in this field. These algorithms are based on new/improved approaches and/or techniques.
The few published books on algorithms for NGS data either lack technical depth or focus on specific topics. This book is the first overview on algorithms for NGS data with both a wide coverage of this field and enough depth to be of practical use to working professionals. So, this book tries to find a balance between theoretical and practical coverage of a wide range of issues in the field of NGS data analysis. The techniques and approaches presented in this book combine sound theory with practical applications in life sciences. Certainly, the list of topics covered in this book is not exhaustive, but it is hoped that these topics will get the reader to think of the implications of the presented algorithms on other topics. The chapters presented in this book were carefully selected for quality and relevance. This book also presents experiments that provide qualitative and quantitative insights into the field of NGS data analysis. It is hoped that this book will increase the interest of researchers in studying a wider range of combinatorial problems related to NGS data analysis.

Preferably, the reader of this book should be someone who is familiar with bioinformatics and would like to learn about algorithms that deal with the most important and/or the newest topics encountered in the field of NGS data processing. However, this book could be used by a wider audience such as graduate students, senior undergraduate students, researchers, instructors, and practitioners in bioinformatics, computer science, mathematics, statistics, and life sciences. It will be extremely valuable and fruitful for these people. They will certainly find what they are looking for or, at least, a clue that will help them to make an advance in their research. This book is quite timely since NGS technology is evolving at a breathtaking speed and will certainly point the reader to algorithms for NGS data that may be the key to new and important discoveries in life sciences.
This book is organized into four parts: Indexing, Compression, and Storage of NGS Data; Error Correction in NGS Data; Alignment of NGS Data; and Assembly of NGS Data. The 14 chapters were carefully selected to provide a wide scope with minimal overlap between the chapters to reduce duplication. Each contributor was asked to present review material as well as current developments. In addition, the authors were chosen from among the leaders in their respective fields.

Mourad Elloumi
Tunis, Tunisia
April 2017

Contents

Part I Indexing, Compression, and Storage of NGS Data

1 Algorithms for Indexing Highly Similar DNA Sequences
Nadia Ben Nsira, Thierry Lecroq, and Mourad Elloumi

2 Full-Text Indexes for High-Throughput Sequencing
David Weese and Enrico Siragusa

3 Searching and Indexing Circular Patterns
Costas S. Iliopoulos, Solon P. Pissis, and M. Sohel Rahman

4 De Novo NGS Data Compression
Gaetan Benoit, Claire Lemaitre, Guillaume Rizk, Erwan Drezen, and Dominique Lavenier

5 Cloud Storage-Management Techniques for NGS Data
Evangelos Theodoridis

Part II Error Correction in NGS Data

6 Probabilistic Models for Error Correction of Nonuniform Sequencing Data
Marcel H. Schulz and Ziv Bar-Joseph

7 DNA-Seq Error Correction Based on Substring Indices
David Weese, Marcel H. Schulz, and Hugues Richard

8 Error Correction in Methylation Profiling From NGS Bisulfite Protocols
Guillermo Barturen, José L. Oliver, and Michael Hackenberg

Part III Alignment of NGS Data

9 Comparative Assessment of Alignment Algorithms for NGS Data: Features, Considerations, Implementations, and Future
Carol Shen, Tony Shen, and Jimmy Lin

10 CUSHAW Suite: Parallel and Efficient Algorithms for NGS Read Alignment
Yongchao Liu and Bertil Schmidt

11 String-Matching and Alignment Algorithms for Finding Motifs in NGS Data
Giulia Fiscon and Emanuel Weitschek

Part IV Assembly of NGS Data

12 The Contig Assembly Problem and Its Algorithmic Solutions
Géraldine Jean, Andreea Radulescu, and Irena Rusu

13 An Efficient Approach to Merging Paired-End Reads and Incorporation of Uncertainties
Tomáš Flouri, Jiajie Zhang, Lucas Czech, Kassian Kobert, and Alexandros Stamatakis

14 Assembly-Free Techniques for NGS Data
Matteo Comin and Michele Schimd

Contributors

Ziv Bar-Joseph
Computational Biology Department and Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Guillermo Barturen
Centre for Genomics and Oncological Research (GENYO), Granada, Spain

Nadia Ben Nsira
Laboratory of Technologies of Information and Communication and Electrical Engineering (LaTICE), Tunis, Tunisia
University of Tunis-El Manar, Tunis, Tunisia
The Computer Science, Information Processing and Systems Laboratory (LITIS), EA 4108, University of Rouen-Normandy, Normandy, France

Gaetan Benoit
GenScale, Rennes, France
INRIA, Rennes, France

Matteo Comin
Department of Information Engineering, University of Padova, Padova, Italy

Lucas Czech
Heidelberg Institute for Theoretical Studies, Heidelberg, Germany

Erwan Drezen
GenScale, Rennes, France
INRIA, Rennes, France

Mourad Elloumi
LaTICE, Tunis, Tunisia
University of Tunis-El Manar, Tunis, Tunisia

Giulia Fiscon
Institute for Systems Analysis and Computer Science “Antonio Ruberti” (IASI), National Research Council (CNR), Rome, Italy

Tomáš Flouri
Heidelberg Institute for Theoretical Studies, Heidelberg, Germany

Michael Hackenberg
Department of Genetics, University of Granada, Granada, Spain

Algorithms for Next-Generation Sequencing Data: Techniques, Approaches, and Applications PDF

356 Pages·2017·7.003 MB·English

by Mourad Elloumi


Detailed Information

Author:	Mourad Elloumi
Publication Year:	2017
ISBN:	978-3-319-59824-6 (print), 978-3-319-59826-0 (eBook)
Pages:	356
Language:	English
File Size:	7.003 MB
Format:	PDF
Price:	FREE
