
Computational Life Sciences: Data Engineering and Data Mining for Life Sciences PDF

593 Pages·2023·19.16 MB·English

Preview Computational Life Sciences: Data Engineering and Data Mining for Life Sciences

Studies in Big Data, Volume 112

Jens Dörpinghaus, Vera Weil, Sebastian Schaaf, Alexander Apke (Editors)

Computational Life Sciences: Data Engineering and Data Mining for Life Sciences

Series Editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Big Data” (SBD) publishes new developments and advances in the various areas of Big Data, quickly and with a high quality. The intent is to cover the theory, research, development, and applications of Big Data, as embedded in the fields of engineering, computer science, physics, economics and life sciences. The books of the series refer to the analysis and understanding of large, complex, and/or distributed data sets generated from recent digital sources coming from sensors or other physical instruments as well as simulations, crowd sourcing, social networks or other internet transactions, such as emails or video click streams and other. The series contains monographs, lecture notes and edited volumes in Big Data spanning the areas of computational intelligence including neural networks, evolutionary computation, soft computing, fuzzy systems, as well as artificial intelligence, data mining, modern statistics and operations research, as well as self-organizing systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output.

The books of this series are reviewed in a single blind peer review process.

Indexed by SCOPUS, EI Compendex, SCIMAGO and zbMATH.

All books published in the series are submitted for consideration in Web of Science.
Editors

Jens Dörpinghaus
Federal Institute for Vocational Education and Training (BIBB), Bonn, Germany
German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany

Vera Weil
Department for Mathematics and Computer Science, University of Cologne, Cologne, Germany

Sebastian Schaaf
German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany

Alexander Apke
Department for Mathematics and Computer Science, University of Cologne, Cologne, Germany

ISSN 2197-6503, ISSN 2197-6511 (electronic)
Studies in Big Data
ISBN 978-3-031-08410-2, ISBN 978-3-031-08411-9 (eBook)
https://doi.org/10.1007/978-3-031-08411-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The life sciences have long been considered a descriptive science – 10 years ago, the field was relatively data poor, and scientists could easily keep up with the data they generated. But with advances in genomics, imaging and other technologies, biologists are now generating data at crushing speeds.
– Emily Singer, 2013

In a nutshell, these words from Emily Singer¹ describe the starting point of our book: Nowadays, most life scientists have to handle enormous amounts of data. That is, they must be able to use techniques and apply adapted tools to their specific problem at hand in order to approach its solution. In other words, possessing and generating (big) data is one thing, but the heart of the matter is to retrieve information out of this data, efficiently and reliably.

Why Life Sciences and Data

Let’s leave the nutshell: Biology is an exciting field, which has recently developed to what we call life sciences. On top of at least 500 years of exciting science and development, life sciences face additional, computer-related issues. Every day and world-wide, the field of life sciences is changing in terms of software, algorithms, file formats and much more. Regarding this, together with the already mentioned issue of handling a great amount of rapidly increasing data, it is nearly unavoidable to develop and apply new, efficient, robust and thus reliable heuristics and algorithms. In other words, a lot of problems in life sciences and bioinformatics have to be approached with algorithmic solutions, presuming technical understanding and, of course, biological expertise. This applies to students, researchers and practitioners to the same extent. And here lies the entry point of our book.

¹ See https://www.wired.com/2013/10/big-data-biology/: Emily Singer: Biology’s Big Problem: There’s Too Much Data to Handle, www.wired.com, 10.11.2013.
Book Purpose

The purpose of this book is to offer you theoretical knowledge as well as practical advice on diverse yet fundamental topics of computational life sciences. This includes at least a sketch (and often much more than this) of the theoretical foundations of a specific field as well as the practical aspects evolving from it. In some cases, this leads up to the presentation of an (implemented) solution found for a problem that arose in a specific application.

Hence, every chapter of this book is either of high practical relevance or of great scientific interest, or both. If you are interested in Data Science and Life Sciences in general, or if you want to consolidate your knowledge in these topics, or if you are interested in applications and evolving technologies in this field, or if you just need an inspiration in order to approach your own problem at hand, we believe that this book offers you a strong helping hand. Or, to make it short: We believe that students, junior and senior researchers benefit from this book as well as teachers and practitioners.

Book Overview

This book is divided into five parts. The first part, Solving Problems in Life Sciences: On Programming Languages and Basic Concepts, offers foundations on different topics. Nowadays, issues being bound to a certain language are rare, hence the first chapter considers the most common programming languages used in the Life Sciences. It closes with a short explanation why we chose Java to play a main role in most of our examples. The first chapter is followed by an introduction to the programming language Java, including information on using collaborative tools like git. This introduction offers a quick start guide to Java, and goes hand in hand with the third chapter, Basic Data Processing. Amongst other topics, common data structures and their usage as well as aspects of the object-oriented programming paradigm are introduced. The part closes with the chapter Algorithm Design, in which we consider the modeling of real world problems as well as fundamental algorithmic principles.
The second part, Data Mining and Knowledge Discovery, starts with an introductory chapter on the management of data and knowledge. This is followed by the chapter on databases and how the contained information can be structured using knowledge graphs. Applied statistics and AI approaches with regard to the applications in life sciences and medicine are the core elements of the third chapter. It closes with a chapter on longitudinal data, that is, roughly speaking, data repeatedly collected over an extended period of time.

The third part on Distributed Computing and Clouds offers insights on computational grids as well as on cloud computing. Both of these chapters are flanked by examples emerging from applications in the life sciences. The part closes with a chapter on standards which help to create interoperable solutions that are then able to interact with other solutions using the same standards.

Working and doing research in the life sciences often requires knowledge in at least one of the following topics: graphs, optimization, image processing and sequence analysis. Hence, the fourth part, Advanced Topics in Computational Life Sciences, starts with a chapter on graphs and how to implement and use them with Java. It is followed by a chapter in which the fields of linear programs and combinatorial optimization are illuminated. Images play a crucial role in the life sciences and medicine, hence we dedicated the third chapter to image processing and image manipulation. We close the fourth part with a chapter on sequence analysis, an almost classical field in the area of life sciences and a standard example that shows the connection between the decoding of DNA sequences and text mining problems in computer science.

As one of the key features, in the last part of the book, Applications and Emerging Technologies, you will also find some more sophisticated applications arising from scientific projects. These do not only show interesting results but might also be helpful guidelines for your own upcoming programming project.
Contributors

Especially the creation of the last part of the book could only be accomplished with the help of the many contributors that authored those chapters. You will find the respective authors listed at the beginning of each. Further, also in the other parts of this book you will find some chapters written by additional contributors. Whenever this is the case, those contributors are mentioned explicitly at the beginning of the respective chapter. All the chapters with no explicitly mentioned authors are contributed by the editors. Thanks to this variety of contributors and their scientific background, you will find a wide range of different styles of writing and thematic focusses throughout this book. This leads to a dynamic rather than monotonous style that hopefully makes it even more fun to read. We would like to thank all these authors, each of whom contributed a significant part of this book.

Acknowledgements

We acknowledge the many people that helped to improve this book. It is our pleasure to explicitly mention Christof Meigen and Dr. Wolfgang Ziegler who played an important role. In addition, we thank all students from our teaching at the Universities of Cologne and Bonn that helped to improve the quality of several chapters, in particular Regina Wehler, Colin Birkenbihl and Olivier Morelle. We would like to extend our thanks to Springer for their help and patience through the publication process of this book.

This work is dedicated to our families and children.

Alexander Apke, Cologne, Germany
Jens Dörpinghaus, Bonn, Germany
Sebastian Schaaf, Bonn, Germany
Vera Weil, Cologne, Germany
January 2021

Contents

Solving Problems in Life Sciences: On Programming Languages and Basic Concepts

Interesting Programming Languages Used in Life Sciences
Christof Meigen

Introduction to Java
Jens Dörpinghaus, Vera Weil, Sebastian Schaaf, and Alexander Apke

Basic Data Processing
Jens Dörpinghaus, Vera Weil, Sebastian Schaaf, and Alexander Apke

Algorithm Design
Alexander Apke, Vera Weil, Jens Dörpinghaus, and Sebastian Schaaf

Data Mining and Knowledge Discovery

Data and Knowledge Management
Christof Meigen, Jens Dörpinghaus, Vera Weil, Sebastian Schaaf, and Alexander Apke

Databases and Knowledge Graphs
Tobias Hübenthal

Knowledge Discovery and AI Approaches for the Life Sciences
Alexander Apke, Vera Weil, Jens Dörpinghaus, and Sebastian Schaaf

Longitudinal Data
Christof Meigen

Distributed Computing and Clouds

Computational Grids
Wolfgang Ziegler
