
Translational Biomedical Informatics: A Precision Medicine Perspective PDF

331 Pages·2016·6.277 MB·English

Preview Translational Biomedical Informatics: A Precision Medicine Perspective

Advances in Experimental Medicine and Biology, Volume 939

Bairong Shen, Haixu Tang, Xiaoqian Jiang, Editors

Translational Biomedical Informatics: A Precision Medicine Perspective

Editorial Board
IRUN R. COHEN, The Weizmann Institute of Science, Rehovot, Israel
N.S. ABEL LAJTHA, Kline Institute for Psychiatric Research, Orangeburg, NY, USA
JOHN D. LAMBRIS, University of Pennsylvania, Philadelphia, PA, USA
RODOLFO PAOLETTI, University of Milan, Milan, Italy

More information about this series at http://www.springer.com/series/5584

Editors
Bairong Shen, Center for Systems Biology, Soochow University, Jiangsu, China
Haixu Tang, School of Informatics and Computing, Indiana University, Bloomington, IN, USA
Xiaoqian Jiang, Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA

ISSN 0065-2598
ISSN 2214-8019 (electronic)
Advances in Experimental Medicine and Biology
ISBN 978-981-10-1502-1
ISBN 978-981-10-1503-8 (eBook)
DOI 10.1007/978-981-10-1503-8
Library of Congress Control Number: 2016957788
© Springer Science+Business Media Singapore 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer Science+Business Media Singapore Pte Ltd.

Contents

1 NGS for Sequence Variants
Shaolei Teng

2 RNA Bioinformatics for Precision Medicine
Jiajia Chen and Bairong Shen

3 Exploring Human Diseases and Biological Mechanisms by Protein Structure Prediction and Modeling
Juexin Wang, Joseph Luttrell IV, Ning Zhang, Saad Khan, NianQing Shi, Michael X. Wang, Jing-Qiong Kang, Zheng Wang, and Dong Xu

4 Computational Methods in Mass Spectrometry-Based Proteomics
Sujun Li and Haixu Tang

5 Informatics for Metabolomics
Kanthida Kusonmano, Wanwipa Vongsangnak, and Pramote Chumnanpuen

6 Metagenomics and Single-Cell Omics Data Analysis for Human Microbiome Research
Maozhen Han, Pengshuo Yang, Hao Zhou, Hongjun Li, and Kang Ning

7 Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health
Michael Simmons, Ayush Singhal, and Zhiyong Lu

8 Medical Imaging Informatics
William Hsu, Suzie El-Saden, and Ricky K. Taira

9 LIMS and Clinical Data Management
Yalan Chen, Yuxin Lin, Xuye Yuan, and Bairong Shen

10 Biobanks and Their Clinical Application and Informatics Challenges
Lan Yang, Yalan Chen, Chunjiang Yu, and Bairong Shen

11 XML, Ontologies, and Their Clinical Applications
Chunjiang Yu and Bairong Shen

12 Bayesian Computation Methods for Inferring Regulatory Network Models Using Biomedical Data
Tianhai Tian

13 Network-Based Biomedical Data Analysis
Yuxin Lin, Xuye Yuan, and Bairong Shen

Chapter 1
NGS for Sequence Variants

Shaolei Teng

Abstract Recent technological advances in next-generation sequencing (NGS) provide unprecedented power to sequence personal genomes, characterize genomic landscapes, and detect a large number of sequence variants. The discovery of disease-causing variants in patients' genomes has dramatically changed our perspective on precision medicine. This chapter provides an overview of sequence variant detection and analysis in NGS studies. We outline the general methods for identifying different types of sequence variants from NGS data. We summarize the common approaches for analyzing and visualizing causal variants associated with complex diseases in precision medicine informatics.

Keywords Sequence variants · Next-generation sequencing · Sequence alignment · Variant calling · Association testing · Visualization · Precision medicine informatics

1.1 Introduction

Over the last decade, next-generation sequencing (NGS) has dramatically changed the precision medicine field by characterizing patients' genomic landscapes and identifying the causal variants associated with human diseases. Sanger-based sequencing [48] ("first-generation sequencing") was used to sequence the first human reference genome for the Human Genome Project [3], which took 13 years to finish the draft genome at a total cost of $3 billion.
NGS technologies deliver sequencing at a remarkably low price and unprecedented speed by carrying out hundreds of millions of sequencing reactions at once [52, 57]. With this revolutionary technology, we can sequence thousands of genomes in just 1 month, address biological questions at a large scale, identify the genetic risk factors for human diseases, and provide a more precise approach to health care [24]. In particular, NGS can be used to detect a large number of sequence variants in patients' genomes and identify the causal variants associated with human diseases, which has dramatically changed our perspective on genetic variants, human diseases, and precision medicine.

S. Teng (*), Department of Biology, Howard University, Washington, DC 20059, USA; e-mail: [email protected]
B. Shen et al. (eds.), Translational Biomedical Informatics, Advances in Experimental Medicine and Biology 939, DOI 10.1007/978-981-10-1503-8_1

Discovery of causal sequence variants associated with certain traits or diseases has become a fundamental aim of genetics and biomedical research. Sequence variants can be classified by length into single nucleotide variants (SNVs), small insertions and deletions (INDELs), and large structural variants (SVs). SNVs, the most common type of sequence variant, are single DNA base-pair differences between individuals. INDELs are defined as small DNA polymorphisms, including both insertions and deletions, ranging from 1 to 50 bp in length. SVs are large genomic alterations (>50 bp), including unbalanced variants (deletions, insertions, or duplications) and balanced changes (translocations and inversions). Copy number variants (CNVs), a large category of unbalanced SVs, are DNA alterations that result in an abnormal number of copies of particular DNA segments. Somatic mutations are tumor-specific variants in cancer–normal sample pairs. The different types of sequence variants play important roles in the development of human complex diseases.
For example, SNVs associated with major depression were found in the genes encoding serotonin transporter, serotonin receptor, catechol-O-methyltransferase, tryptophan hydroxylase, and tyrosine hydroxylase [29]. These sequence variants can influence neurotransmitter functions in multiple ways, including changing gene expression level, altering substrate binding affinity, or affecting transport kinetics [19]. A balanced t(1;11)(q42.1;q14.3) translocation in the disrupted in schizophrenia 1 (DISC1) gene was discovered in a large Scottish family highly burdened with severe mental illnesses, and the family members with the translocation showed a reduced P300 event-related potential associated with schizophrenia [9]. Identifying the causal variants and their clinical effects provides important insight into the roles of sequence variants in the causation of human diseases.

Discovery of disease-causing variants from the large number of sequence polymorphisms detected in NGS data is a major challenge in precision medicine. Bioinformatics and statistical methods have been developed for detecting sequence variants and identifying disease-related causal variants. The schematic diagram of NGS variant analysis in precision medicine informatics is shown in Fig. 1.1. DNA samples are extracted from patients (or normal individuals) and sequenced on NGS platforms. The sequencers produce billions of short sequence reads, and the sequence information is stored in FASTQ format files. From here, NGS variant analysis falls into two major frameworks. The first framework is variant detection. High-quality sequence reads that pass quality control (QC) filters are aligned to a reference genome, and the sequence alignment data is deposited in SAM/BAM format files. Several variant detection tools are used to call small variants, including SNVs and INDELs. Somatic mutation callers are applied to tumor–normal patient samples.
Multiple SV callers have been developed to detect large structural variants. The variants called by these tools can be stored in Variant Call Format (VCF) files or BED format files. The second framework is variant analysis. Annotation tools are used to predict the functional effects of coding and regulatory variants. Association analysis can identify the common and rare variants associated with certain diseases or traits. Visualization tools are used to view the small and large candidate sequence variants. By combining numerous analysis tools, the causal variants can be identified and connected with clinical information for precision medicine research. On the one hand, disease-related causal variants provide genetic biomarkers for the diagnosis of complex diseases. On the other hand, the candidate variants offer targets for developing more precise treatments and drugs for patients. In the following sections, we will review the bioinformatics approaches and provide a guide for detecting and analyzing sequence variants from NGS data.

Fig. 1.1 A flowchart of NGS variant analysis in precision medicine

1.2 Variant Detection

Variant detection consists of quality control (QC), sequence alignment, and variant calling. The raw data contains a large number of short reads generated by NGS sequencers. Preprocessing and post-processing QC are carried out to remove the

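The length-based classification given in the Introduction (SNVs as single-base differences, INDELs of 1 to 50 bp, SVs above 50 bp) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not a method from the chapter: the `Variant` record and `classify_by_length` function are hypothetical names, alleles are assumed to be explicit REF/ALT base strings in the style of a VCF record, and balanced SVs such as inversions and translocations cannot be recognized from allele lengths alone, so they fall outside this sketch.

```python
from dataclasses import dataclass

# Upper bound (in bp) for a small INDEL, taken from the chapter's definition.
INDEL_MAX_BP = 50


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record with explicit allele strings."""
    chrom: str
    pos: int
    ref: str  # reference allele, e.g. "A"
    alt: str  # alternate allele, e.g. "ATT"


def classify_by_length(v: Variant) -> str:
    """Return 'SNV', 'INDEL', 'SV', or 'OTHER' based on allele lengths only."""
    ref_len, alt_len = len(v.ref), len(v.alt)
    if ref_len == 1 and alt_len == 1:
        # A single-base substitution; identical alleles are not a variant.
        return "SNV" if v.ref != v.alt else "OTHER"
    size = abs(ref_len - alt_len)  # net number of inserted or deleted bases
    if size == 0:
        # Equal-length multi-base substitution: not covered by the three classes.
        return "OTHER"
    return "INDEL" if size <= INDEL_MAX_BP else "SV"


if __name__ == "__main__":
    examples = [
        Variant("chr1", 100, "A", "G"),             # substitution
        Variant("chr1", 200, "A", "ATT"),           # 2 bp insertion
        Variant("chr1", 300, "A" + "C" * 51, "A"),  # 51 bp deletion
    ]
    for v in examples:
        print(v.chrom, v.pos, classify_by_length(v))
```

Real variant callers usually report large SVs with symbolic alleles and extra annotation rather than full base strings, so a production classifier would need to read those fields as well; the sketch deliberately ignores them.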