
Distributed Approach for Peptide Identification PDF

84 Pages·2016·2.31 MB·English

Preview Distributed Approach for Peptide Identification

Western Kentucky University
TopSCHOLAR®
Masters Theses & Specialist Projects, Graduate School
Fall 2015

Distributed Approach for Peptide Identification
Naga V K Abhinav Vedanbhatla
Western Kentucky University, [email protected]

Follow this and additional works at: http://digitalcommons.wku.edu/theses
Part of the Analytical Chemistry Commons, Computer Engineering Commons, and the Computer Sciences Commons

Recommended Citation
Vedanbhatla, Naga V K Abhinav, "Distributed Approach for Peptide Identification" (2015). Masters Theses & Specialist Projects. Paper 1546. http://digitalcommons.wku.edu/theses/1546

This Thesis is brought to you for free and open access by TopSCHOLAR®. It has been accepted for inclusion in Masters Theses & Specialist Projects by an authorized administrator of TopSCHOLAR®. For more information, please contact [email protected].

DISTRIBUTED APPROACH FOR PEPTIDE IDENTIFICATION

A Thesis Presented to The Faculty of the Department of Computer Science, Western Kentucky University, Bowling Green, Kentucky

In Partial Fulfillment Of the Requirements for the Degree Master of Science

By Naga Venkata Krishna Abhinav Vedanbhatla
December 2015

DEDICATION

This thesis is gratefully dedicated to my parents and all my friends who believed in me.

ACKNOWLEDGMENTS

I would like to extend my sincere gratitude to my committee chair, Dr. Zhonghang Xia, for providing me this opportunity. Without his guidance and persistent help, this thesis would not have been possible. It has been a great pleasure to work alongside him both as a graduate assistant and as a student.

I would also like to acknowledge Dr. Rong Yang. I am very thankful for the way that she inspired and motivated me. I appreciate Dr. Yang taking the time to review this document. Our discussions and her perspective were crucial in completing this research.

I would like to thank Dr. Qi Li, who accepted my request to be part of my thesis committee. I am extremely grateful for his assistance and suggestions throughout my research and the coursework.

I would also like to extend my sincere gratitude to Western Kentucky University for providing me a friendly environment to pursue my master's degree.

I would also like to thank my friends Travis Brummett and Harinivesh Donepudi for helping me in revising this document. I would like to extend my appreciation to my friends Sai Sandeep Jagarlamudi and Kavya Madugula for supporting me during my hard times.

Finally, I would like to extend my gratitude to my parents, who have continuously supported me and believed in me each and every moment of my life.

CONTENTS

DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1 INTRODUCTION
  1.1 Proteins and Peptides
  1.2 Peptide and Protein Identification
  1.3 C-Ranker, Peptide Identification and Time
  1.4 Research Objectives and Methodology
  1.5 Objectives and plan of thesis
  1.6 Methodology
  1.7 Advantages
  1.8 Conclusion
2 BACKGROUND
  2.1 Mass Spectrometry
  2.2 Tandem Mass Spectrometry
  2.3 Post Database Search Methods
    2.3.1 PeptideProphet
    2.3.2 Percolator
    2.3.3 C-Ranker
  2.4 Distributed Systems
    2.4.1 Centralized system Vs Distributed System
  2.5 Why not distributed frameworks like Hadoop?
  2.6 Distributed framework effects on C-Ranker
  2.7 Java for Distributed Systems
  2.8 Distributed Systems in networking point-of-view
3 ENVIRONMENT SETUP AND DESIGN
  3.1 Overview
  3.2 Java Runtime Environment
    3.2.1 Installing Java Runtime Environment
  3.3 Architecture
  3.4 Data Flow
    3.4.1 Data Flow for Single C-Ranker
    3.4.2 Data Flow for Distributed C-Ranker
4 INFRASTRUCTURE SETUP AND EXECUTION
  4.1 Overview
  4.2 Infrastructure Setup
  4.3 Executing C-Ranker
5 RESULTS
  5.1 Analysis of Results
    5.1.1 Memory Usage
  5.2 Comparison with C-Ranker in an Apache Hadoop Framework
6 CONCLUSION AND FUTURE WORK
REFERENCES
APPENDICES
A SOURCE CODE COMMENTARY
  A.A The calcExcel Method
  A.B ExcelDivider driver
  A.C The merge Method
  A.D The serve Method

LIST OF TABLES

5.1 Input Data Sets for Observations
5.2 Hardware Used to Observe Results
5.3 Comparison of C-Ranker on distributed approach with C-Ranker on an Apache Hadoop Framework Cluster 1
5.4 Upgraded hardware to compare with cluster 2 Hadoop
5.5 Comparison of C-Ranker on distributed approach with C-Ranker on an Apache Hadoop Framework Cluster 2
5.6 Cost Calculation of Apache Hadoop Cluster 1 and Cluster 2

LIST OF FIGURES

2.1 Distributed System
2.2 LAN Design
2.3 Peer-to-Peer Network
3.1 Installing Java Compiler Runtime (Initial Screen)
3.2 Installing Java Compiler Runtime (License and User Agreement)
3.3 Installing Java Compiler Runtime (Confirm Installation Settings)
3.4 Architecture
3.5 Data Flow Details in the Original Single-Threaded C-Ranker
3.6 Data Flow Details in Distributed C-Ranker (Dividing)
3.7 Data Flow For a Single Worker Host
3.8 Data Flow Details in Distributed C-Ranker (Merging)
4.1 Download Apache Tomcat
4.2 Apache Tomcat Extracted in C: Drive
4.3 Apache Tomcat Server startup.bat
4.4 Apache Tomcat Server Running
4.5 ROOT.war file in the Apache Tomcat Directory
4.6 ServerRunner.jar File Moved to Home Directory
4.7 Cranker Properties File
4.8 ServerRunner.jar Running and Asking for A Port Number

Description:
Vedanbhatla, Naga V K Abhinav, "Distributed Approach for Peptide Identification" (2015). Masters Theses ... Hadoop Framework for peptide identification with respect to the cost, execution times and flexibility ... SEQUEST produces a list of matches for the unknown peptide [Käll, Storey, MacCoss, a