Western Kentucky University
TopSCHOLAR®
Masters Theses & Specialist Projects, Graduate School
Fall 2015

Distributed Approach for Peptide Identification
Naga V K Abhinav Vedanbhatla
Western Kentucky University, [email protected]

Follow this and additional works at: http://digitalcommons.wku.edu/theses
Part of the Analytical Chemistry Commons, Computer Engineering Commons, and the Computer Sciences Commons

Recommended Citation
Vedanbhatla, Naga V K Abhinav, "Distributed Approach for Peptide Identification" (2015). Masters Theses & Specialist Projects. Paper 1546.
http://digitalcommons.wku.edu/theses/1546

This Thesis is brought to you for free and open access by TopSCHOLAR®. It has been accepted for inclusion in Masters Theses & Specialist Projects by an authorized administrator of TopSCHOLAR®. For more information, please contact [email protected].

DISTRIBUTED APPROACH FOR PEPTIDE IDENTIFICATION

A Thesis
Presented to
The Faculty of the Department of Computer Science
Western Kentucky University
Bowling Green, Kentucky

In Partial Fulfillment
Of the Requirements for the Degree
Master of Science

By
Naga Venkata Krishna Abhinav Vedanbhatla
December 2015

DEDICATION

This thesis is gratefully dedicated to my parents and all my friends who believed in me.

ACKNOWLEDGMENTS

I would like to extend my sincere gratitude to my committee chair, Dr. Zhonghang Xia, for providing me this opportunity. Without his guidance and persistent help, this thesis would not have been possible. It has been a great pleasure to work alongside him both as a graduate assistant and as a student.

I would also like to acknowledge Dr. Rong Yang. I am very thankful for the way that she inspired and motivated me. I appreciate Dr. Yang taking the time to review this document. Our discussions and her perspective were crucial in completing this research.

I would like to thank Dr. Qi Li, who accepted my request to be part of my thesis committee. I am extremely grateful for his assistance and suggestions throughout my research and the coursework.
I would also like to extend my sincere gratitude to Western Kentucky University for providing me a friendly environment to pursue my master's degree.

I would also like to thank my friends Travis Brummett and Harinivesh Donepudi for helping me revise this document. I would like to extend my appreciation to my friends Sai Sandeep Jagarlamudi and Kavya Madugula for supporting me during my hard times.

Finally, I would like to extend my gratitude to my parents, who have continuously supported me and believed in me each and every moment of my life.

CONTENTS

DEDICATION . . . iii
ACKNOWLEDGMENTS . . . iv
LIST OF TABLES . . . viii
LIST OF FIGURES . . . ix
ABSTRACT . . . xi
1 INTRODUCTION . . . 1
1.1 Proteins and Peptides . . . 2
1.2 Peptide and Protein Identification . . . 3
1.3 C-Ranker, Peptide Identification and Time . . . 5
1.4 Research Objectives and Methodology . . . 7
1.5 Objectives and plan of thesis . . . 7
1.6 Methodology . . . 7
1.7 Advantages . . . 8
1.8 Conclusion . . . 9
2 BACKGROUND . . . 10
2.1 Mass Spectrometry . . . 10
2.2 Tandem Mass Spectrometry . . . 11
2.3 Post Database Search Methods . . . 12
2.3.1 PeptideProphet . . . 13
2.3.2 Percolator . . . 14
2.3.3 C-Ranker . . . 15
2.4 Distributed Systems . . . 16
2.4.1 Centralized system Vs Distributed System . . . 18
2.5 Why not distributed frameworks like Hadoop? . . . 19
2.6 Distributed framework effects on C-Ranker . . . 19
2.7 Java for Distributed Systems . . . 20
2.8 Distributed Systems in networking point-of-view . . . 21
3 ENVIRONMENT SETUP AND DESIGN . . . 23
3.1 Overview . . . 23
3.2 Java Runtime Environment . . . 23
3.2.1 Installing Java Runtime Environment . . . 24
3.3 Architecture . . . 26
3.4 Data Flow . . . 27
3.4.1 Data Flow for Single C-Ranker . . . 27
3.4.2 Data Flow for Distributed C-Ranker . . . 28
4 INFRASTRUCTURE SETUP AND EXECUTION . . . 32
4.1 Overview . . . 32
4.2 Infrastructure Setup . . . 32
4.3 Executing C-Ranker . . . 37
5 RESULTS . . . 40
5.1 Analysis of Results . . . 43
5.1.1 Memory Usage . . . 43
5.2 Comparison with C-Ranker in an Apache Hadoop Framework . . . 47
6 CONCLUSION AND FUTURE WORK . . . 51
REFERENCES . . . 53
APPENDICES . . . 57
A SOURCE CODE COMMENTARY . . . 58
A.A The calcExcel Method . . . 58
A.B ExcelDivider driver . . . 61
A.C The merge Method . . . 65
A.D The serve Method . . . 67

LIST OF TABLES

5.1 Input Data Sets for Observations . . . 40
5.2 Hardware Used to Observe Results . . . 41
5.3 Comparison of C-Ranker on distributed approach with C-Ranker on an Apache Hadoop Framework Cluster 1 . . . 49
5.4 Upgraded hardware to compare with cluster 2 Hadoop . . . 49
5.5 Comparison of C-Ranker on distributed approach with C-Ranker on an Apache Hadoop Framework Cluster 2 . . . 49
5.6 Cost Calculation of Apache Hadoop Cluster 1 and Cluster 2 . . . 50

LIST OF FIGURES

2.1 Distributed System . . . 17
2.2 LAN Design . . . 21
2.3 Peer-to-Peer Network . . . 22
3.1 Installing Java Compiler Runtime (Initial Screen) . . . 24
3.2 Installing Java Compiler Runtime (License and User Agreement) . . . 25
3.3 Installing Java Compiler Runtime (Confirm Installation Settings) . . . 25
3.4 Architecture . . . 26
3.5 Data Flow Details in the Original Single-Threaded C-Ranker . . . 28
3.6 Data Flow Details in Distributed C-Ranker (Dividing) . . . 29
3.7 Data Flow For a Single Worker Host . . . 30
3.8 Data Flow Details in Distributed C-Ranker (Merging) . . . 31
4.1 Download Apache Tomcat . . . 33
4.2 Apache Tomcat Extracted in C: Drive . . . 34
4.3 Apache Tomcat Server startup.bat . . . 34
4.4 Apache Tomcat Server Running . . . 35
4.5 ROOT.war file in the Apache Tomcat Directory . . . 36
4.6 ServerRunner.jar File Moved to Home Directory . . . 36
4.7 Cranker Properties File . . . 37
4.8 ServerRunner.jar Running and Asking for A Port Number . . . 38