Parallelization of DNA alignment algorithms using GPUs

Gustavo Páramos Merino Faria Encarnação

Dissertation submitted for the degree of Master in Information Systems and Computer Engineering

Jury
Chairperson: Doutor Joaquim Armando Pires Jorge
Supervisor: Doutor Nuno Filipe Valentim Roma
Members: Doutor Luis Manuel Silveira Russo

May 2012

Acknowledgments

I would like to thank all my friends and family who, every day over these past two years, through peer pressure and by constantly asking me when I would finally deliver my thesis and finish my degree, gave me the motivation to continue and not give up.

I would also like to thank Professor Nuno Roma and Nuno Sebastião for their patience, support, advice and ideas, without which this thesis would not have happened.

This thesis was performed in the scope of project "HELIX: Heterogeneous Multi-Core Architecture for Biological Sequence Analysis", funded by the Portuguese Foundation for Science and Technology (FCT) with reference PTDC/EEA-ELC/113999/2009, and project "TAGS: The power of the short - Tools and Algorithms for next Generation Sequencing applications", funded by FCT with reference PTDC/EIA-EIA/112283/2009.

Abstract

Since its discovery, deoxyribonucleic acid (DNA) has been the object of thorough study. Significant advances in sequencing technologies [1] have allowed researchers to obtain DNA sequences at ever increasing rates, with the consequent growth of the corresponding databases. Such huge amounts of data require efficient processing tools to extract useful information. However, these tools have advanced at a slower pace and have become one of the limiting factors for new discoveries in this field of research.

Recently, a new generation of hardware has emerged from the 3D game market. This hardware, known as Graphics Processing Units (GPUs), now offers the capability to perform general purpose computations. With these devices, high performance computing has become possible on cheap and readily available hardware, which creates new opportunities to study and improve the tools and algorithms currently used for the study of DNA.

The goal of this dissertation was to find a way of using these new processing platforms to improve the tools, techniques and algorithms currently used for DNA analysis. To attain this goal, several data structures and algorithms were considered and thoroughly analysed in terms of performance and flexibility when executed on a GPU. Afterwards, these structures were used in the implementation of both exact and approximate string matching algorithms especially adapted for DNA sequences.

The outcome of this work was the development of a new tool for heuristic DNA alignment that is capable of harnessing the raw computational performance provided by the highly parallel GPU device.

The conducted tests showed that the resulting tool is capable of competing with currently available and commonly used software in both performance and quality of results.

Keywords

Exact String Matching, Approximate String Matching, DNA Alignment, High-Performance Computing, Graphics Processing Unit (GPU)

Summary

Since its discovery, DNA has been the object of intensive study. Significant advances in sequencing technologies [1] have made it possible to obtain DNA sequences at ever increasing rates, with the consequent growth of the corresponding databases. Such an amount of data requires efficient processing tools to extract useful information. However, these tools have advanced at a slower pace and have become one of the limiting factors for new discoveries in this field of research.

Recently, driven by the entertainment and 3D game market, a new generation of processors has emerged. These processors, known as graphics processors (GPUs), are capable of executing not only the geometric transformations required in games but also generic programs of some complexity. With these new processors, high performance computing has become possible on cheap and easily accessible platforms, which creates new opportunities to study and improve the current tools and techniques for the study of DNA.

The goal of this dissertation was to find a way of taking advantage of these new platforms to improve the current tools for the study of DNA. To attain this goal, several data representation structures and algorithms were studied and carefully analysed in terms of performance and flexibility when executed on the GPU. Afterwards, these structures were used in the implementation of algorithms for exact and approximate string search, especially adapted for DNA sequences.

The outcome of this work was the development of a new heuristic tool for DNA alignment, capable of taking advantage of the computational power provided by GPUs.

The conducted tests show that this work led to the creation of a tool capable of competing with current tools, both in the quality of the results and in the speed at which those results are obtained.

Keywords

Exact Search, Approximate Search, DNA Alignment, High Performance Computing, Graphics Processing Unit

Contents

1 Introduction
    1.1 Motivation
    1.2 Objectives
    1.3 Main contributions
    1.4 Dissertation outline

2 General Purpose Computation on Graphics Processing Units
    2.1 GPGPU hardware
        2.1.1 NVidia G80 and GF100 architectures
        2.1.2 AMD FireStream architecture
    2.2 GPGPU performance issues
        2.2.1 Warp divergence and SIMT paradigm
        2.2.2 Memory access
    2.3 GPGPU programming languages
        2.3.1 NVidia CUDA
        2.3.2 ATI FireStream SDK
        2.3.3 OpenCL
        2.3.4 Microsoft DirectCompute

3 DNA Alignment Methods
    3.1 Local and global alignment
    3.2 Dynamic programming DNA alignment
        3.2.1 DNA analysis using strings
        3.2.2 Needleman-Wunsch
        3.2.3 Smith-Waterman
    3.3 Indexed search
        3.3.1 Suffix tries and suffix trees
        3.3.2 Suffix arrays
        3.3.3 Hash tables and q-mers
        3.3.4 Burrows-Wheeler transform
    3.4 Summary of DNA Analysis tools

4 Exact Matching
    4.1 Data representation
        4.1.1 Reference sequence
        4.1.2 Query sequences
    4.2 GPU Initialization
        4.2.1 Memory allocation
        4.2.2 Host to device data copy
        4.2.3 Kernel launch
        4.2.4 Memory de-allocation
        4.2.5 Time measurement and GPU events
    4.3 Exact matching on the GPU
        4.3.1 Suffix Tree Kernel
        4.3.2 Suffix Array Kernel
    4.4 Comparison and discussion of different approaches
        4.4.1 Overlap of communication and computation

5 Approximate Matching
    5.1 Possible approaches for approximate matching
        5.1.1 Suffix arrays
        5.1.2 Enhanced suffix array
        5.1.3 Suffix trees
    5.2 Discussion and Comparison of the considered approaches

6 DNA alignment using GPUs
    6.1 Seeding
        6.1.1 Parameters
        6.1.2 Seeding By Counting
        6.1.3 Seeding By Sorting
    6.2 Smith-Waterman
        6.2.1 Kernel modifications
        6.2.2 GPGPU Kernel
    6.3 Results and comparison
        6.3.1 Parameter influence
        6.3.2 Seeding by counting versus Seeding by sorting
        6.3.3 Comparison with BLAST
        6.3.4 Comparison with MUMmer

7 Conclusions
    7.1 Future work