BIOINFORMATICS APPLICATIONS NOTE Vol. 29 no. 6 2013, pages 797–798
doi:10.1093/bioinformatics/btt013

Sequence analysis
Advance Access publication January 29, 2013

ScalaBLAST 2.0: rapid and robust BLAST calculations on multiprocessor systems

Christopher S. Oehmen* and Douglas J. Baxter
Pacific Northwest National Laboratory, Richland, WA 99352, USA
*To whom correspondence should be addressed.
Associate Editor: Alfonso Valencia

ABSTRACT
Motivation: BLAST remains one of the most widely used tools in computational biology. The rate at which new sequence data become available continues to grow exponentially, driving the emergence of new fields of biological research. At the same time, multicore systems and conventional clusters are more accessible. ScalaBLAST has been designed to run on conventional multiprocessor systems with an eye to extreme parallelism, enabling parallel BLAST calculations using >16,000 processing cores with a portable, robust, fault-resilient design that introduces little to no overhead with respect to serial BLAST.
Availability: ScalaBLAST 2.0 source code can be freely downloaded from http://omics.pnl.gov/software/ScalaBLAST.php.
Contact: [email protected]

Received on May 9, 2012; revised on December 20, 2012; accepted on January 8, 2013

© The Author 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

1 INTRODUCTION
BLAST continues to be among the most widely used tools for genome and protein sequence analysis in computational bioinformatics. The exponential growth in the throughput of sequencing platforms continues to drive the need for ever-expanding capacity for BLAST (Altschul et al., 1990) calculations to support genome annotation, functional prediction and a host of other foundational analyses of sequence data.

Parallel BLAST accelerators have been implemented in the past, including mpiBLAST (Darling et al., 2003) and ScalaBLAST 1.0 (Oehmen and Nieplocha, 2006). Parallel BLAST drivers accelerate large lists of BLAST calculations using multiprocessor systems. ScalaBLAST 1.0 used a hybrid parallelization scheme in which the sequence list was statically partitioned among processor pairs (process groups). Process groups performed independent BLAST calculations simultaneously, gaining speedup on the overall calculation in proportion to the number of process groups used. The main limitation of ScalaBLAST 1.0 was its use of static data partitioning, which had no fault-resilience properties. By contrast, the main limitation of mpiBLAST is the need to pre-format datasets to achieve optimized run-time, sometimes requiring repeated attempts on the same dataset to find the right pre-formatting configuration.

We have addressed these limitations in ScalaBLAST 2.0 by (i) re-implementing the task scheduling layer with a dynamic task management scheme that (ii) does not require pre-formatting. This technique allows processors to obtain work units independently at run-time, based on their availability. This highly tolerant, fault-resilient approach ensures that all processors do as close as possible to the same amount of work throughout a calculation. In addition, this implementation allows continued operation even in the presence of processor or other system failures. This is critical for all large-scale calculations, independent of the code being run, because the longer the run and the larger the system, the more likely one is to encounter a component failure during a calculation. As the expected run-time increases, the likelihood of successfully completing the calculation before the next failure tends to zero.

We demonstrate near-ideal scaling of ScalaBLAST 2.0 calculations to machine capacity on a Linux cluster having >18,000 compute cores, even during process failure events. ScalaBLAST 2.0 can be downloaded freely from http://omics.pnl.gov/software/ScalaBLAST.php.

2 METHODS
ScalaBLAST 2.0 is implemented using the NCBI BLAST C toolkit distribution, version 2.2.13. This release is several years old, but it is very stable, and we have found that large-scale sequence analysis centers prefer such stable versions. ScalaBLAST 2.0 supports the five basic BLAST calculation types (blastn, blastp, tblastn, tblastx and blastx) and three different output formats (standard pairwise, tabular and tabular with headers). The next major release of ScalaBLAST will include our own implementation of the BLAST algorithm and will not use the NCBI toolkit.

ScalaBLAST 2.0 depends only on a message passing interface (MPI) library, which can be downloaded freely. Tasks in ScalaBLAST 2.0 are managed by a dynamic task scheduler. Each query is considered to be an independent task and is processed by a single compute core. Each task contains the query sequence and the whole target database.
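As a rough illustration of why run-time task pulling balances skewed workloads better than the static partitioning used in ScalaBLAST 1.0, the following minimal simulation contrasts the two policies. This is a conceptual sketch in plain Python, not ScalaBLAST code; the function names and cost model are ours:

```python
from collections import deque

def dynamic_schedule(task_costs, n_workers):
    """Simulate run-time task pulling: each idle worker takes the next
    query from a shared queue, so load stays balanced even when a few
    queries are much more expensive than the rest."""
    queue = deque(range(len(task_costs)))
    finish = [0.0] * n_workers  # time at which each worker becomes idle
    while queue:
        w = min(range(n_workers), key=finish.__getitem__)  # next idle worker
        finish[w] += task_costs[queue.popleft()]
    return max(finish)  # overall wall-clock time

def static_schedule(task_costs, n_workers):
    """Static up-front partitioning (ScalaBLAST 1.0 style): whichever
    partition holds the expensive queries becomes the bottleneck."""
    chunk = -(-len(task_costs) // n_workers)  # ceiling division
    loads = [sum(task_costs[i:i + chunk])
             for i in range(0, len(task_costs), chunk)]
    return max(loads)

# Skewed workload: one expensive query among thirty cheap ones.
costs = [10.0] + [1.0] * 30
print(static_schedule(costs, 4))   # 17.0 - one partition absorbs the big query
print(dynamic_schedule(costs, 4))  # 10.0 - bounded by the single largest task
```

The dynamic policy finishes in the time of the single largest task, whereas the static split leaves three partitions idle while one grinds through the expensive query, mirroring the skewed-input behaviour described in Section 2.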
At the beginning of the run, a single manager process is selected to control which processes receive which tasks for the duration of the computation. Depending on user-configurable parameters (in the sb_params.in file), the manager will have some number of sub-managers, and each sub-manager will in turn have some number of worker nodes. Each collection of a sub-manager and its worker nodes is referred to as a task group.

At the beginning of each ScalaBLAST job, files are distributed across nodes by the manager. In the sb_params.in file, users can set the relationship between processing elements and their underlying file system independently of the task group configuration. The task group defines how many workers are associated with each sub-manager; how the files are distributed is governed by the disk group, which maps how many compute cores share a common file system. This control supports storing output and input on globally mounted file systems, on local file systems or on combinations of both.
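The grouping described above can be sketched as a simple partition of core ranks, where the first core of each group plays the sub-manager role. This is an illustration only; the function and parameter names below are ours and are not ScalaBLAST's actual sb_params.in keys:

```python
def group_layout(n_cores, task_group_size):
    """Partition compute cores into task groups: within each group, the
    first core acts as the sub-manager and the remaining cores are workers.
    (Hypothetical helper for illustration, not ScalaBLAST source.)"""
    groups = []
    for start in range(0, n_cores, task_group_size):
        cores = list(range(start, min(start + task_group_size, n_cores)))
        groups.append({"sub_manager": cores[0], "workers": cores[1:]})
    return groups

# Six eight-core nodes with a task group size of 24: sets of three nodes
# form a group, each with one sub-manager core and 23 worker cores.
layout = group_layout(n_cores=48, task_group_size=24)
print(len(layout), len(layout[0]["workers"]))  # 2 23
```

A disk group could be modelled the same way, partitioning cores by which file system they share rather than by which sub-manager they report to.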
After file distribution is complete, the manager is responsible for tracking which tasks have been assigned and which have been completed. The manager is also responsible for processing the FASTA input files (both query and target database are in FASTA format, eliminating the need to pre-format database files) and distributing these processed files.

The task groups can be controlled by the user and can span multiple compute nodes. For instance, a system with eight-core nodes can have a task group size of 24, in which sets of three nodes work together as a single task group having one sub-manager core and 23 worker cores.

This dynamic scheduling layer ensures that when processes fail or get loaded down with tasks that take a long time to process, other processes continue to do meaningful work. This allows highly skewed input sets to be processed in as even a run-time as possible. Dynamic scheduling is implemented by having the manager 'hand out' tasks to sub-managers. Workers completing a task do not write their output until they verify with the manager (via the sub-manager) that the task has not already been checked back in. Workers then request a new assignment from the manager. When all the tasks have been assigned, any workers reporting for new work are given a duplicate of a task that has not yet been completed. In this way, nodes that fail during a calculation are simply ignored; any tasks assigned to them will be re-assigned to other workers until one of them completes the calculation.

3 RESULTS
ScalaBLAST 2.0 was run on a Linux cluster at Pacific Northwest National Laboratory that has 2310 compute nodes, each having eight cores, for a total of 18,480 compute elements. For blastp scaling runs, our query dataset contained 203,200 proteins with a widely varying size distribution. Our query list had an average protein length of 175.1 ± 138.5 residues, with a minimum length of eight and a maximum length of 4299 residues. This list was compared against a version of the non-redundant database from NCBI dated May 2010 and containing 12 million reference proteins. Each query sequence was compared with the reference database using blastp with the default BLOSUM62 scoring matrix and print option 9.

3.1 Scalability results
Run times include parallel execution startup, time for parsing the input files, creating and distributing their binary counterparts, performing all calculations and terminating the job. Scaling results are shown in Figure 1, which demonstrates that for this calculation ScalaBLAST 2.0 achieved nearly ideal speedup all the way to 16,392 compute cores, at which point the whole task list was processed in 27 minutes. We have observed similar scaling performance characteristics for the blastn, tblastn, blastx and tblastx program options when using ScalaBLAST 2.0 (results not shown).

Fig. 1. Scaling performance of ScalaBLAST 2.0 on a large protein sequence dataset compared with the non-redundant database

3.2 Fault resilience
We experienced several examples of hardware failure during the course of ScalaBLAST scalability testing. Even in the presence of such failures, ScalaBLAST was able to continue the calculation and complete the task list. We tested the overhead introduced by our fault-resilient design by comparing NCBI BLAST 2.2.13, ScalaBLAST 2.0 running in serial mode and ScalaBLAST 2.0 running in parallel mode with only one worker process. We observed between a 10% improvement and a 24% slowdown in serial processing time when comparing either version of ScalaBLAST 2.0 with serial NCBI BLAST, depending on the dataset and runtime options, demonstrating that the baseline for ScalaBLAST 2.0 scaling is a serial run time of the same order of magnitude as NCBI BLAST.

4 CONCLUSION
ScalaBLAST 2.0 provides fault-resilient speedup on conventional Linux-based clusters in proportion to the number of nodes in the cluster. On both small- and large-scale systems, this allows users to accelerate the throughput of BLAST calculations that complete even when processes fail, in support of robust sequence analysis applications. ScalaBLAST 2.0 can be freely downloaded from http://omics.pnl.gov/software/ScalaBLAST.php.

ACKNOWLEDGEMENTS
A portion of the research was performed using the W.R. Wiley Environmental Molecular Sciences Laboratory (EMSL), a national scientific user facility sponsored by the Department of Energy's Office of Biological and Environmental Research and located at Pacific Northwest National Laboratory.

Funding: This work was supported through the Signature Discovery Initiative Laboratory Directed Research and Development program at Pacific Northwest National Laboratory (PNNL) and by EMSL. PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under contract DE-AC05-76RL01830.

Conflict of Interest: none declared.

REFERENCES
Altschul,S. et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
Darling,A. et al. (2003) The design, implementation and evaluation of mpiBLAST. In: Proceedings of the ClusterWorld 2003. Linux Clusters Institute, San Jose, CA.
Oehmen,C. and Nieplocha,J. (2006) ScalaBLAST: a scalable implementation of BLAST for high-performance data-intensive bioinformatics analysis. IEEE Trans. Parallel Distrib. Syst., 17, 740–749.