ebook img

The road not taken: retreat and diverge in local search for simplified protein structure prediction. PDF

0.51 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview The road not taken: retreat and diverge in local search for simplified protein structure prediction.

Shatabdaetal.BMCBioinformatics2013,14(Suppl2):S19 http://www.biomedcentral.com/1471-2105/14/S2/S19 PROCEEDINGS Open Access The road not taken: retreat and diverge in local search for simplified protein structure prediction Swakkhar Shatabda1,2*, MA Hakim Newton1,2, Mahmood A Rashid1,2, Duc Nghia Pham1,2, Abdul Sattar1,2 From The Eleventh Asia Pacific Bioinformatics Conference (APBC 2013) Vancouver, Canada. 21-24 January 2013 Abstract Background: Given a protein’s amino acid sequence, the protein structure prediction problem is to find a three dimensional structure that has the native energy level. For many decades, it has been one of the most challenging problems in computational biology. A simplified version of the problem is to find an on-lattice self-avoiding walk that minimizes the interaction energy among the amino acids. Local search methods have been preferably used in solving the protein structure prediction problem for their efficiency in finding very good solutions quickly. However, they suffer mainly from two problems: re-visitation and stagnancy. Results: In this paper, we present an efficient local search algorithm that deals with these two problems. During search, we select the best candidate at each iteration, but store the unexplored second best candidates in a set of elite conformations, and explore them whenever the search faces stagnation. Moreover, we propose a new non- isomorphic encoding for the protein conformations to store the conformations and to check similarity when applied with a memory based search. This new encoding helps eliminate conformations that are equivalent under rotation and translation, and thus results in better prevention of re-visitation. Conclusion: On standard benchmark proteins, our algorithm significantly outperforms the state-of-the art approaches for Hydrophobic-Polar energy models and Face Centered Cubic Lattice. Background X-ray crystallography and Nuclear Magnetic Resonance Proteinsarethemostimportantofallorgan-ismspresent (NMR) spectroscopy are very muchslow and expensive. inthelivingcell.Givenaprotein’saminoacidsequence, For these issues, many researchers from other fields are theproteinstructureprediction(PSP)problemistofind attracted to solve the problem using their own techni- a three dimensional native structure that has the lowest ques[1,2]. freeenergy.Inordertofunctionproperly,theproteinhas Computational methods applied to PSP fall into three to fold into its native structure. Mis-folded proteins broadcategories:abinitio,homologymodelingandprotein cause many criticaldiseasessuchasAlzheimer’sdisease, threading.Thelatertwomethodsdependonthetemplates Cystic fibrosis, and Mad Cow disease. Knowledge about (or structures) of known proteins and are useful only thisnativestructureisofparamountimportanceandcan whenmatchingtemplatesarefound.Researchinabinitio haveanenormousimpactonthefieldofdrugdiscovery. PSPhasbeeninstigatedbythefamousAnfinsen’sdogma. Not much is known about the folding process and the In1973Nobel Prize Laureate Christian B.Anfinsen sug- nature of the energy function is also very complex. For gested that the native structure of a globular protein is manydecades,ithasbeenconsideredoneofthehardest determined only by its primary amino acidsequence [3]. problems in biology. In vitro laboratory methods like The ab initio PSP can be viewed as a search problem, where one has to find a stable, unique, and kinetically accessible native structure from the space of all possible *Correspondence:[email protected] 1InstituteofIntelligentandIntegratedSystems,GriffithUniversity, structures (also called conformations). The search space Queensland,Australia forthisproblem, even in the simplifiedmodels, contains Fulllistofauthorinformationisavailableattheendofthearticle ©2013Shatabdaetal.;licenseeBioMedCentralLtd.ThisisanopenaccessarticledistributedunderthetermsoftheCreative CommonsAttributionLicense(http://creativecommons.org/licenses/by/2.0),whichpermitsunrestricteduse,distribution,and reproductioninanymedium,providedtheoriginalworkisproperlycited. Shatabdaetal.BMCBioinformatics2013,14(Suppl2):S19 Page2of9 http://www.biomedcentral.com/1471-2105/14/S2/S19 anastronomicallylargenumberofconformations.There- search space and makes the similarity matching of the fore,systematicsearchtechniquesarealmostimpractical conformationsefficient.Theseisomorphicconformations sincetheyperformexhaustivesearchandrequiresahuge areessentiallysameandshowdifferencesonlybecauseof amount of computational resources. In contrast, local the translational and rotational symmetry. We applied search methods are normally very quick in finding good thisencodinginouralgorithmalongwiththe longterm solutions,althoughtheysufferfromre-visitationandstag- memoryoflocalminimaproposedin[10].Experimental nation,andrequiregoodheuristics. resultsshowthatouralgorithmsignificantlyoutperforms Performance of the computational methods also the state-of-the-art algorithms on standard benchmark degradeswhenappliedtothehighresolutionmodelsthat proteinsusingHydrophobic-Polar(HP)energymodeland deal with real structures of proteins. Thisis due tothree FaceCenteredCubic(FCC)lattice. reasons: i) the unknown contributing factors of different forces to the energy functions, ii) protein models with Related work atomicleveldetailsrequirehugecomputationaleffort,and Lau and Dill [1] proposed a simplified HP energy model iii) the space of possible conformations is very large and for protein structure prediction problem. It is proved to complex. For these reasons, the general paradigm of de be a hard combinatorial problem [11]. Due to the com- novo PSP is to begin with the sampling of a large set of plexity, several techniques and their hybridizations have candidate(decoy)structuresguidedbyascoringfunction. been applied to solve the problem. The similarity with Inthefinalstage,therefinementsaredonetoachievethe the thermodynamic nature of the protein folding allured real structure. The simplified models, though lack many the researchers to apply simulated annealing [12,13]. details,providearealisticback-bonefortheproteinsand Genetic algorithms were first applied to solve this pro- canberefinedtogetrealstructures[4]. blem by Unger and Moult [14]. The basic genetic algo- Localsearchalgorithmswhenappliedtolargeproteins rithm was subsequently improved by many researchers (sequence length around 200 monomers) suffer from a [15-17]. huge number of re-visitation and stagnation. To handle Yue and Dill [18] applied constraint based approaches theseissues,anumberoftechniqueshavebeenappliedin for the first time and developed the Constraint Based theliteratureofPSP[5-7]thatincludetabulists,adaptive HydrophobicCoreConstruction(CHCC)algorithm.Their measures, and various restart mechanisms. Similar methodhadseveralpitfalls:CHCCcouldonlysupportthe approacheshavealsobeenusedinotherdomainssuchas HPmodelandfailedtoreportdegeneracyornon-unique propositional satisfiability [8] and quadratic assignment structures for several protein sequences. The research problem [9]. Many of the algorithms apply random group of Rolf Backofen developed a Constrained-based restarts or restart from the best local minimum [6,7]; ProteinStructurePrediction(CPSP)tool[19],whichpro- whichdonotsolvetheproblemingeneral. vided solutions to these problems. However, CPSP tool depends on pre-calculated cores and does not converge Our contribution for larger protein sequences. Palu et al. [20] developed Inthispaper,we presentanewalgorithmforthesimpli- COLAsolverusinghighlyoptimizedconstraintsandpro- fied protein structure prediction problem. During the pagators to obtainsatisfactory resultsonsmallandmed- search, our method selects the best candidate in each ium-sizedinstances(length<80).Leshetal.[5]provided iteration, but memorizesthe secondbest conformations anovelsetoftransformationscalledpullmovesextendible that are generated but not selected or explored (called toanylattice.BothLeshetal.[5]andBlazewiczetal.[21] elite conformations) at each iteration. Whenever the implementedtabusearchmeta-heuristicsin-dependentof search faces stagnation,we selectthe bestconformation eachother. from this elite set and continue search from there. This Hybridtechniques thatcombinethepowerof different retreathelpsthesearch diverge.Similartechniqueshave strategies provided better results. Using the pull moves, been used in the systematic search techniques like A* Klauet al. [22] proposed aninteractive optimization fra- search, but they require a huge amount of memory to mework called Human Guided Simple Search (HuGS). store the unexplored frontier. We maintain only a small Usingthesamepullmoveset,Ullahetal.[23]proposeda set of previously generated conformations by discarding two-stageoptimizationapproach.Furthermore,Ullahetal. conformations with similar fitness. It reduces the mem- [24] combined local search and constraint programming oryrequirementandprovidesamechanismtogobackto approaches.Theyintroducedaproteinfoldingsimulation earlier conformations with lower fitness value but with procedureonFCClatticeandemployedtheCOLAsolver potential to lead towards better search regions. We also [20] to generate neighborhood states for a simulated proposeanewnon-isomorphicencodingthatreducethe annealingbasedlocalsearch.TheyusedMJmatriceswith non-unique or isomorphic conformations from the 20×20aminoacidpairwiseinteractions.Theytestedtheir Shatabdaetal.BMCBioinformatics2013,14(Suppl2):S19 Page3of9 http://www.biomedcentral.com/1471-2105/14/S2/S19 approaches onsome real proteins (length <80) from the Ala, Pro, Val, Leu, Ile, Met, Phe, Tyr, Trp); and hydro- ProteinData Bank(PDB).Jiangetal.[25]combinedtabu philic or polar P (Ser, Thr, Cys, Asn, Gln, Lys, His, Arg, searchstrategy(GTS)withgeneticalgorithmsinthetwo- Asp, Glu). The given amino acid sequence of a protein dimensionalHPModel. is represented as a string s of the alphabet {H, P}. The Cebrian et al. [26] used tabu search to find 3D struc- free energy calculation for the HP model, shown in (1), tures of Harvard instances [27] on FCC lattices for the counts only the energy interactions between two non- first time. In their subsequent work, Dotu et al. [6,7] consecutive amino acid monomers. applied Large Neighborhood Search (LNS) to further (cid:2) optimize the results found in [26]. They also improved E= cij.eij (1) the tabu search by adopting a new neighborhood selec- i,j:i+1<j tion technique [7]. Both of their methods are implemen- wherec =1onlyiftwomonomersiandjareneighbors ted in COMET. Shatabda et al. [10] proposed a memory ij (or in contact) on the lattice and 0 otherwise. The other based approach on top of the algorithm proposed by term, e is calculated depending on the type of amino Dotu et al. [7] and improved the results on the FCC lat- ij acids:e =-1ifs=s=Hand0otherwise.Minimizingthe tice and HP energy model. Other methods (such as ij i j summationin(1)isequivalenttomaximizingthenumber Simulated Annealing [12], Ant Colony Optimization ofnon-consecutiveH-Hcontacts.Severalothervariantsof (ACO) [28], and Extremal Optimization [29]) are also HP-model[31]existintheliterature. found in the literature. Using the HP energy model together with the FCC lattice, the simplified PSP problem is defined as: given a Materials and methods sequence s of length n, find a self avoiding walk p ... 1 Proteinsarepolymersofaminoacidmonomers.Inasim- p on the lattice such that the energy defined by (1) is n plified model, all monomers have an equal size and all minimized. bondsareofanequallength.Eachamino acidmonomer Local search framework is represented by a single point and its position is The local search framework was originally proposed in restricted to a three dimensional lattice. A simplified [7]. The algorithm is similar to that of the procedure energyfunctionisusedincalculatingtheenergyofacon- localSearch () presented in Table 1 except in Lines 6, formation.Thegivenaminoacidsequencefitsintoafixed 9-10and14.Itdependsonastructuredrandomizediniti- lattice, where every two consecutive monomers in the alizationmethodandmaintainsasimpletabulisttopre- sequencearealsoneighboronthelattice(calledthechain vent recently used moves. In the framework, moves constraint) andtwomonomerscan notoccupy the same involving single monomer are only allowed. For any latticepoint(calledtheselfavoidingconstraint). givenconformationcandasequencepositioni,amove(i, p, c) that moves an amino acidi to a new position p is FCC lattice allowed,if (i)pisfreeandisin contactwithbothamino The Face Centered Cubic (FCC) lattice is preferred over acids at positions i - 1 and i + 1, and (ii) i is not in the other lattices since it has the highest packing density tabulist.Thelengthofthetabulisttakesarandomvalue [30] for spheres of equal size, and provides the highest from[4,n/4],wherenisthelengthofthesequence.The degree of freedom for placing an amino acid monomer. movecanbeappliedtoeitherHorPtype ofaminoacid Thus, it provides a realistic discrete mapping for at each iteration. The fitness function minimizes the proteins. The FCC lattice is generated by the follo- summationofHH-distancesforallnon-consecutivepairs −→ −→ wing basis vectors: v =(1, 1, 0), v =(−1, −1, 0), of H-monomers. The fitness function can be formally 1 2 −→ −→ −→ definedasthefollowing: v = (−1, 1, 0), v =(1, −1, 0), v = (0, 1, 1), 3 4 5 −→ −→ −→ v =(0, 1, −1), v =(0, −1, −1), v =(0, −1, 1), (cid:2)n −→6 −→7 −→8 f(c)= (dv(i,j))2× (si =H,sj =H) (2) v =(1, 0, 1), v =(−1, 0, 1), v =(−1, 0, −1), 9 10 11 i,j:i+1<j −v→=(1, 0, −1). Two lattice points p, q∈L are said 12 −→ wheredv(i,j)=d(i,j)2-2andd(i,j)=(x-x)2+(y-y)2 to be in contact or neighbors of each other, if q=p+ vi + (z- z)2, i.e. square of the Euclidean disitanjce betiweejn −→ i j for some vector v in the basis of lattice L. theithandjthaminoacidsinthecurrentconformationc i ofasequencesoflengthn.Theenergylevelofthestruc- HP energy model ture is still determined by the HP energy value. The fit- The Hydrophobic-Polar (HP) energy model was pro- nessfunctionisusedtodrivethesearchonly.Thesearch posed by Lau and Dill [1]. In this model, all the amino algorithm periodically switches the type of the acid and acids are divided into two groups: hydrophobic H (Gly, selectsthebestmoveonaamino-acidwhichisnotinthe Shatabdaetal.BMCBioinformatics2013,14(Suppl2):S19 Page4of9 http://www.biomedcentral.com/1471-2105/14/S2/S19 Table 1Local Setach Framework. ProcedurelocalSearch() ProcedureselectMove() 1 initialize() 1 whilemoveList.notEmpty()do 2 initializeTabu() 2 m¬getNextCandidate() 3 whileiteration≤maxIterationdo 3 c¬getConformation(m) 4 selectMonomerType() 4 e¬getNonIsoEncoding(s) 5 generateMoves() 5 b¬getPacked(s) 6 selectMove() 6 ifmatch(b,proximity)then 7 performMove() 7 discardm 8 updateCosts() 8 else 9 iflocalminimaisdetectedthen 9 updateEliteSet() 10 storeLocalMinima() 10 returnm 11 end 11 end 12 ifnonImprovingSteps≥maxStablethen 12 end 13 initializeTabu() 13 ifmoveList.empty()then 14 selectFromEliteSet() 14 nomovespossible 15 end 15 nonImprovingSteps¬maxStable+1 16 end 16 end tabulist. In case of P moves, it selects a random move searchgoesonbygeneratingtheneighborsoftheselected sinceamoveofPtype aminoaciddoesnotaffectthefit- conformations. The other potential conformations with ness function. The search restarts from the previously goodfitnessvaluesareneverusedasthesearchisgreedy found best solution whenever the fitness function is not innature.Wecallthemeliteconformations.Theseconfor- improvingformaxStablesteps.Thememory-basedsearch mations, if explored ever, may lead to better search in [10] extends this local search framework. It stores a regions. Note that, in the systematic search techniques, proportionofthelocalminimaencounteredandwhenever these conformations are stored and explored. However, a move is selected, it generates the conformation and they require a huge amount of memory. Moreover, the checkssimilaritywiththestoredlocalminima.Ifthegen- selectioninasystematicsearchlikeA*searchdependson erated conformation is within a given proximity of a a heuristic function that requires the goal to be known stored local minimum, the conformation is discarded. beforehand. In our case, the optimal structure is totally Hamming distance isused asthe similarity measure and unknown and we can not afford to store a huge number relativeencodingtorepresenttheconformations. of conformations. In our algorithm, we store the second Our algorithm is developed on top of the memory- bestconformationsandexplorethemwheneverthesearch based search. The pseudo-code for our algorithm is facesstagnation. depicted in Table 1. Our algorithm differs from the memory-based approach in Line 14 of Procedure local- Store Search() where we select a conformation from the elite Westorethesecondbestconformationsineachiteration setatstagnationandinLine9ofProcedureselectMove() inasetcalledeliteset.Ateachiteration, when amoveis wherewestoretheprominentbutnotselectedcandidate selected, we update this elite set of conformations. The conformations into the elite set. It also differs in the pseudo-code for the updateEliteSet() procedure is given encodingoftherepresentationoftheconformations.We in the right side of Table 2. We use a priority queue dothatatLine4ofProcedureselectMove()beforematch- sorted intheorderoffitness valueanditeration number ing it with stored local minima and at Line 10 of Proce- to store the elite conformations. Before inserting a con- durelocalSearch()whilestoringthelocalminimum.Rest formationintothepriorityqueue,wecheckforsimilarity of this section describes the detail of the procedures of in the stored local minima list and store it only if no ouralgorithm. matchisfound. Elite conformations Ineachiterationofalocalsearch,anumberofconforma- Explore tions are generated. However, only a few of them are We select the top element from the priority queue exploredinthenextiterations.Inthecaseofasinglecan- whenever the search stagnates. The search then con- didate search, only a single conformation, which is typi- tinues from the selected elite conformation. The search cally the bestconformationaccording to the heuristic,is algorithm, guided by the fitness function defined in (2), selectedforthenextiteration.Insuccessiveiterations,the quickly forms a compact hydrophobic core at the center Shatabdaetal.BMCBioinformatics2013,14(Suppl2):S19 Page5of9 http://www.biomedcentral.com/1471-2105/14/S2/S19 Table 2Pseudo-code forElite Set Methods. ProcedureupdateEliteSet() ProcedureselectFromEliteSet() 1 sb¬setofsecondbestcandidates 1 whileeliteSet.notEmpty()do 2 whilesb.notEmpty()do 2 c¬eliteSet.getTopElement() 3 m¬sb.getNextCandidate() 3 e¬getNonIsoEncoding(c) 4 c¬getConformation(m) 4 b¬getPacked(e) 5 e¬getNonIsoEncoding(c) 5 ifmatch(b,proximity)==falsethen 6 b¬getPacked(e) 6 elitSet.release() 7 ifmatch(b,proximity)==falsethen 7 returnc 8 eliteSet.push(c) 8 end 9 end 9 eliteSet.popElement() 10 end 10 end of the conformation and the greedy search oscillates this issue. Shatabda et al. [10] used the relative encoding withinthe same regionof the searchspace before it can proposedbyBackofenetal.[32]intheiralgorithm.Their improvethefitnessfunctiontobreakthecoreortoform encoding scheme starts from a fixed direction and con- somealternate core. Thedetailednature ofthesearch is tinuestoupdateabasematrixthroughoutthechain.The discussedin [10]. The oscillatingnature indicatesthat if efficiencyofthealgorithmthusdependsofthedimension we select a conformation from a region in the search ofthelattice.Moreover,adecodingalgorithmisneededto space, then we can ignore the other conformations with getbacktheabsoluteencodingsortheco-ordinatepoints. the same or near fitness value and within the temporal ThecomputationalcomplexityoftheiralgorithmisO(nl3), locality.Everytimeaneliteconformationisselectedform where nisthenumberof absolute directions and listhe the list, we do that by discarding a fixed proportion of dimension of thelattice.The complexity of thedecoding thetopelementsfromthelist.Thisresultsineliminating algorithmisalsoO(nl3).Anon-isomorphicencodingwas the conformations that are similar in fitness value and alsoproposedin[33]forcubiclatticesthatcalculatesthe structure,andarealsotemporallyproximate.Thisretreat anglesbetweentwoconsecutiveabsolutedirectionvectors effectively helps the search diverge. It also reduces the anden-codesthemovesequence.Thisencodingalsocosts memory requirement for the priority queue used. The more as it requires computation of angles between the detailed pseudo-code of the method is given in the left directionvectors. side of Table 2. The method elitSet.release() at Line 6 In this paper, we propose a new non-isomorphic releasesthetopelementsfromtheeliteset. encoding,whichisgenericforanylatticeandrequiresno Non-isomorphic encoding separatedecodingalgorithm;theencodingitselfmapsto Manytechniqueshavebeenemployedintheliteratureto the absolute directions. Instead of relative angles, our represent the protein conformations. These representa- algorithm depends on the relative occurrence of the tionsallowthesearchtokeepthecandidateconformations absolutedirectionswithinthechain.ItrequiresonlyO(n) updated and perform operations like similarity checking time to encode. The pseudo-code of our algorithm is (memory-based algorithms) and crossover (genetic algo- given in Table 3.Thisalgorithmcalculatesthe encoding rithms).Themostobviouswaytorepresenttheconforma- on the fly. It starts with an emptyMap andevery time a tions is to use Cartesian co-ordinates of the amino-acid newabsolutedirectionisencounteredinthesequence,it monomers. However, such a representation contains assigns the next available code to it. Once the mapping translational symmetry, which can be solved if absolute for all possible directions is found then the algorithm is encoding is used. Absolute encoding is found from the just a simple lookup from the mapping array. In the absolutedirectionvectorsbetweentheconsecutivepoints resultssection,weshowtheeffectivenessofourencoding intheamino-acidchain.Thealphabetsizeoftheabsolute schemewhenappliedtothememory-basedsearch[10]. encodingdependsonthelatticeused.FortheFCClattice, thealphabetsizeis12sincethenumberofbasisvectorsis Results and discussion 12. However, absolute encoding is not suitable when we We implemented our algorithm in C++ and ran experi- check similarity between two conformations since it ments on the NICTA (http://www.nicta.com.au) cluster containstheproblemofrotationalsymmetry.Twoidenti- machine. The cluster has a number of machines each cal conformations with rotational symmetry are repre- equipped with two 6-core CPUs (AMD Opteron @2.8 sentedbydifferentabsoluteencoding(seetheexamplein GHz, 3MBL2/6M L3 Cache) and64 GB Memory,run- Figure 1). This type of encoding is called isomorphic ningRocksOS(aLinuxvariantforcluster).Wecompared encoding.Non-isomorphicencodingsprovideasolutionto the performance of our algorithm to that of the tabu Shatabdaetal.BMCBioinformatics2013,14(Suppl2):S19 Page6of9 http://www.biomedcentral.com/1471-2105/14/S2/S19 Figure 1 Isomorphic Encoding. Two identical structuresin cubic latticehaving different absolute encoding; structure in theleft has the encoding“DSES”,andthestructureatrightwithencoding“UNEN”,whereD=DownU=Up,N=North,S=South,E=EastandW=West. searchbyDotuetal.[7]andthememorybasedapproach LS-Tabude-notesthetabusearchbyDotuet al.[7].The proposed in[10].Algorithmswere run 50 times foreach best and average energy levels achieved are reported in of the protein sequences. Each run was given 5 hours to Table 4. We set proximity measure to 3 and only 5% of finish. We could not compare our results with the Large the local minima was stored while maxStable was set to Neighborhood Search (LNS) [7] since the COMET pro- 100 for our algorithm. For other algorithms, we set the gramexitedwith‘toomuchmemoryneeded’errorforthe parameters as recommended by the authors. The best large-sizedbenchmarkproteinsthatwehaveselected.We energy levels reported by Dotu et al. [7] are also shown do not show results for small-sized Harvard instances under the column LNS. These results were produced by (length = 48) or other smaller protein sequences since largeneighborhoodsearch.Optimallowerboundsforthe bothalgorithmsreachnearoptimalconformationsandthe minimumenergyvaluesfortheproteinsarealsoreported differenceoftheenergylevelsachievedfortheseproteins under the column ‘E’ generated by the CPSP tools [19]. l arerelativelysmall. Note that these values are obtained by using exhaustive searchmethodsandareusedonlytoevaluatehowfarour Results resultsarefromthem.Themissingvaluesindicatewhere We show results for two sets of benchmarks in Table 4. no such bound was found and the values marked with * ThefirstsixproteinsarealsousedbyDotuetal.[7].The are the values for which the algorithm did not converge R instances (length = 200) are originally taken from [34] evenafter24hoursofrun. and the f180 instances (length = 200) are provided by We also used a second set of benchmark proteins Sebastian Will [7]. LS-New denotes our algorithm and derived from the famous Critical Assessment of Techni- LS-Memdenotesthememory-basedapproachin[10]and ques for Protein Structure Prediction (CASP) competi- tion (http://predictioncenter.org/casp9/targetlist.cgi). These proteins are of length 230 ± 50. Six protein Table 3Pseudo-code forNon-Isomorphic Encoding. sequences were randomly chosen from the target list. ProceduregetNonIsoEncoding(s) These sequences are then converted into HP sequences. 1 initMap() Results for these six proteins are also given in Table 4 2 fori¬1toNdo (lower part). The PDB ids for each of these proteins are 3 absdir=c.getAbsDir(i) also given. The parameter settings for these six proteins 4 ifabsdirisanewdirectionthen were also kept the same. LNS column contains no data 5 Map[absdir]¬dirCount for these six proteins since they were not used in [7]. 6 dirCount++ 7 end Analysis 8 encoded[i]=Map[absdir]; From the average energy levels shown in bold-face in 9 end Table 4, it is clearly evident that, for all the twelve pro- 10 returnencoded teins, our algorithm significantly outperforms both of Shatabdaetal.BMCBioinformatics2013,14(Suppl2):S19 Page7of9 http://www.biomedcentral.com/1471-2105/14/S2/S19 Table 4Experimental Results. Protein LS-New LS-Mem LS-Tabu Seq. Length E best avg best avg R.I. best avg R.I. LNS l R1 200 -384 -359 -339 -353 -326 22.41% -332 -318 31.81% -330 R2 200 -383 -361 -343 -351 -330 24.52% -337 -324 32.20% -333 R3 200 -385 -354 -340 -352 -330 18.18% -339 -323 27.41% -334 f180_1 180 -378* -361 -341 -360 -334 15.90% -338 -327 27.45% -293 f180_2 180 -381* -368 -350 -362 -340 24.39% -345 -334 34.02% -312 f180_3 180 -378 -365 -355 -357 -343 34.28% -352 -339 41.02% -313 3no6 229 -455 -419 -397 -400 -375 27.50% -390 -373 29.26% - 3mr7 189 -355 -320 -304 -311 -292 19.04% -301 -287 25% - 3mse 179 -323 -288 -271 -278 -254 22.63% -266 -249 29.72% - 3mqz 215 -474 -430 -404 -415 -386 20.45% -401 -383 23.07% - 3on7 279 ? -514 -476 -499 -463 - -491 -461 - - 3no3 258 -494 -406 -376 -397 -361 11.27% -388 -359 12.59% - ThebestandaverageenergylevelsachievedandrelativeimprovementsofouralgorithmoverotheralgorithmsfortheR,f180andinstancestakenfromCASP. the algorithms. We performed statistical t-test for inde- continuesto improve inthe stagnant situations and thus pendent samples with 95% level of significance to verify producesbetterresults. the significant difference in performances. We report the new lowest energy levels (w.r.t. incomplete search Effect of the non-isomorphic encoding methods) for all twelve proteins. These energy levels are The effects of the new non-isomorphic encoding of the shown in italic-faced font in Table 4. protein conformations have been two-fold. Firstly, it resultedinthereductionofdegeneracy,whichisevident Relative improvement in the number of discarded conformations during the InTable4,we reporttherelativeachievementincolumn search. Secondlytheefficient computationimprovedthe ‘R.I.’.Relativeimprovementofourapproachismeasuredin runtime. In the memory-based approach proposed in termsofthedifference withoptimalboundoftheenergy [10], the authorsused the relative encoding proposed in level.Thisvalueissignificantbecauseitgetshardertofind [32]. When applied with the memory-based algorithm better conformations as the energy level of a protein proposedin[10],ournewencodingresultedinmoredis- sequenceapproachestheoptimal.Wedefine: cards and less computation time, as shown in Table 5. The discarded conformations are the approximate mea- RelativeImprovement= Eo−Er ×100% (3) sure of similar conformations encountered during the E −E search. The experimental results for six proteins are l r showninTable5forfirstonemillioniterations. where E is the average energy level achieved by our o approach, Er is the average energy level achieved by the Conclusions otherapproach,and Elistheoptimallower boundofthe In this paper, we presented a local search algorithm for energy level. The missing values indicate the absence of solvingtheproteinstructurepredictionproblemonFCC anylowerboundforthecorrespondingproteinsequence. latticeusinglowresolutionHPenergymodel.Experimen- Similar measurements were also used in [10]. From the talresultsshowsthatouralgorithmoutperformsthestate- values reported in Table 4, we clearly see that our algo- of-the art algorithms. We useda novel encodingscheme rithmproducesconformationsthataresignificantlybetter to represent the conformations along with a set of elite intermsoftheaverageenergylevelachieved. conformationstohandlethestagnationofthelocalsearch. We believe that use of domain specific heuristics while Search progress selectingtheconformationsfromtheelitesetcanfurther In Figure 2, we show search progress ofthree algorithms improve the performance of the algorithm. In future, we fortheproteinsequenceR1.Averageenergylevelbyeach wish to explore that and apply our techniques to higher of the algorithms for 50 runs are shown. All three algo- resolutionsandotherenergymodelstoseetheeffect.We rithmsachievealmostthesamelevelofenergyinitiallybut wish to apply our techniques to other domains such as assoonasthesearchmakesprogress,thetabusearchand propositionalsatisfiability,vehiclerouting.Webelievethe the memory-based search fail to overcome stagnation. It proposed encoding scheme will add efficiency to search is clearly evident from the graph that our algorithm techniquessuchasgeneticalgorithms. Shatabdaetal.BMCBioinformatics2013,14(Suppl2):S19 Page8of9 http://www.biomedcentral.com/1471-2105/14/S2/S19 Figure2SearchProgress.SearchprogressofthreealgorithmsforProteinR1over300minutes. Declarations Table 5Effect ofNon-Isomorphic Encoding. Thepublicationcostsforthisarticlewerefundedbythecorresponding author’sinstitution. Protein OurEncoding RelativeEncoding ThisarticlehasbeenpublishedaspartofBMCBioinformaticsVolume14 Seq. runtime #discards runtime #ofdiscards Supplement2,2013:SelectedarticlesfromtheEleventhAsiaPacific BioinformaticsConference(APBC2013):Bioinformatics.Thefullcontentsof R1 28.33 354712 39.6 91664 thesupplementareavailableonlineathttp://www.biomedcentral.com/ R2 30.4 406572 42.6 91219 bmcbioinformatics/supplements/14/S2. R3 25.74 475357 42.6 92765 Authordetails f180_1 22.90 402738 35.4 103059 1InstituteofIntelligentandIntegratedSystems,GriffithUniversity, f180_2 27.34 317317 39.0 93814 Queensland,Australia.2QueenslandResearchLaboratory,NationalICTof f180_3 24.54 358326 37.8 89810 Australia. Comparisoninruntime(inminutes)andthenumbersofdiscardswhileusing Published:21January2013 ournon-isomorphicencodingandtherelativeencoding[32]forfirst1million iterationsofthememory-basedalgorithmbyShatabdaetal.[10]. References 1. LauKF,DillKA:Alatticestatisticalmechanicsmodelofthe conformationalandsequencespacesofproteins.Macromolecules1989, Authors’contributions 22(10):3986-3997. SSconceivedtheoriginalideaofeliteconformationsandnon-isomorphic 2. KlauGW,LeshN,MarksJ,MitzenmacherM:Human-guidedtabusearch. encoding.Allauthorscontributedsignificantlyintheimplementation, Proceedingsofthe18thNationalConferenceonArtificialIntelligence2002,41-47. experimentationandwritingofthemanuscriptandapprovedthefinal 3. AnfinsenCB:Principlesthatgovernthefoldingofproteinchains.Science version. 1973,181(4096):223-230. 4. RotkiewiczP,SkolnickJ:Fastprocedureforreconstructionoffull-atom Competinginterests proteinmodelsfromreducedrepresentations.JournalofComputational Theauthorsdeclarethattheyhavenocompetinginterests. Chemistry2008,29(9):1460-1465. 5. LeshN,MitzenmacherM,WhitesidesS:Acompleteandeffectivemove Acknowledgements setforsimplifiedproteinfolding.ProceedingsoftheSeventhAnnual WegratefullyacknowledgethesupportoftheGriffithUniversityeResearch InternationalConferenceonResearchinComputationalMolecularBiology ServicesTeamandtheuseoftheHighPerformanceComputingCluster 2003,188-195,RECOMB‘03. “Gowonda”tocompletethisresearch.WealsothankNICTA,whichisfunded 6. DotuI,CebriánM,VanHentenryckP,CloteP:Proteinstructureprediction bytheAustralianGovernmentasrepresentedbytheDepartmentof withlargeneighborhoodconstraintprogrammingsearch.Principlesand Broadband,CommunicationsandtheDigitalEconomyandtheAustralian PracticeofConstraintProgrammingSpringer;2008,82-96. ResearchCouncilthroughtheICTCentreofExcellenceprogram. Shatabdaetal.BMCBioinformatics2013,14(Suppl2):S19 Page9of9 http://www.biomedcentral.com/1471-2105/14/S2/S19 7. DotuI,CebrianM,VanHentenryckP,CloteP:Onlatticeproteinstructure 33. HoqueT,ChettyM,DooleyLS:Non-isomorphiccodinginlatticemodel predictionrevisited.IEEE/ACMTransactionsonComputationalBiologyand anditsimpactforproteinfoldingpredictionusinggeneticalgorithm. Bioinformatics2011,8(6):1620-1632. ProceedingsofIEEESymposiumonComputationalIntelligencein 8. MazureB,SaisL,GrégoireÉ:TabusearchforSAT.Proceedingsofthe BioinformaticsandComputationalBiology(CIBCB)2006,1-8,IEEE. NationalConferenceonArtificialIntelligence1997,281-285. 34. BackofenR,WillS:Aconstraint-basedapproachtostructureprediction 9. BattitiR,TecchiolliG,etal:Thereactivetabusearch.ORSAJournalon forsimplifiedproteinmodelsthatoutperformsotherexistingmethods. Computing1994,6:126-126. LogicProgramming2003,49-71. 10. ShatabdaS,NewtonM,PhamDN,SattarA:Memory-basedlocalsearchfor simplifiedproteinstructureprediction.Proceedingsofthe3rdACM doi:10.1186/1471-2105-14-S2-S19 ConferenceonBioinformatics,ComputationalBiologyandBiomedicine2012, Citethisarticleas:Shatabdaetal.:Theroadnottaken:retreatand 345-352,BCB‘12,ACM. divergeinlocalsearchforsimplifiedproteinstructureprediction.BMC 11. BergerB,LeightonT:Proteinfoldinginthehydrophobic-hydrophilic(HP) Bioinformatics201314(Suppl2):S19. isNP-complete.ProceedingsoftheSecondAnnualInternationalConference onComputationalMolecularBiology1998,30-39,RECOMB‘98. 12. KawaiH,KikuchiT,OkamotoY:Apredictionoftertiarystructuresof peptidebytheMonteCarlosimulatedannealingmethod.Protein Engineering1989,3(2):85-94. 13. KapsokalivasL,GanX,AlbrechtAA,SteinhöfelK:Population-basedlocal searchforproteinfoldingsimulationintheMJenergymodelandcubic lattices.ComputationalBiologyandChemistry2009,33(4):283-294. 14. UngerR,MoultJ:Ageneticalgorithmforthreedimensionalprotein foldingsimulations.Proceedingsofthe5thInternationalConferenceon GeneticAlgorithms1993,581-588. 15. KonigR,DandekarT:Improvinggeneticalgorithmsforproteinfolding simulationsbysystematiccrossover.Biosystems1999,50:17-25. 16. KrasnogorN,HartW,PeltaD:Proteinstructurepredictionwith evolutionaryalgorithms.ProceedingsoftheGeneticandEvolutionary Computationconference1999,1596-1601. 17. HoqueT,ChettyM,SattarA:Proteinfoldingpredictionin3DFCCHP latticemodelusinggeneticalgorithm.IEEECongressonEvolutionary Computation2007,4138-4145. 18. YueK,DillK:Forcesoftertiarystructuralorganizationinglobular proteins.ProcNatlAcadSciUSA1995,92:146-150. 19. MannM,BackofenR:CPSP-tools-Exactandcompletealgorithmsforhigh- throughput3Dlatticeproteinstudies.BMCBioinformatics2008,9:230. 20. AlessandroDP,DovierA,PontelliE:Aconstraintsolverfordiscrete lattices,itsparallelization,andapplicationtoproteinstructure prediction.Software-PracticeandExperience2007,37:1405-1449. 21. BlazewiczJ,DillK,LukasiakP,MilostanM:Atabusearchstrategyfor findinglowenergystructuresofproteinsinHP-model.Computational MethodsinScienceandTechnology2004,10:7-19. 22. KlauGW,LeshN,MarksJ,MitzenmacherM:Human-guidedtabusearch. Proceedingsofthe18thNationalConferenceonArtificialIntelligence2002,41-47. 23. UllahAD,KapsokalivasL,MannM,SteinhöfelK:Proteinfoldingsimulation bytwo-stageoptimization.InComputationalIntelligenceandIntelligent SystemsCaiZ,LiZ,KangZ,LiuY2009,138. 24. UllahAZMD,SteinhöfelK:Ahybridapproachtoproteinfoldingproblem integratingconstraintprogrammingwithlocalsearch.BMCBioinformatics 2010,11(S-1):39. 25. JiangT,CuiQ,ShiG,MaS:Proteinfoldingsimulationsofthe hydrophobic-hydrophilicmodelbycombiningtabusearchwithgenetic algorithms.JournalofChemicalPhysics2003,119(8):4592-4596. 26. CebriánM,DotúI,VanHentenryckP,CloteP:Proteinstructureprediction onthefacecenteredcubiclatticebylocalsearch.InProceedingsofthe 23rdNationalConferenceonArtificialIntelligence.Volume1.AAAI’08,AAAI Press;2008:241-246. 27. YueK,FiebigK,ThomasP,ChanH,ShakhnovichE,DillK:Atestoflattice proteinfoldingalgorithms.ProcNatlAcadSciUSA1995,92:325. 28. ShmygelskaA,HoosH:Anantcolonyoptimisationalgorithmforthe2D Submit your next manuscript to BioMed Central and3Dhydrophobicpolarproteinfoldingproblem.BMCbioinformatics and take full advantage of: 2005,6:30. 29. LuH,YangG:Extremaloptimizationforproteinfoldingsimulationson • Convenient online submission thelattice.Computers&MathematicswithApplications2009,57:1855-1861. 30. CipraB:Packingchallengemasteredatlast.Science1998,281(5381):1267. • Thorough peer review 31. Bornberg-BauerE:ChaingrowthalgorithmsforHP-typelatticeproteins. • No space constraints or color figure charges ProceedingsoftheFirstAnnualInternationalConferenceonComputational MolecularBiologyRECOMB‘97,NewYork,NY,USA:ACM;1997,47-55. • Immediate publication on acceptance 32. BackofenR,WillS,CloteP:Algorithmicapproachtoquantifyingthe • Inclusion in PubMed, CAS, Scopus and Google Scholar hydrophobicforcecontributioninproteinfolding.Proceedingsofthe • Research which is freely available for redistribution PacificSymposiumonBiocomputing2000,92-103. Submit your manuscript at www.biomedcentral.com/submit

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.