Perspective: Alchemical free energy calculations for drug discovery David L. Mobley and Pavel V. Klimovich Citation: J. Chem. Phys. 137, 230901 (2012); doi: 10.1063/1.4769292 View online: http://dx.doi.org/10.1063/1.4769292 View Table of Contents: http://jcp.aip.org/resource/1/JCPSA6/v137/i23 Published by the American Institute of Physics. Additional information on J. Chem. Phys. Journal Homepage: http://jcp.aip.org/ Journal Information: http://jcp.aip.org/about/about_the_journal Top downloads: http://jcp.aip.org/features/most_downloaded Information for Authors: http://jcp.aip.org/authors Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions THEJOURNALOFCHEMICALPHYSICS137,230901(2012) Perspective: Alchemical free energy calculations for drug discovery DavidL.Mobley1,2,a) andPavelV.Klimovich2 1DepartmentofChemistry,UniversityofNewOrleans,2000LakeshoreDrive,NewOrleans, Louisiana70148,USA 2DepartmentofPharmaceuticalSciences,UniversityofCalifornia,Irvine,California92617,USA (Received31July2012;accepted15November2012;publishedonline21December2012) Computationaltechniquesseewidespreaduseinpharmaceuticaldrugdiscovery,buttypicallyprove unreliableinpredictingtrendsinprotein-ligandbinding.Alchemicalfreeenergycalculationsseekto changethatbyprovidingrigorousbindingfreeenergiesfrommolecularsimulations.Givenadequate samplingandanaccurateenoughforcefield,thesetechniquesyieldaccuratefreeenergyestimates. Recentinnovationsinalchemicaltechniqueshavesparkedaresurgenceofinterestinthesecalcula- tions.Still,manyobstaclesstandinthewayoftheirroutineapplicationinadrugdiscoverycontext, including theone wefocus on here,sampling. Sampling of binding modes poses aparticular chal- lenge as binding modes are often separated by large energy barriers, leading to slow transitions. Bindingmodesaredifficulttopredict,andinsomecasesmultiplebindingmodesmaycontributeto binding.Inviewofthesehurdles,wepresentaframeworkfordealingcarefullywithuncertaintyin bindingmodeorconformationinthecontextoffreeenergycalculations.Withcarefulsampling,free energytechniquesshowconsiderablepromiseforaidingdrugdiscovery.©2012AmericanInstitute ofPhysics.[http://dx.doi.org/10.1063/1.4769292] I. INTRODUCTION ofnewmoleculesandcanbeslowandexpensive.3,15 Anac- curate computational method could reduce the need for syn- A. Structure-baseddrugdesignseeks thesisandexperimentandacceleratetheprocess. topredictbinding Existingcomputationalmethodsusedinthepharmaceu- Structure-baseddrugdesignseekstotakeanexperimen- ticalindustryaremostlyfocusedontheearlieststagesofthis tal structure of a drug target and identify or design a small process—library screening. These methods, including dock- molecule which binds to this macromolecular target in a de- ing,chemoinformatics,andligand-basedmethods,arehighly siredway,modulatingitsfunctionandtherebytreatingatar- approximate and often empirical. While they can be helpful getdiseaseorcondition.1–3Thegoalisaprocesswhichbegins for screening large libraries, their ability to predict binding withastructureandyieldsagooddrug.2,4 Unfortunately,ev- strengthistypicallyextremelypoor.7,10,16,17 eryaspectofthisprocesshasprovenchallenging. Currently,computationaltechniquesareappliedthrough- out thediscovery process.3,5–8 Of particular interesthereare B. Freeenergycalculationscouldguide structure-baseddesign theearlytomiddlestagesoftheprocess,whereoneseeksto first identify an initial “hit”—a small molecule which binds Freeenergycalculationsbasedonmolecularsimulations to the target with sufficient affinity to be interesting—and showpromiseatprovidinghigheraccuracyfortheleadopti- then improve this molecule’s properties, affinity, and speci- mizationstageofdiscovery.Thesecalculationsyieldbinding ficitytothepointwhereitisgoodcandidateforfurtherdevel- free energy (or affinity) results which are correct given the opment as a drug.6,7,9 That is, we seek computational tech- forcefield(whichprovidesthepotentialenergyasafunction niqueswhichcanbeappliedtohitidentification(oftencalled ofsystemconfiguration),atleastinthelimitofadequatesam- “virtual screening” as this is usually applied to screen li- plingandsimulationtime.4,18–22 Ourfocusismainlyon“al- brariesofcompounds)andtheleadoptimizationstageofdrug chemical” free energy calculations, which see the most use, discovery. and particularly on innovations that make these calculations At the earliest stages of this process, large libraries of appealing for drug discovery, as well as challenges that still existing compounds are often considered,3,10–13 so compu- standinthewayoftheirwidespreaduse. tational techniques need to above all be fast, even if unreli- Binding free energy calculations yield either absolute ableformanyindividualmolecules.14Butonceinitialhitsare freeenergies (measuringthefreeenergyofbindingofasin- identified,thegoalchangesfromidentifyingwhichmolecules gle ligand to a single receptor), or, more commonly, relative bind, to making these initial hits bind better, otherwise im- free energies (comparing the binding of related ligands to a provingtheirproperties,orexpandingchemicaldiversity.9At receptororasingleligandtorelatedreceptors).Relativecal- thisstage(leadoptimization),accuracyisamuchlargercon- culations are generally thought to be more efficient.4,19,20,23 sideration than speed, as experiments now involve synthesis This is partly because practitioners hope for a cancellation of errors where, for example, inadequate sampling of recep- a)Electronicmail:[email protected]. tor motions for one ligand would have similar effects on 0021-9606/2012/137(23)/230901/12/$30.00 137,230901-1 ©2012AmericanInstituteofPhysics Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions 230901-2 D.L.MobleyandP.V.Klimovich J.Chem.Phys.137,230901(2012) bindingofanotherligand,thusyieldingminimalerrors.While thiscancellation isnotguaranteed, itseems reasonable tous that for small ligand modifications, relative free energy cal- culationswillindeedbemoreefficient.Also,accuratepredic- tionofrelativebindingfreeenergieswouldbewellsuitedto applications in lead optimization, as noted above. For these reasons,wefocushereonrelativecalculations. Alchemicalfreeenergycalculationsworkbyintroducing a series of intermediate unphysical states spanning between thedesiredendstates.Forexample,forrelativecalculations, binding free energies are compared by changing one ligand into another or turning off interactions of one ligand ina re- ceptor while turning on interactions of another ligand in a receptor (Fig. 3). The intermediate states here are, roughly speaking,associatedwithfractionalpresenceofeachligand, asdiscussedinmoredetailelsewhere.4,18–21 Alchemicalfreeenergycalculationsarebuiltonground- worklaidbyKirkwood24andZwanzig,25amongothers.Kirk- FIG.1. Theprobabilityofsynthesizingacompoundwithaparticularbind- wood described a coupling parameter approach, where a pa- ingfreeenergychange.Filledregionsindicatethosecompoundswithatleast rameter(λ)controlsinteractionstrengthbetweenamolecule afactorof10gaininbindingaffinity,andarelabeledwiththereductionin thenumberofcompoundswhichwouldneedtobesynthesized(onaverage) or particle and the remainder of the system, and described togainthisfactorof10inaffinity.Blueistheapproximatedistributionob- howthiscouldbeusedtocomputefreeenergydifferencesus- servedexperimentallyforcompoundsproposedinternallyatAbbott;orange, ing thermodynamic integration (TI). Zwanzig showed that a green,andredaredistributionsgeneratedbyfilteringmoleculeswithahypo- free energy difference between two states can be computed theticalcomputationalmethodwhichgivescorrectfreeenergyestimateswith 2.0,1.0,and0.5kcal/molofnoise,respectively.FigureadaptedfromRef.4. via anappropriate exponential average ofenergy differences over an ensemble of configurations.25 The coupling parame- terapproachunderliesallalchemicalfreeenergysimulations, argued that these methods are accurate enough to be used andTIandtheZwanzigrelationcanbethoughtofasanalysis routinelyinadrugdiscoverycontext.4,71–74Severalrecentre- techniquesforthesecalculations. viewscoverfoundationsandhighlights.4,18–20,75,76 Since these techniques predate molecular simulations, Robust free energy calculations could have a profound theywerequicklyappliedinnewwaysinsimulations.Many impact on the drug discovery process with a modest level early applications focused on solvation free energies.26–38 of accuracy. To see this, consider a hypothetical discovery Soonafterthebasicideaforrelativebindingfreeenergycal- pipeline (following Ref. 4). A computational chemist par- culationswaslaidout,39 thefirstapplicationstobindingfol- ticipates in an existing drug discovery project which already lowed,inhost-guestsystems30,40,41 andproteins.35,42–47 Sev- has several hits which otherwise look promising but lack eralearlyreviewsprovideusefulperspective.48,49 sufficient affinity. The computational chemist’s job, each Despite early enthusiasm for alchemical calcula- week, is to take a list of proposed compounds which could tions, a number of key innovations were necessary to be made next and select the most promising for synthesis. make them more robust. The development of “soft-core For example, the medicinal chemistry team might propose potentials”50 or “separation-shifted scaling”51 led to much 100compoundsandthecomputationalchemistmightneedto better convergence52 for transformations involving changes select10forsynthesis.Assumeourgoalistogainafactorof in the number of atoms or chemical structure, opening up 10inbindingaffinity(ormakethebindingfreeenergybetter new applications and improving performance. Rediscovered by ∼1.4 kcal/mol). What accuracy do we need to achieve to appreciation for the Bennett acceptance ratio approach53 for dramatically reduce the time or number of compounds that computing free energies54,55 yielded a much more efficient mustbesynthesizedtoreachthisgoal?Itturnsoutthatevena analysis tool than the Zwanzig relation, and the subsequent verymodestlevelofaccuracycanprovidesignificantbenefits generalization into the multistage Bennett acceptance ratio forleadoptimization(Fig.1).Toquantitativelyanalyzethis, tookthisstillfurther.56 Today,theseimprovementsmakesol- we assume our computational method yields correct affinity vationfreeenergycalculationsforsmalloratleastfragment- predictionswithagivenlevelofGaussianrandomnoise,and like molecules essentially routine,57,58 even for hundreds of ask what level of noise we can tolerate. It turns out that the molecules59–63andbindingfreeenergycalculationsaremuch distribution of affinity changes seen in actual compounds moretractable. proposedbymedicinalchemistsisverynearlyGaussianand Relativefreeenergycalculationshaveseensomesignif- centeredatanaffinitychangeofzero.77 Thus,arathersimple icant applications. The Jorgensen lab at Yale routinely ap- statistical analysis is possible. Screening a fixed number of plies MC-based alchemical free energy calculations to help compounds will result in finding an increasing number of guide optimization of lead candidates in an early-stage drug highly potent compounds as the method’s noise goes down discoverysetting.15,64 Severalothergroupsinacademiahave (Fig. 1). We can use this to reduce the number of molecules applied them in similar efforts.65–67 Industry has also found which must be synthesized. Assume we will actually screen some use for these techniques,68–70 and several studies have up to 10 molecules each week, and ask how many total Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions 230901-3 D.L.MobleyandP.V.Klimovich J.Chem.Phys.137,230901(2012) molecules must be screened to gain a factor of 10 in affin- Assessingthemagnitudeoftheerrorintroducedbysam- ity after filtering by our computational method. With 0.5 plingproblemsinrelativefreeenergycalculationscanbedif- kcal/molofnoise,thenumberscreenedisreducedbyafactor ficult, but some compelling evidence shows these problems of 8; with 1.0 kcal/mol of noise, a factor of 5, and with 2.0 introduce significant errors. Since the sum of free energies kcal/mol of noise, a factor of 3 (see Ref. 4). Thus, a method aroundanythermodynamiccycleisbydefinitionzero,cycle which could screen ∼10–100 molecules per week with closure errors provide a lower bound on the amount of sam- even 2 kcal/mol of noise would impact lead optimization by pling error present in free energy calculations. In cycle clo- reducingthesynthesisneededinaleadseriesbyafactorof3. sureanalysis,therelativebindingfreeenergyoftwoligands In reality, binding affinity is just one consideration in is computed by at least two different paths, forming a cycle. lead optimization—improvements in affinity must be bal- Cycle closure errors are reported relatively infrequently, but ancedagainstotherfactorssuchassolubility,bio-availability, in cases where they have been reported, cycles may fail to stability,andsoon.6–8 Still,accuratetoolsforaffinitypredic- close (indicating an error) by as much as several kcal/mol, tionwillprovidetremendoushelp.Forexample,ligandmodi- farinexcessofstatisticalerrorestimates,83–87 suggestingse- ficationsmadeforsolubilityreasonsmustnotruinthebinding rioussamplingproblems.Thismaycontributetorelativefree affinity, so affinity prediction will be useful even while opti- energycalculations’reputationforunreliability,whereperfor- mizingotherfactors.7,20 mancecanbegoodinsomesystemsandterribleinothers. Insummary,freeenergycalculationscoulddramatically Some significant error in free energy calculations, then, aidstructure-baseddrugdesign,oneofouroverarchinggoals. originateswithsamplingproblems.Consideringaflexiblelig- Butaconcreteandhopefullyrealisticnear-termgoalistorou- and binding to a flexible receptor, our fundamental problem tinelyexceeda2kcal/mol(rootmeansquarederror)levelof is that multiple states contribute to binding. These include accuracy in computing relative binding free energies of re- multiple conformations of the receptor, multiple conforma- lated molecules. If possible with reasonable computational tions of theligand, and multipleorientations of theligand,88 efficiency, this would yield real improvements in early-stage as we discuss further in Sec. I E. Adequate sampling means drugdiscovery. that our simulations have visited all of the relevant states in the correct proportions, and thus we can obtain correct free energies. Some or all of these states will be within the same localminimaontheenergylandscape,sointerconversionbe- C. Freeenergycalculationsfaceseriouschallenges tween them willbe relatively quick. But other states may be Freeenergycalculations,whileinprinciplerigorous,face separated by substantial barriers. We will call these distinct threemainchallenges.Oneissimplyalogisticalchallenge— metastablestatesseparatedbybarriers“bindingmodes.”Dif- thesecalculationsremaindifficulttosetup,conduct,andan- ferentbindingmodesmaydifferinligandorientation,ligand alyze, and choices which may be relatively unimportant for conformation, receptor (usually protein) structure, or even other types of molecular simulations (such as the choice of presence or absence of water molecules in the binding site thermostat,barostat,orevencutoff78)canadverselyaffectac- region,forexample.88Giventhis(broad)definitionofabind- curacy. But beyond logistics, computed free energies can be ingmode,then,typicalsamplingerrorsinfreeenergycalcu- inerrorbecauseoflimitationsintheforcefieldorduetoinad- lations originate from a failure to adequate sample relevant equatesampling.Inthelimitofinfinitesampling,mostanal- ligand-receptorbindingmodes. ysis approaches are guaranteed to yield correct free energies for the force field. But practical simulations may often fall D. Manysamplingproblemsinvolvebindingmode short of this limit, and research on how much simulation is samplingchallenges neededtoreachthislimitisongoing. Here,ourinterestisfreeenergytechniquesthemselves— This problem of binding mode sampling is particularly that is, obtaining correct results for the particular setup important for relative free energy calculations. Specifically, and choice of force field. So we focus primarily on lim- relative free energy calculations (as we discuss below) typi- itations due to sampling, as new force field developments callyassumethattworelatedinhibitorsshareacommonbind- can generally easily be incorporated in free energy simula- ingmodeorthatanyenergybarriersbetweenbindingmodes tions. Evidence from solvation free energy studies suggests aresmall,sotheirdifferentpotentialbindingmodesintercon- that current force fields actually may be good enough to vertquickly.Whilethisissometimesthecase,therearemany reach the level of accuracy needed for relevance to phar- cases where the binding mode is not shared or not known a maceutical drug discovery, further justifying our focus on prioriandinterconversionisslow,leadingtothepotentialfor sampling.59–63,79–81 It is certainly the case that force field seriouserrors,asweshowintheexamplesthatfollow. limitations do exist,59,62,63,82 but these are not our focus Tobemorespecific,wheneverthedominantligandbind- here.Therearemanydifferentplaceswheresamplingcango ingmodeisnotdefinitivelyknownforthetwoligandsbeing wrong.Wemighthavedifficultyfindingtherelevantconfigu- compared, and shared by these two ligands, the thermody- rationsofasystem,orobtainingcorrectpopulationsforthese namiccyclemaynotclose,exceptinthelimitofimpractically configurations. This could occur for the receptor, such as in longsimulations.Inthiscase,noaccurateaffinityprediction the case of receptor conformational changes, or for the lig- ispossible.Computedrelativefreeenergiesmaybeincorrect and(forligandconformationalchangesorchangesinbinding by an unknown amount which, as we highlight below, is re- mode). lated to the free energy of changing binding modes. Recall Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions 230901-4 D.L.MobleyandP.V.Klimovich J.Chem.Phys.137,230901(2012) FIG.2. Thymidylatesynthaseinhibitors.(a)Initialbindingmode.(b)Modifiedbindingmodeonadditionofanitrogroupanddeletionofchlorines.(PDB codes1TSLand1TSM91). that by “binding mode” here we include ligand orientation pounds.Asstructuraldissimilaritygrows,wecanexpectthat andalsoligandandreceptorconformation. alternate binding modes (in the form of alternate ligand ori- entationsorconformations,oralternateproteinstructures)be- comeincreasinglylikely.Furthermore,inscreeningasmallli- E. Bindingmodesaredifficulttopredict braryofcompounds,thereisnoguaranteeweknowthebound Bothexperimentalandcomputationalevidencesuggests structureofeitherligand,letalonethattheyshareacommon this binding mode problem poses a real challenge. In lead bindingmode. optimization, one typically starts from a known inhibitor of Hence,theissuesdiscussedherearelikelytoaffectrela- a protein (often with a known binding mode) and modifies tivefreeenergycalculationsinavarietyofdifferentapplica- it to attempt to improve binding.19,70,89 In cases where the tions,evenfordrug-likeligands. ligand binding mode is maintained as functional groups are Computation provides some evidence concerning how modified, conventional relative free energy calculations may much binding mode sampling problems can affect binding oftenworkwellwithoutanyspecialtreatmentofalternatepo- free energy estimates. Previously, we found that an alternate tential binding modes. However, small modifications to lig- stable (and potentially reasonable) binding mode of phenol ands do on occasion yield big differences in ligand bind- in a model binding site differed by 4.0 kcal/mol in free en- ing orientation,65,88,90–97 and there is currently no way to ergy from the true binding mode, yet was predicted to be knowwhenthiswillhappen.Forexample,Stoutetal.devel- the dominant binding mode by some docking techniques.106 oped a series of thymidylate synthase (TS) inhibitors based Thus, mutating phenol into benzene in this binding mode, on phenolphthalein and phthalein derivatives. They found for example, would yield computed binding free energies that changing a five membered ring to a six membered ring erroneously favoring binding of benzene over phenol by dramatically altered the binding mode, then adding a nitro 4 kcal/mol. Thus, appreciable errors are possible. Problems group (removing two chlorines) to the resulting compound ofsimilarmagnitudewerealsoobservedinsomerelativefree introduced yet another unexpected binding mode (Fig. 2). energy calculations—the (cid:3)(cid:3)G between different small lig- Thesechangesledtomarkeddifferencesinaffinityandspeci- andsinthepolarlysozymecavityvariedbyupto4kcal/mol ficity for the enzyme, and the authors highlighted how some depending on how the mutation was set up, for reasons that other structures show multiple binding orientations for in- wereattributedtosamplingofbindingmodes.103 dividual inhibitors. They speculate that many TS inhibitors These trends seem to hold true in pharmaceutically rel- actually have multiple binding orientations, with the affinity evant binding sites. For thrombin inhibitors, addition of a involving a combination of these.91 A variety of data sug- methylgroupyieldsaringflipwhichaltersthebindingmode; gests this same conclusion may hold true in a variety of withoutspecialcare,freeenergycalculationsmissthischange othersystems.20,68,88,97–103 Goingbeyondligandorientation, in binding mode and yield errors around 1 kcal/mol in com- closely related ligands can bind with significantly different puted relative free energies.105 Work on neutrophil elastase protein conformations. Protein conformational changes can usingend-point freeenergy calculations considered multiple be important, and even protein side chain differences in the bindingmodesandfoundonecasewheretherelativefreeen- bindingsitecanbeslowand,calculationsindicate,thermody- ergy was clearly wrong due to two different kinetically dis- namicallysignificant.101,103–105Ligandmodificationscanalso tinct conformations of a ligand (by up to 1.6 kcal/mol). Ob- resultinchangesofbindingmodebywayofdisplacingwater taining correct binding free energies required including the molecules.87 Sorelatedligandsmayhavesignificantlydiffer- free energy of flipping the relevant portion of the ligand.65 ent binding modes, in terms of binding orientation, protein Michel et al. used dual-topologies to compute the relative conformation,orevenbindingsitewateroccupancy. freeenergybetweendifferentpotentialbindingmodesofthe Thesituationisevenworseifwemovebeyondleadop- same ligands,107,108 highlighting the fact that adequate sam- timizationandimagineusingrelativefreeenergycalculations pling over binding modes is not guaranteed, nor is the bind- to screen binding of a small library of more dissimilar com- ing mode always obvious. Here, subtle differences in ligand Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions 230901-5 D.L.MobleyandP.V.Klimovich J.Chem.Phys.137,230901(2012) compositioncouldeasilyalterthebindingmode,107andstable P PL bindingmodesdifferedby1kcal/mol107to7kcal/mol.108One Δ G 1 ° otherstudyofparticularinteresthighlightedmultiplebinding L1 1 modes of catechol-O-methyltransferase inhibitors which had tobetreatedseparatelyinrelativecalculationsbecauseoftime scaleissues.68Multipledistinctbindingmodeswereseenand inonecasedifferedbyupto1.5kcal/mol.Otherapplications ΔΔ Gsolu ΔΔ Gsite P PL have highlighted similar issues.18,109–112 Water plays a ther- Δ G 2 ° modynamicallysignificantroleaswell—slowwatermotions L2 2 intoandoutofbindingsitescanyielderrorsincomputedrel- ativefreeenergiesinexcessof10kcal/mol.87Thus,available computationaldata,thoughlimited,suggestthatissuesrelat- ingtouncertain,incorrect,orchangingligandbindingmodes canintroduceerrorsupto7kcal/molinrelativebindingfree FIG.3. Standardthermodynamiccycleforrelativefreeenergycalculations. energycalculationswhentheseeffectsareignored.Errorsre- Prepresentstheprotein(orreceptor)andLtheligands.Tocomputetherel- latingtobindingmodemaybeafactorinboththeoverallun- ativebindingfreeenergyoftwoligands,L1andL2,tothesameprotein,L2 ismutatedintoL1insolutionyielding(cid:3)(cid:3)Gsolv,andL2intoL1inthebind- reliabilityofthesecalculationsandinthecycleclosureprob- ingsite,yielding(cid:3)(cid:3)Gsite.Thedifference,(cid:3)(cid:3)Gsite−(cid:3)(cid:3)Gsolv,isrelated lemsnotedabove. tothedifferenceof(cid:3)G◦and(cid:3)G◦.Inthefigure,eachboxdenotesaseparate 1 2 Clearly, binding free energy calculations face very real solvatedsystemwhichdoesnotinteractwiththeotherboxes. sampling challenges. Modifying an existing ligand with a known binding mode, without taking into account the pos- several forms. Single topology relative free energy calcula- sibility of an important, slow change in binding mode, can tionsactuallyinvolvedirectlychangingL intoL ,whiledual 2 1 lead to errors of several kcal/mol in relative free energy topologyrelativefreeenergycalculationsinvolveturningoff calculations. Uncertainty in ligand binding mode can intro- the interactions of one ligand in the binding site while turn- duce similar errors. We now discuss how typical relative ing on interactions of the other ligand in the binding site.113 free energy calculations are done, then highlight how bind- This issue is discussed in more detail in the supplementary ing mode changes can introduce these errors, and provide a material;113herewefocusmainlyonsingletopologycalcula- generalframeworkforhandlingmultiplebindingmodes and tions,thoughthesameconsiderationsapplytodualtopology changesinbindingmodeinthecontextofrelativefreeenergy calculations.113 calculations. B. Alternatebindingmodesposepotentialproblems forrelativebindingfreeenergycalculations II. THEORYOFRELATIVEFREEENERGY CALCULATIONS To understand the nature of challenges related to bind- ing mode sampling in free energy calculations, consider the A. Relativefreeenergycalculationsnormallyassume adequatesampling case where related ligands L1 and L2 both bind to a receptor (as in Fig. 4 for binding of 5-chloro-2-methylphenol vs. 2- Here,ourinterestisaccuratecalculationsofrelativebind- ethylphenol).AssumeL hasaknown,single,dominantbind- 1 ing free energies of different ligands to a receptor. We must ing mode, but L has two distinct potential binding modes, 2 assume,basedontheanalysisabove,thattheirbindingmodes both of which are stable in MD simulations on time scales maynotbecompletelyknown.Asnoted,theideaof“binding longerthanourtypicalbindingfreeenergycalculations.This mode”hereincludesreceptorconformation,aswellasligand scenarioisdescribedinFig.5.Inourexamplehere,L isalig- 2 positionandorientationandligandconformation. andwhichconsistsofL plusanadditionalfunctionalgroup 1 Relativebindingfreeenergycalculationsemployather- (green rectangle). L might share a binding mode with L 2 1 modynamic cycle like that shown in Fig. 3. This cycle com- pares binding of ligands L and L to a single protein recep- 1 2 tor, using an alchemical transformation of L into L . The 2 1 cycle involves turning L into L in solvent (or replacing 2 1 L with L ), which yields (cid:3)(cid:3)G , and turning (or replac- 2 1 solv ing) L into L in the receptor binding site, which yields 2 1 (cid:3)(cid:3)G . Since these compose two legs of a thermodynamic site cycle as shown, we can then write (cid:3)G◦−(cid:3)G◦ =(cid:3)(cid:3)G 1 2 site −(cid:3)(cid:3)G , where (cid:3)G◦ is the standard binding free energy solv 1 (also sometimes called the absolute binding free energy) for L and (cid:3)G◦ is the corresponding quantity for L . Alterna- 1 2 2 tively, (cid:3)(cid:3)G1→2 =(cid:3)(cid:3)Gsite−(cid:3)(cid:3)Gsolv where (cid:3)(cid:3)G1→2 is FIG.4. Relativefreeenergycalculationsmightbedonetocomparebind- therelativebindingfreeenergy. ing of 5-chloro-2-methylphenol,left, with binding of 2-ethylphenol, right. This thermodynamic cycle is the standard approach for Atomswhichwouldbetransformedintodummyatomsinasingletopology relativebindingfreeenergies,butitsimplementationcantake calculation113areshowninmagentaandthescaffoldisshowninblack. Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions 230901-6 D.L.MobleyandP.V.Klimovich J.Chem.Phys.137,230901(2012) space likely to be important and treating kinetically distinct P PL ΔG 1 states separately.115–118 For example, for proteins with mul- ° L 1 1 tiple relevant conformations, one might consider each dis- tinct stable conformation individually (such as in cases of different stable rotamers or isomers).104,115–117,119–123 For a ΔΔGsolu ΔΔGsite ligand with multiple binding orientations, one might con- P PL siderdifferentorientationsseparately.65,68,106,115,116,124 Here, ΔG 2 L ° we can think of each of these approaches as considering a 2 2 different stable binding mode separately. In this scenario, the main advantage is that we only have to adequately sample each binding mode, not transitions between bind- ing modes. The price we have to pay, however, is that we P PL Δ G 1 must obtain the relative free energies of the different bind- ° L1 1 ing modes. Still, in cases where transitions between bind- ingmodesareslowbuttransitionswithinbindingmodesare fast,thiscandramaticallyimproveconvergenceoffreeenergy ΔΔG ΔΔG estimates.106,124 solu site P PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL ΔG 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 °2 D. Thereisasimplegeneralexpressionforrelative freeenergiesinvolvingmultiplebindingmodes Separation of states approaches have a long history in free energy calculations, even predating simulations FIG.5. ThermodynamiccyclesforcomparingbindingofL1andL2,where themselves.116 There have been a reasonable number of L2 hastwopossiblebindingmodes,asindicatedbythepositioningofthe applications of these techniques to protein conformational greenrectangle.In(a),L1 andL2 shareacommonscaffold(grayspheres) changes, but relatively few to binding free energy calcula- whichhasthesamebindingmodeinbothligands(asinFig.3).In(b),L1 andL2shareacommonscaffold(grayspheres)whichisactuallyrotatedin tions. And of these, most applications were in absolute free bindingofL2relativetoL1. energy calculations,101,103,106,124 with only very few to rela- tivefreeenergycalculations.68,107 Here, we are interested in a general approach to handle (Fig.5(a)).ButL mightalternativelyhaveamodifiedbinding 2 multiple binding modes within relative free energy calcula- mode(Fig.5(b)). tions.Inthisapproach,wepickareferencebindingmodefor Both these cycles yield identical results only in the theligandwhichwewillusefordoingtheactualbindingcal- limitofadequatesampling.Whileadequatesamplingmaybe culation, from which we will obtain (cid:3)(cid:3)G , the binding site,r straightforward in case Fig. 5(a) since the two ligands share free energy in that particular mode. Additionally, we need thesamebindingmode,energybarriersmeanitwillbevery (cid:3)(cid:3)GLn,i→r, the free energies of taking each ligand Ln be- challenging inFig.5(b),whichinvolvesachangeinbinding tweenbindingmodenumberiandthereferencebindingmode mode.Inthislattercase,computed(cid:3)(cid:3)G1→2 willdependon r(“interconversionfreeenergies”(IFE),inourterminology). the starting structure and be incorrect unless the simulations Thenwecanwrite(motivatedbyRefs.106and124;seethe sampleenoughbindingmodeinterconversionevents. supplementary material113 for the derivation) the following To sum up, whenever different potential ligand binding generalexpressionfortherelativebindingfreeenergy: modesareseparatedbylargekineticbarriers,thermodynamic cycles of the sort in Fig. 5 are unlikely to close with normal (cid:3)(cid:3)G1→2 =(cid:3)(cid:3)Gsite,(cid:2)r −(cid:3)(cid:3)Gsolv (cid:4) simulationlengthsduetoconvergenceproblems.Thiswillbe (cid:3) especiallyproblematicwheneverthebindingmodeofone(or 1+ exp(−β(cid:3)GL1,i→r) more)ofthepotentialligandsinthecalculationsisuncertain, −β−1ln(cid:2) modesi(cid:5)=r (cid:4). orwhentheirbindingmodesaredifferent. (cid:3) 1+ exp(−β(cid:3)GL2,i→r) modesi(cid:5)=r C. Longersimulationsorseparationofstatescan (1) bothsolveconvergenceproblems Here the sum runs over different stable binding modes. Currently, there are two established approaches to solv- Two important limiting cases of this expression are: ing these types of sampling problems. The most straightfor- (1) When all of the (cid:3)GLn,i→r are large and positive (when wardapproachissimplytosimulatelongeruntilconvergence all other binding modes except r are unfavorable), Eq. is adequate; after all, both thermodynamic cycles yield cor- (1) reduces to (cid:3)(cid:3)G1→2 =(cid:3)(cid:3)Gsite,r −(cid:3)(cid:3)Gsolv; and (2) rect free energies in the limit of infinite sampling. But this When all other binding modes are equivalent to the ref- may not always be practical.114 Another approach involves erence binding mode (when all binding modes are equal), separationofstates(alsocalled“integrationoverparts”115)— thenallofthe(cid:3)GLn,i→rarezeroand(cid:3)(cid:3)G1→2 =(cid:3)(cid:3)Gsite,r specifically,focusingsamplingonindividualregionsofphase −(cid:3)(cid:3)Gsolv −β−1lnMN whereNisthenumberofequivalent Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions 230901-7 D.L.MobleyandP.V.Klimovich J.Chem.Phys.137,230901(2012) PL PL 1 1 ΔΔG L1,1→2 ΔG ° P 1 L 1 1 2 ΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔΔGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGsssooollluvv ite, ite, s s P G G Δ Δ L Δ Δ 2 ΔG ° 2 PL PL 2 2 ΔΔG L2,1→2 FIG.6. TocorrectlytreatbindingofL1andL2whenbothhavemultiplepotentialbindingmodes,weneedadifferentapproachdependingonthepreferred bindingmode,asdiscussedinthetext.Calculating(cid:3)GL1,1→2and(cid:3)GL2,1→2willtelluswhichbindingmodesarepreferred,andwecancombinethiswithEq. (1)togetthecorrectbindingfreeenergy. bindingmodesofL andMisthenumberofequivalentbind- foreachligandwhichwouldbepotentialbindingmodes,but 1 ingmodesofL .Inthelattercase,theterminvolvingthelog- if time scales for interconversion are slow, we will have no 2 arithmessentiallyamountstoaligandsymmetrynumbercor- data at this point about which binding modes are the most rectionanalogoustothoseinRef.106. important to binding—we will simply know of a small set This approach requires clear separation into multiple, ofstablebindingmodesofeachligand.Considerthecaseof non-overlapping binding modes. As we will discuss below, two ligands, each with two possible binding modes (Fig. 6). thisissimpleinthelimitofextremelyslowinterconversions Dependingonhowthecalculationsaresetup,wecanimagine betweenbindingmodes,butwhenonlyafewtransitionsoccur two possible values for (cid:3)G◦—one going to binding mode 1 1 on simulation time scales, a clear definition of each binding (left) of L and one going to binding mode 2 (right) of L . 1 1 modeisnecessary. Similarly, there are two possible values for (cid:3)G◦. The more 2 Overall, this expression is a general one for handling negativeofthesefreeenergiesforeachligandwillcorrespond multiple binding modes in the context of relative free en- tothemoredominantbindingmode,106 andwedonotknow ergy calculations. Next, we consider several additional lim- aprioriwhichthiswillbe.Thisisoneofthethingswewould itingcases. liketofindout,inadditionto(cid:3)(cid:3)G1→2. Assumethatsimulationtimescalesareshortenoughthat the ligand does not switch between binding modes. Then, E. Ingeneral,wemaynotknowthebindingmode because each ligand has two possible binding modes, cal- ofeitherligand culations will obtain one of the two different potential val- In the general case (Eq. (1)) each ligand L and L may uesforthefreeenergychangeinthebindingsite,(cid:3)(cid:3)G 1 2 site,1 have multiple binding modes88 or at least we may not know and(cid:3)(cid:3)G ,dependingonthechoiceofreferencebinding site,2 their dominant binding modes. To deal with this, we might modeforthecalculations.125Inthelimitwhereasingle(com- dockeachligandintothebindingsiteandrunsomeshort(i.e., mon) binding mode dominates, the corresponding (cid:3)(cid:3)G site,r nanosecond) MD simulations from several different starting willyieldthecorrectbindingfreeenergy,butwedonotknow poses.103,106,111 From these, we might identify stable states whichiscorrectatthispoint. Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions 230901-8 D.L.MobleyandP.V.Klimovich J.Chem.Phys.137,230901(2012) Wenotethreelimitingscenariosinthissituation: If the possibility of multiple binding modes is ignored, as in many relative calculations, then typically scenario 1 1. Bothligandsprefertheleft(#1)bindingmode. 2. Bothligandsprefertheright(#2)bindingmode. or scenario 2 will be assumed and no IFEs ((cid:3)GL1,1→2 3. The ligands prefer different binding modes (assume L and (cid:3)GL2,1→2) are calculated. This introduces an error of prefers the left binding mode and L prefers the righ1t (cid:3)GL2,1→2 (or (cid:3)GL1,1→2 depending on the cycle) when sce- 2 nario3describesbinding.AsnotedintheIntroduction,these bindingmode). freeenergiescanbelarge,upto7kcal/mol. These three cases require slightly different approaches, Lead optimization campaigns will in some cases begin buttodistinguishbetweenthem,weneedtoknowwhichbind- withknowledgeofthedominantbindingmodeofoneofthe ing mode of each ligand is preferred. This, as discussed in ligands. But even if we know the binding mode of one lig- Sec. II B, will require a separate calculation of (cid:3)GL1,1→2, and, we may not know the binding mode of the other. Thus the free energy of taking L1 from the left to the right bind- westillneed(cid:3)GL1,1→2,aslongastransitionsbetweenbind- ing mode, and (cid:3)GL2,1→2, the corresponding free energy for ing modes in the site are slow compared to simulation time L .Calculationofthesemaybechallenginginitsownright, scales. 2 thoughweprovidesomediscussionbelowofhowthesemay be obtained (Sec. III). For now, assume we have obtained these IFEs, allowing us to distinguish between the scenarios F. Thesameframeworkapplieswhenhandling notedabove. mutationstoacommonscaffold Withthesequantitiesinhand,wenowconsiderourthree Ingeneral,oneligandmaynotbeasubsetoftheother,so scenarios: insteadofdirectlymutatingL intoL ,thefreeenergycalcu- 1 2 1. Left binding mode: In this case, we follow the ther- lationmayneedtopassthroughacommonscaffold(asshown modynamic cycle labeled “Cycle 1” and (cid:3)(cid:3)G1→2 inthesupplementarymaterial,113Fig.1(b)).Additionally,we =(cid:3)(cid:3)G −(cid:3)(cid:3)G . may need to consider even more potential binding modes— site,1 solv 2. Right binding mode: In this case, we follow the ther- forexample,formutationsof5-chloro-2-methylphenoltocat- modynamic cycle labeled “Cycle 2” and (cid:3)(cid:3)G1→2 echol,wetriedfourpotentialbindingmodes(whichdiffered =(cid:3)(cid:3)G −(cid:3)(cid:3)G . substantially in free energy) since there are four ways to site,2 solv 3. Different binding modes: In this case, we obtain overlay it onto catechol while preserving the position of the (cid:3)(cid:3)G1→2 =(cid:3)(cid:3)Gsite,1−(cid:3)(cid:3)Gsolv +(cid:3)GL2,1→2 or hydroxyl.106Tohandlesuchcases,weintroduceanadditional (cid:3)(cid:3)G1→2 =(cid:3)(cid:3)Gsite,2−(cid:3)(cid:3)Gsolv −(cid:3)GL1,1→2. intermediate state (Fig. 7) corresponding to the scaffold. These are thermodynamically equivalent, though in Dependingonwhetherthesearesingleordualtopologycal- practicetheywillhavedifferentconvergenceproperties. culations,specificationofthisscaffoldmaybeexplicitorim- ThesefollowfromEq.(1)intheappropriatelimits.126 plicit(supplementarymaterial,113 Fig.1). N Gsite,1 Gsite,2 Gsite,3 Gsite, Δ Δ Δ Δ Δ Δ Δ Δ FIG.7. Inthefullygeneralcase,eachligand(Ligand1,Ligand2)mayhavemultiplepotentialbindingmodes,andoneligandmaynotbeasubsetofthe other.Inthiscase,forsingletopologycalculations,weneedtopassthroughanintermediatestateorscaffoldwhichisacommonsubstructureofbothligands. Computingtherelativefreeenergymayrequirecomputingallofthe(cid:3)(cid:3)Gsite,1...(cid:3)(cid:3)Gsite,N,aswellasthefreeenergyofmovingthescaffoldbetweendifferent scaffoldbindingmodes.Herethedifferentligandsareshownwithcoloredadditionsrepresentingfunctionalgroups;thecommonsubstructureisgray. Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions 230901-9 D.L.MobleyandP.V.Klimovich J.Chem.Phys.137,230901(2012) Figure 7 shows a variety of possible structures (which modewhilebeingturnedoninanotherbindingmode.Orien- representmetastablestatesorpotentialbindingmodes)which tationalrestraints106 couldbeusedtoensurethetwobinding mayaffecttherelativebindingfreeenergy.127 Whatexactin- modesarewellseparatedandthecorrectIFEisobtained,asin formationdoweneedtogettherelativebindingfreeenergyin Rocklinetal.129OtherpotentialstrategiesforcomputingIFEs thiscase?Minimally,werequireapathbetweeneverystable includeMarkovstatemodels,124 metadynamics,andeventhe potentialbindingmodetoeveryotherstablepotentialbinding “deactivatedmorphing”approach.130 mode.Wehaveconsiderableflexibilityinhowthesepathscan We recommend one of two approaches to calculate the beconstructed,including: IFEsinarelativelygeneralway,thoughmorealternativesare becomingavailable: 1. Computethefreeenergyofalteringthebindingmodeof thescaffoldonly,andallofthebindingsitefreeenergies 1. Using absolute free energy calculations to compute the (cid:3)(cid:3)G . free energy to change the binding mode of the scaffold site,i 2. Compute the free energy of altering the binding mode in the binding site. While this may seem overkill for a of each ligand in its binding site between all binding relativefreeenergycalculation,therearetwomainben- modes, and only one (cid:3)(cid:3)G avoiding the need for efits. First, it actually enables a set of relative free en- site,i scaffoldIFEs. ergy calculations to be used to obtain absolute binding 3. Computethefreeenergyofalteringthebindingmodeof freeenergies(byvirtueofknowingtheabsolutebinding justoneligandinitsbindingsite,andallofthebinding freeenergyofourscaffold).Second,itscalesextremely sitefreeenergies(cid:3)(cid:3)G . well with additional related ligands. Particularly, imag- site,i ine instead of two related ligands we want to compare, AswewilldiscussbelowinSec.III,differentapproaches we have 10 which share a common scaffold. We can forcomputingthebindingmodeIFEsmaygiveusreasonto compute absolute free energies of the scaffold binding favor one of these scenarios over the others, but all of these modes just once, and use this one set of absolute bind- yieldtherequisiteinformation.Additionofsomeredundancy ing free energy calculations to compute correct relative isadvisabletoallowcomputationofcycleclosureerrorsasa orabsolutebindingfreeenergiesofall10ofourligands. testforconvergence. Thisisdonesimplybyaddingadditionalligands(i.e.,in Assuming we have this information, we then use IFEs Fig.7)andconnectingeachligandtothecommonscaf- in combination with the other information available to iden- fold. tifywhichbindingmode(s)aredominantandthenconstructa 2. Using relative free energy calculations combined with thermodynamiccyclesuchasFig.6whichincludesourscaf- orientational restraints to calculate the free energy to foldtogetthecorrectrelativefreeenergy. change the binding mode of each ligand in the binding Whileourdiscussionherehasfocusedonsingletopology site.Withadualtopologyscheme,wecanturnaligand calculations,similarissuesfacedualtopologycalculationsas into dummy atoms in one binding mode while turning wenoteinthesupplementarymaterial.113 it on from dummy atoms in another, potentially using orientationalrestraintstoensurethatnointerconversion III. RECOMMENDEDAPPROACHES betweenbindingmodeshappensduringthecalculations andthusobtainingacorrectIFE.107,108,129 Forligandswithslowbindingmodechangesorunknown binding modes, efficient relative free energy calculations re- quirethefreeenergiesoftakingligandsbetweenstablebind- Both of these approaches have the caveat that, as noted ing modes (the IFE) in the binding site. Otherwise, free en- previously,treatingdifferentbindingmodesseparatelyinthe ergy calculations will need to be extremely long in order to these calculations requires careful separation of kinetically achieve convergence. We envision several potential ways to distinctbindingmodes.106,124ConsideraligandL whichhas 1 obtaintheIFEinthebindingsite(thefreeenergies,(cid:3)GLn,i→r two metastable binding modes A and B, and its transforma- (Eq.(1)),totakeeachbindingmodetothereferencebinding tion into L which has the same two binding modes. When 2 mode). thebarrierbetween Aand Bislargeenough thatnobinding TwodifferenttypesofIFEswouldsuffice.Wecoulduse modeinterconversionoccurs,separationintodistinctbinding IFEs of the scaffold (as in Fig. 7) or IFEs of actual ligands. modes is trivial. And when rapid interconversion between A ScaffoldIFEsmaybeeasiertoobtainduetofewersterichin- andBoccurs,noseparationisnecessary.Butintheinterme- drances (and in special cases such as a scaffold consisting diateregime,whenjustoneorseveraltransitionsbetweenA of a symmetric ring, scaffold IFEs may be zero128), though andBoccur(butnotenoughtoensureconvergence),thesim- IFEsofaligandcanprovideadditionalphysicalinsight.These ulationseitherneedbelengthened,orsimulationdataneedto IFEs could be obtained a number of ways, including poten- be sorted by binding mode and each binding mode analyzed tial of mean force approaches, relative free energy calcula- separately. tions between different binding modes,65,68,107,108 and even A third approach is to use enhanced sampling absolute binding free energy calculations to get relative free techniques21,131,132 tosampleallrelevantbindingmodeseas- energies of binding modes.106,124 Relative calculations be- ily within the context of a single free energy calculation, tween binding modes seem likely to work particularly well, eliminating the need for IFEs. However, success with these and would probably best be implemented with a dual topol- techniques so far seems confined mainly to model binding ogy strategy, where the ligand is turned off in one binding sites.21,105,110,128,133 Downloaded 17 May 2013 to 147.52.40.61. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://jcp.aip.org/about/rights_and_permissions
Description: