New Approaches to Computer-Aided Drug Design

Dissertation submitted to the Faculty of Mathematics and Natural Sciences (Mathematisch-Naturwissenschaftliche Fakultät) of the Eberhard Karls Universität Tübingen for the degree of Doctor of Natural Sciences (Dr. rer. nat.)

presented by Dipl. Inform. (Bioinformatik) Marcel Schumann from Moers

Tübingen 2012

Date of oral examination: 05.06.2013
Dean: Prof. Dr. Wolfgang Rosenstiel
First reviewer: Prof. Dr.-Ing. Oliver Kohlbacher
Second reviewer: Prof. Dr. Frank Böckler

Abstract

Computer-aided drug design plays a central role in modern drug discovery. Using a variety of different algorithms, approximations of the binding free energy of chemical compounds to a molecular target can be generated in silico quickly and cheaply, without any need for those compounds to be physically available at this step. Computer-aided drug design thus makes it possible to drastically speed up the development of new drugs, strongly reduces costs, and enables the rapid testing of new, as yet unsynthesized classes of compounds.

In this dissertation, new approaches for computer-aided drug design are presented: a framework for Quantitative Structure-Activity Relationship (QSAR) modeling, a receptor-ligand scoring function and a docking algorithm, a three-dimensional target-specific rescoring procedure, and CADDSuite, a software suite that contains all the aforementioned algorithms and a large set of additional, auxiliary tools and algorithms.

The QSAR framework provides all steps necessary to generate regression or classification models with high predictive quality: reading input, generating molecular descriptors, generating a variety of different regression and classification models, automatically selecting relevant descriptors, and evaluating the quality of models. Using several data sets, we will show that it is easily possible to obtain high-quality QSAR models by using all the functionality in combination.

IMGDock, a deterministic receptor-ligand docking algorithm employing a specially designed empirical scoring function, has been developed.
Using the established DUD (Huang et al., J Med Chem, 2006, 49, 6789-6801) docking benchmark sets, we show that IMGDock yields results of high quality and in many cases outperforms other docking approaches. Furthermore, IMGDock is fast, easily configurable, freely available as open source, and can easily be deployed on compute clusters, clouds, or grids.

Target-Specific Grid-based Rescoring (TaGRes) employs three-dimensional information generated by docking and experimental binding free energy measurements for other compounds in order to rescore molecular interactions. This approach thereby takes into account receptor-ligand interactions, their three-dimensional locations, and their target-specific importance. We will show that using this technique, the enrichment obtained by docking can be strongly enhanced.

CADDSuite (Computer-Aided Drug Design Suite) was created as a framework for computer-aided drug design, containing all the algorithms mentioned before and a large number of auxiliary tools, for example for preparation or analysis purposes. CADDSuite thus provides flexibly combinable programs for all commonly required steps and can therefore make solving common drug design tasks much easier. To make the creation of pipelines even simpler, CADDSuite has also been integrated into the well-known workflow system Galaxy. This allows users to create drug design workflows directly from a web browser, without any need for software installations on their local computer, and to submit them directly to a compute cluster, grid, or cloud.

Last but not least, we will explain our work towards the discovery of inhibitors of bacterial biofilm formation. We will describe how we found a number of very promising inhibitor candidates, using a combination of our computer-aided drug design tools and experimental validations.

Acknowledgements

First of all, I would like to thank Prof. Dr. Oliver Kohlbacher for the opportunity and freedom to work in the area I cared most for.

Many thanks to Prof. Dr.
Frank Böckler for readily agreeing to serve as reviewer for this dissertation.

I am very grateful to Dr. Silvia Herbert for performing many wet lab experiments concerning IcaA. She performed the biofilm assays used to evaluate IcaA inhibitor candidates proposed by us.

Thanks a lot to all the members of the SFB766 for many interesting and inspiring discussions about biofilms, infections, cell walls as well as biology and science in general.

I would like to thank Dr. Alex Böhm for exciting discussions about IcaA and biofilms and possible future steps with respect to the biofilm formation inhibitors discovered by us.

I am grateful to Dr. Andreas Kämper for carefully and speedily proof-reading my manuscripts.

Last but not least, many thanks go to my former colleagues in the Kohlbacher lab for a nice and enjoyable time there.

In accordance with the standard scientific protocol, I will use the personal pronoun "we" to indicate the reader and the writer, or my scientific collaborators and myself.

Contents

1 Introduction
2 Biological Background
  2.1 Overview of drug design pipelines
    2.1.1 Disease selection
    2.1.2 Target identification
    2.1.3 Lead identification
    2.1.4 Lead optimization
    2.1.5 Preclinical trials
    2.1.6 Clinical trials
  2.2 Binding-Affinity measurements
    2.2.1 Surface plasmon resonance
    2.2.2 Isothermal titration calorimetry
  2.3 Carbonic Anhydrase II
  2.4 Biofilms
3 Computational Background
  3.1 Overview of computer-aided drug design
  3.2 Ligand-based drug design
  3.3 Structure-based drug design
  3.4 Quality statistics
    3.4.1 Coefficient of determination
    3.4.2 Quality of classifications
    3.4.3 Receiver operating characteristics curves
    3.4.4 Enrichment factors
  3.5 Sampling techniques
    3.5.1 Cross-validation
    3.5.2 Bootstrapping
    3.5.3 Response Permutation Testing
  3.6 Machine learning
    3.6.1 Classification approaches
    3.6.2 Regression approaches
    3.6.3 Feature selection
  3.7 Multi-greedy heuristics
4 QSAR approaches for ligand-based drug design
  4.1 Introduction
  4.2 Design & Implementation
    4.2.1 Input data module
    4.2.2 Models module
    4.2.3 Feature Selection
    4.2.4 Model Validation
  4.3 Results & Discussion
5 Receptor-Ligand Docking for structure-based drug design: IMGDock
  5.1 Introduction
  5.2 Methods
    5.2.1 Scoring function
    5.2.2 Preparation algorithms
    5.2.3 Docking algorithm
  5.3 Results & Discussion
    5.3.1 Scoring function
    5.3.2 Docking
6 Receptor-Ligand Rescoring for structure-based drug design: TaGRes
  6.1 Introduction
  6.2 Methods
    6.2.1 TaGRes model generation
    6.2.2 Rescoring of docking results
  6.3 Results & Discussion
7 Consolidation of approaches into modular, workflow-enabled package: CADDSuite
  7.1 Introduction
  7.2 Methods
    7.2.1 Data input
    7.2.2 Preparation
    7.2.3 Checks
    7.2.4 QSAR
    7.2.5 Docking
    7.2.6 Rescoring
    7.2.7 Analysis
    7.2.8 Converter
  7.3 Results & Discussion
    7.3.1 Integration into Galaxy
    7.3.2 Carbonic anhydrase II virtual screening workflow
8 Application: Virtual screening for biofilm-formation inhibitors
  8.1 Introduction
  8.2 Homology Modeling
  8.3 Scaffold finding
  8.4 Virtual screening I
  8.5 Hit verification I
  8.6 Virtual screening II
  8.7 Hit verification II
  8.8 Discussion
9 Discussion & Conclusion