Chemoinformatics in Drug Discovery
Edited by Tudor I. Oprea

Methods and Principles in Medicinal Chemistry
Edited by R. Mannhold, H. Kubinyi, G. Folkers
Editorial Board: H.-D. Höltje, H. Timmerman, J. Vacca, H. van de Waterbeemd, T. Wieland

Recently Published Volumes:

T. Lengauer (ed.), Bioinformatics – From Genomes to Drugs, Vol. 14, 2001, ISBN 3-527-29988-2
J. K. Seydel, M. Wiese, Drug-Membrane Interactions, Vol. 15, 2002, ISBN 3-527-30427-4
O. Zerbe (ed.), BioNMR in Drug Research, Vol. 16, 2002, ISBN 3-527-30465-7
P. Carloni, F. Alber (eds.), Quantum Medicinal Chemistry, Vol. 17, 2003, ISBN 3-527-30456-8
H. van de Waterbeemd, H. Lennernäs, P. Artursson (eds.), Drug Bioavailability, Vol. 18, 2003, ISBN 3-527-30438-X
H.-J. Böhm, G. Schneider (eds.), Protein-Ligand Interactions, Vol. 19, 2003, ISBN 3-527-30521-1
R. E. Babine, S. S. Abdel-Meguid (eds.), Protein Crystallography in Drug Discovery, Vol. 20, 2004, ISBN 3-527-30678-1
Th. Dingermann, D. Steinhilber, G. Folkers (eds.), Molecular Biology in Medicinal Chemistry, Vol. 21, 2004, ISBN 3-527-30431-2
H. Kubinyi, G. Müller (eds.), Chemogenomics in Drug Discovery, 2004, ISBN 3-527-30987-X

Series Editors:

Prof. Dr. Raimund Mannhold
Biomedical Research Center
Molecular Drug Research Group
Heinrich-Heine-Universität
Universitätsstrasse 1
40225 Düsseldorf
Germany
[email protected]

Prof. Dr. Hugo Kubinyi
Donnersbergstrasse 9
67256 Weisenheim am Sand
Germany
[email protected]

Prof. Dr. Gerd Folkers
Department of Applied Biosciences
ETH Zürich
Winterthurerstrasse 19
8057 Zürich
Switzerland
[email protected]

Volume Editor:

Prof. Dr. Tudor I. Oprea
Division of Biocomputing
MSC08 4560
University of New Mexico School of Medicine
Albuquerque, NM 87131
USA

All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Library of Congress Card No.: Applied for

British Library Cataloging-in-Publication Data: A catalogue record for this book is available from the British Library.

Bibliographic information published by Die Deutsche Bibliothek: Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the internet at http://dnb.ddb.de.

© 2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – nor transmitted or translated into machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Printed in the Federal Republic of Germany
Printed on acid-free paper

Composition: Laserwords Private Ltd, Chennai, India
Printing: betz-druck GmbH, Darmstadt
Bookbinding: Litges & Dopf Buchbinderei GmbH, Heppenheim

ISBN-13: 978-3-527-30753-1
ISBN-10: 3-527-30753-2

Chemoinformatics in Drug Discovery. Edited by Tudor I. Oprea
Copyright 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 3-527-30753-2

Contents

A Personal Foreword XV
Preface XVII
List of Contributors XIX

1 Introduction to Chemoinformatics in Drug Discovery – A Personal View 1
Garland R. Marshall
1.1 Introduction 1
1.2 Historical Evolution 4
1.3 Known versus Unknown Targets 5
1.4 Graph Theory and Molecular Numerology 6
1.5 Pharmacophore 7
1.6 Active-Analog Approach 8
1.7 Active-Site Modeling 9
1.8 Validation of the Active-Analog Approach and Active-Site Modeling 10
1.9 PLS/CoMFA 11
1.10 Prediction of Affinity 12
1.11 Protein Structure Prediction 13
1.12 Structure-Based Drug Design 15
1.13 Real World Pharmaceutical Issues 15
1.14 Combinatorial Chemistry and High-throughput Screens 16
1.15 Diversity and Similarity 16
1.16 Prediction of ADME 17
1.17 Failures to Accurately Predict 17
1.18 Summary 18
References 19

Part I Virtual Screening 23

2 Chemoinformatics in Lead Discovery 25
Tudor I. Oprea
2.1 Chemoinformatics in the Context of Pharmaceutical Research 25
2.2 Leads in the Drug Discovery Paradigm 27
2.3 Is There a Trend for High Activity Molecules? 29
2.4 The Concept of Leadlikeness 32
2.5 Conclusions 37
References 38

3 Computational Chemistry, Molecular Complexity and Screening Set Design 43
Michael M. Hann, Andrew R. Leach, and Darren V. S. Green
3.1 Introduction 43
3.2 Background Concepts: the Virtual, Tangible and Real Worlds of Compounds, the “Knowledge Plot” and Target Tractability 44
3.3 The Construction of High Throughput Screening Sets 45
3.4 Compound Filters 47
3.5 “Leadlike” Screening Sets 48
3.6 Focused and Biased Set Design 54
3.7 Conclusion 55
References 56

4 Algorithmic Engines in Virtual Screening 59
Matthias Rarey, Christian Lemmen, and Hans Matter
4.1 Introduction 59
4.2 Software Tools for Virtual Screening 61
4.3 Physicochemical Models in Virtual Screening 62
4.3.1 Intermolecular Forces in Protein–Ligand Interactions 63
4.3.2 Scoring Functions for Protein–Ligand Recognition 66
4.3.3 Covering Conformational Space 67
4.3.4 Scoring Structural Alignments 68
4.4 Algorithmic Engines in Virtual Screening 69
4.4.1 Mathematical Concepts 69
4.4.2 Algorithmic Concepts 76
4.4.3 Descriptor Technology 81
4.4.4 Global Search Algorithms 85
4.5 Entering the Real World: Virtual Screening Applications 89
4.5.1 Practical Considerations on Virtual Screening 89
4.5.2 Successful Applications of Virtual Screening 91
4.6 Practical Virtual Screening: Some Final Remarks 99
References 101

5 Strengths and Limitations of Pharmacophore-Based Virtual Screening 117
Dragos Horvath, Boryeu Mao, Rafael Gozalbes, Frédérique Barbosa, and Sherry L. Rogalski
5.1 Introduction 117
5.2 The “Pharmacophore” Concept: Pharmacophore Features 117
5.3 Pharmacophore Models: Managing Pharmacophore-related Information 118
5.4 The Main Topic of This Paper 119
5.5 The Cox2 Data Set 119
5.6 Pharmacophore Fingerprints and Similarity Searches 120
5.7 Molecular Field Analysis (MFA)-Based Pharmacophore Information 123
5.8 QSAR Models 125
5.9 Hypothesis Models 125
5.10 The Minimalist Overlay-Independent QSAR Model 126
5.11 Minimalist and Consensus Overlay-Based QSAR Models 128
5.12 Diversity Analysis of the Cox2 Compound Set 131
5.13 Do Hypothesis Models Actually Tell Us More Than Similarity Models About the Structural Reasons of Activity? 131
5.14 Why Did Hypothesis Models Fail to Unveil the Key Cox2 Site–Ligand Interactions? 134
5.15 Conclusions 136
References 137

Part II Hit and Lead Discovery 141

6 Enhancing Hit Quality and Diversity Within Assay Throughput Constraints 143
Iain McFadyen, Gary Walker, and Juan Alvarez
6.1 Introduction 143
6.1.1 What Makes a Good Lead Molecule? 144
6.1.2 Compound Collections – Suitability as Leads 144
6.1.3 Compound Collections – Diversity 145
6.1.4 Data Reliability 146
6.1.5 Selection Methods 149
6.1.6 Enhancing Quality and Diversity of Actives 153
6.2 Methods 154
6.2.1 Screening Library 155
6.2.2 Determination of Activity Threshold 156
6.2.3 Filtering 156
6.2.4 High-Throughput Screen Clustering Algorithm (HTSCA) 157
6.2.5 Diversity Analysis 160
6.2.6 Data Visualization 161
6.3 Results 162
6.3.1 Peptide Hydrolase 162
6.3.2 Protein Kinase 167
6.3.3 Protein–Protein Interaction 168
6.4 Discussion and Conclusion 169
References 172

7 Molecular Diversity in Lead Discovery: From Quantity to Quality 175
Cullen L. Cavallaro, Dora M. Schnur, and Andrew J. Tebben
7.1 Introduction 175
7.2 Large Libraries and Collections 176
7.2.1 Methods and Examples for Large Library Diversity Calculations 177
7.3 Medium-sized/Target-class Libraries and Collections 181
7.3.1 Computational Methods for Medium- and Target-class Libraries and Collections 183
7.4 Small Focused Libraries 189
7.4.1 Computational Methods for Small and Focused Libraries 190
7.5 Summary/Conclusion 191
References 192

8 In Silico Lead Optimization 199
Chris M. W. Ho
8.1 Introduction 199
8.2 The Rise of Computer-aided Drug Refinement 200
8.3 RACHEL Software Package 201
8.4 Extraction of Building Blocks from Corporate Databases 201
8.5 Intelligent Component Selection System 203
8.6 Development of a Component Specification Language 205
8.7 Filtration of Components Using Constraints 207
8.8 Template-driven Structure Generation 208
8.9 Scoring Functions – Methods to Estimate Ligand–Receptor Binding 209
8.10 Target Functions 212
8.11 Ligand Optimization Example 214
References 219

Part III Databases and Libraries 221

9 WOMBAT: World of Molecular Bioactivity 223
Marius Olah, Maria Mracec, Liliana Ostopovici, Ramona Rad, Alina Bora, Nicoleta Hadaruga, Ionela Olah, Magdalena Banda, Zeno Simon, Mircea Mracec, and Tudor I. Oprea
9.1 Introduction – Brief History of the WOMBAT Project 223
9.2 WOMBAT 2004.1 Overview 224
9.3 WOMBAT Database Structure 227
9.4 WOMBAT Quality Control 228
9.5 Uncovering Errors from Literature 231
9.6 Data Mining with WOMBAT 234
9.7 Conclusions and Future Challenges 235
References 237

10 Cabinet – Chemical and Biological Informatics Network 241
Vera Povolna, Scott Dixon, and David Weininger
10.1 Introduction 241
10.1.1 Integration Efforts, WWW as Information Resource and Limitations 241
10.1.2 Goals 243
10.2 Merits of Federation Rather than Unification 243
10.2.1 The Merits of Unification 244
10.2.2 The Merits of Federation 244
10.2.3 Unifying Disparate Data Models is Difficult, Federating them is Easy 245
10.2.4 Language is a Natural Key 246
10.3 HTTP is Appropriate Communication Technology 248
10.3.1 HTTP is Specifically Designed for Collaborative Computing 248
10.3.2 HTTP is the Dominant Communication Protocol Today 248
10.3.3 HTML Provides a Universally Accessible GUI 249
10.3.4 MIME “Text/Plain” and “Application/Octet-Stream” are Important Catch-alls 249
10.3.5 Other MIME Types are Useful 250
10.3.6 One Significant HTTP Work-around is Required 250
10.4 Implementation 251
10.4.1 Daylight HTTP Toolkit 251
10.4.2 Metaphorics’ Cabinet Library 252
10.5 Specific Examples of Federated Services 252
10.5.1 Empath – Metabolic Pathway Chart 253
10.5.2 Planet – Protein–ligand Association Network 254
10.5.3 ECBook – Enzyme Commission Codebook 254
10.5.4 WDI – World Drug Index 254
10.5.5 WOMBAT – World of Molecular Bioactivity 255
10.5.6 TCM (Traditional Chinese Medicines), DCM (Dictionary of Chinese Medicine), PARK (Photo ARKive) and zi4 255
10.5.7 Cabinet “Download” Service 256
10.5.8 Cabinet Usage Example 256
10.6 Deployment and Refinement 262
10.6.1 Local Deployment 264
10.6.2 Intranet Deployment 264
10.6.3 Internet Deployment 265
10.6.4 Online Deployment 266
10.7 Conclusions 266
References 268

11 Structure Modification in Chemical Databases 271
Peter W. Kenny and Jens Sadowski
11.1 Introduction 271
11.2 Permute 274
11.2.1 Protonation and Formal Charges 274
11.2.2 Tautomerism 275
11.2.3 Nitrogen Configurations 276
11.2.4 Duplicate Removal 276
11.2.5 Nested Loop 276
11.2.6 Application Statistics 277
11.2.7 Impact on Docking 277
11.3 Leatherface 279
11.3.1 Protonation and Formal Charges 279
11.3.2 Tautomerism 280
11.3.3 Ionization and Tautomer Model 281
11.3.4 Relationships between Structures 282
11.3.5 Substructural Searching and Analysis 283
11.4 Concluding Remarks 283
References 284

12 Rational Design of GPCR-specific Combinational Libraries Based on the Concept of Privileged Substructures 287
Nikolay P. Savchuk, Sergey E. Tkachenko, and Konstantin V. Balakin
12.1 Introduction – Combinatorial Chemistry and Rational Drug Design 287
12.2 Rational Selection of Building Blocks Based on Privileged Structural Motifs 288
12.2.1 Privileged Structures and Substructures in the Design of Pharmacologically Relevant Combinatorial Libraries 288