Data-Driven Computational Neuroscience

Data-driven computational neuroscience facilitates the transformation of data into insights on the structure and functions of the brain. This introduction for researchers and graduate students is the first in-depth, comprehensive treatment of statistical and machine learning methods for neuroscience. The methods are demonstrated through case studies of real problems to empower readers to build their own solutions. The book covers a wide variety of methods, including supervised classification with non-probabilistic models (nearest-neighbors, classification trees, rule induction, artificial neural networks, and support vector machines) and probabilistic models (discriminant analysis, logistic regression, and Bayesian network classifiers), metaclassifiers, multidimensional classifiers, and feature subset selection methods as well as unsupervised classification. Other parts of the book are devoted to association discovery with probabilistic graphical models (Bayesian networks and Markov networks) and spatial statistics with point processes (complete spatial randomness and cluster, regular, and Gibbs processes). Cellular, structural, functional, medical, and behavioral neuroscience levels are considered.

Concha Bielza is a professor in the Department of Artificial Intelligence at Universidad Politécnica de Madrid. She has published more than 120 impact factor journal papers and coauthored the book Industrial Applications of Machine Learning (2019). She was awarded the 2014 UPM Research Prize and received the 2020 Machine Learning Award from the Amity University in India.

Pedro Larrañaga is a professor in the Department of Artificial Intelligence at Universidad Politécnica de Madrid. He has published more than 150 journal papers and coauthored the book Industrial Applications of Machine Learning (2019). He is Fellow of the European Association for Artificial Intelligence and of Academia Europaea. He received the 2020 Machine Learning Award from the Amity University in India.
Data-Driven Computational Neuroscience
Machine Learning and Statistical Models

CONCHA BIELZA
Universidad Politécnica de Madrid

PEDRO LARRAÑAGA
Universidad Politécnica de Madrid

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108493703
DOI: 10.1017/9781108642989

© Concha Bielza and Pedro Larrañaga 2021

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2021

Printed in the United Kingdom by TJ Books Limited, Padstow Cornwall

A catalogue record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Bielza, Concha, author. | Larrañaga, Pedro, 1958– author.
Title: Data-driven computational neuroscience: machine learning and statistical models / Concha Bielza, Universidad Politécnica de Madrid, Pedro Larrañaga, Universidad Politécnica de Madrid
Description: Cambridge, United Kingdom; New York, NY: Cambridge University Press, 2020. | Includes bibliographical references and index.
Identifiers: LCCN 2019060117 (print) | LCCN 2019060118 (ebook) | ISBN 9781108493703 (hardback) | ISBN 9781108642989 (epub)
Subjects: LCSH: Neurosciences – Data processing. | Neurosciences – Statistical methods.
Classification: LCC QP357.5 .B54 2020 (print) | LCC QP357.5 (ebook) | DDC 612.8–dc23
LC record available at https://lccn.loc.gov/2019060117
LC ebook record available at https://lccn.loc.gov/2019060118

ISBN 978-1-108-49370-3 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To my mother Aurora, my sisters Aurora, Marga, and Silvia, and the memory of my father Luis
Concha Bielza

To my wife María, my daughters Nagore and Ana, and the memory of my parents Begoña and Moisés
Pedro Larrañaga

Contents

Preface
List of Acronyms

Part I Introduction

1 Computational Neuroscience
  1.1 The Multilevel Organization of the Brain
  1.2 The Human Brain
  1.3 Brain Research Initiatives
  1.4 Neurotechnologies
  1.5 Data-Driven Computational Neuroscience
  1.6 Real Examples Discussed in This Book

Part II Statistics

2 Exploratory Data Analysis
  2.1 Data Types
  2.2 Univariate Data
  2.3 Bivariate Data
  2.4 Multivariate Data
  2.5 Imputation of Missing Data
  2.6 Variable Transformation
  2.7 Bibliographic Notes

3 Probability Theory and Random Variables
  3.1 Probability Theory
  3.2 Univariate Discrete Distributions
  3.3 Univariate Continuous Distributions
  3.4 Multivariate Probability Distributions
  3.5 Simulating Random Variates
  3.6 Information Theory
  3.7 Bibliographic Notes

4 Probabilistic Inference
  4.1 Parameter Estimation
  4.2 Hypothesis Tests
  4.3 Bibliographic Notes

Part III Supervised Classification

5 Performance Evaluation
  5.1 The Learning Problem
  5.2 Performance Measures
  5.3 Performance Estimation
  5.4 Statistical Significance Testing
  5.5 Imbalanced Data Sets and Anomaly Detection
  5.6 Bibliographic Notes

6 Feature Subset Selection
  6.1 Overview of Feature Subset Selection
  6.2 Filter Approaches
  6.3 Wrapper Methods
  6.4 Embedded Methods
  6.5 Hybrid Feature Selection
  6.6 Feature Selection Stability
  6.7 Example: GABAergic Interneuron Nomenclature
  6.8 Bibliographic Notes

7 Non-probabilistic Classifiers
  7.1 Nearest Neighbors
  7.2 Classification Trees
  7.3 Rule Induction
  7.4 Artificial Neural Networks
  7.5 Support Vector Machines
  7.6 Bibliographic Notes

8 Probabilistic Classifiers
  8.1 Bayes Decision Rule
  8.2 Discriminant Analysis
  8.3 Logistic Regression
  8.4 Bayesian Network Classifiers
  8.5 Bibliographic Notes

9 Metaclassifiers
  9.1 Main Ideas on Metaclassifiers
  9.2 Combining the Outputs of Different Classifiers
  9.3 Popular Metaclassifiers
  9.4 Example: Interneurons versus Pyramidal Neurons
  9.5 Example: Interneurons versus Pyramidal Neurons; Comparison of All Classifiers
  9.6 Bibliographic Notes

10 Multidimensional Classifiers
  10.1 Multi-label and Multidimensional Classification
  10.2 Equivalent Notations for Multi-label Classification
  10.3 Performance Evaluation Measures
  10.4 Learning Methods
  10.5 Example: Quality of Life in Parkinson’s Disease
  10.6 Bibliographic Notes

Part IV Unsupervised Classification

11 Non-probabilistic Clustering
  11.1 Similarity/Dissimilarity between Objects
  11.2 Hierarchical Clustering
  11.3 Partitional Clustering
  11.4 Choice of the Number of Clusters
  11.5 Subspace Clustering
  11.6 Cluster Ensembles
  11.7 Evaluation Criteria
  11.8 Example: Dendritic Spines
  11.9 Bibliographic Notes

12 Probabilistic Clustering
  12.1 The Expectation-Maximization Algorithm
  12.2 Finite-Mixture Models for Clustering
  12.3 Clustering with Bayesian Networks
  12.4 Example: Dendritic Spines
  12.5 Bibliographic Notes

Part V Probabilistic Graphical Models

13 Bayesian Networks
  13.1 Basics of Bayesian Networks
  13.2 Inference in Bayesian Networks
  13.3 Learning Bayesian Networks from Data
  13.4 Dynamic Bayesian Networks
  13.5 Example: Basal Dendritic Trees
  13.6 Bibliographic Notes