
Machine Learning A First Course for Engineers and Scientists [draft] PDF


MACHINE LEARNING
A First Course for Engineers and Scientists

Andreas Lindholm, Niklas Wahlström, Fredrik Lindsten, Thomas B. Schön

Draft version: April 30, 2021

This material will be published by Cambridge University Press. This pre-publication version is free to view and download for personal use only. Not for re-distribution, re-sale or use in derivative works. © The authors, 2021.

Feedback and exercise problems: http://smlbook.org

Contents

Notation

1 Introduction
  1.1 Machine learning exemplified
  1.2 About this book
  1.3 Further reading

2 Supervised learning: a first approach
  2.1 Supervised machine learning
  2.2 A distance-based method: k-NN
  2.3 A rule-based method: Decision trees
  2.4 Further reading

3 Basic parametric models and a statistical perspective on learning
  3.1 Linear regression
  3.2 Classification and logistic regression
  3.3 Polynomial regression and regularization
  3.4 Generalized linear models
  3.5 Further reading
  3.A Derivation of the normal equations

4 Understanding, evaluating and improving the performance
  4.1 Expected new data error E_new: performance in production
  4.2 Estimating E_new
  4.3 The training error–generalization gap decomposition of E_new
  4.4 The bias-variance decomposition of E_new
  4.5 Additional tools for evaluating binary classifiers
  4.6 Further reading

5 Learning parametric models
  5.1 Principles of parametric modeling
  5.2 Loss functions and likelihood-based models
  5.3 Regularization
  5.4 Parameter optimization
  5.5 Optimization with large datasets
  5.6 Hyperparameter optimization
  5.7 Further reading

6 Neural networks and deep learning
  6.1 The neural network model
  6.2 Training a neural network
  6.3 Convolutional neural networks
  6.4 Dropout
  6.5 Further reading
  6.A Derivation of the backpropagation equations

7 Ensemble methods: Bagging and boosting
  7.1 Bagging
  7.2 Random forests
  7.3 Boosting and AdaBoost
  7.4 Gradient boosting
  7.5 Further reading

8 Nonlinear input transformations and kernels
  8.1 Creating features by nonlinear input transformations
  8.2 Kernel ridge regression
  8.3 Support vector regression
  8.4 Kernel theory
  8.5 Support vector classification
  8.6 Further reading
  8.A The representer theorem
  8.B Derivation of support vector classification

9 The Bayesian approach and Gaussian processes
  9.1 The Bayesian idea
  9.2 Bayesian linear regression
  9.3 The Gaussian process
  9.4 Practical aspects of the Gaussian process
  9.5 Other Bayesian methods in machine learning
  9.6 Further reading
  9.A The multivariate Gaussian distribution

10 Generative models and learning from unlabeled data
  10.1 The Gaussian mixture model and discriminant analysis
  10.2 Cluster analysis
  10.3 Deep generative models
  10.4 Representation learning and dimensionality reduction
  10.5 Further reading

11 User aspects of machine learning
  11.1 Defining the machine learning problem
  11.2 Improving a machine learning model
  11.3 What if we cannot collect more data?
  11.4 Practical data issues
  11.5 Can I trust my machine learning model?
  11.6 Further reading

12 Ethics in machine learning
  12.1 Fairness and error functions
  12.2 Misleading claims about performance
  12.3 Limitations of training data
  12.4 Further reading

Notation
Bibliography

Acknowledgments

There are many people who have helped us throughout the writing of this book. First of all we want to mention David Sumpter who, in addition to giving feedback from using the material for teaching, contributed the entire Chapter 12 on ethical aspects. We have also received valuable feedback from many students and other teacher colleagues. We are of course very grateful for each and every comment we have received, and we want in particular to mention David Widmann, Adrian Wills, Johannes Hendricks, Mattias Villani, Dmitrijs Kass and Joel Oskarsson. We have also received useful feedback on the technical content of the book, including the practical insights in Chapter 11 from Agrin Hilmkil (at Peltarion), Salla Franzén and Alla Tarighati (at SEB), Lawrence Murray (at Uber), James Hensman and Alexis Boukouvalas (at Secondmind), Joel Kronander and Nand Dalal (at Nines) and Peter Lindskog and Jacob Roll (at Arriver). We also received valuable comments from Arno Solin on Chapters 8 and 9, and Joakim Lindblad on Chapter 6. There are several people who helped us with the figures illustrating the examples in Chapter 1, namely Antônio Ribeiro (Figure 1.1), Fredrik K. Gustafsson (Figure 1.4) and Theodoros Damoulas (Figure 1.5). Thank you all for your help!

During the writing of this book, we enjoyed financial support from AI Competence for Sweden, the Swedish Research Council (projects: 2016-04278, 2016-06079, 2017-03807, 2020-04122), the Swedish Foundation for Strategic Research (projects: ICA16-0015, RIT12-0012), the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, ELLIIT and the Kjell och Märta Beijer Foundation.

We are finally thankful to Lauren Cowles at Cambridge University Press for helpful advice and guidance through the publishing process.

1 Introduction

Machine learning is about learning, reasoning, and acting based on data. This is done by constructing computer programs that process the data, extract useful information, make predictions regarding unknown properties, and suggest actions to take or decisions to make. What turns data analysis into machine learning is that the process is automated and that the computer program is learnt from data. This means that generic computer programs are used, which are adapted to application-specific circumstances by automatically adjusting the settings of the program based on observed, so-called training data. It can therefore be said that machine learning is a way of programming by examples. The beauty of machine learning is that it is quite arbitrary what the data represent, and we can design general methods that are useful for a wide range of practical applications in different domains. We illustrate this via a range of examples below.

The “generic computer program” referred to above corresponds to a mathematical model of the data. That is, when we develop and describe different machine learning methods, we do this using the language of mathematics. The mathematical model describes a relationship between the involved quantities, or variables, that correspond to the observed data and the properties of interest (such as predictions, actions, etc.). Hence, the model is a compact representation of the data that, in a precise mathematical form, captures the key properties of the phenomenon we are studying. Which model to make use of is typically guided by the machine learning engineer's insights generated when looking at the available data and the practitioner's general understanding of the problem. When implementing the method in practice, this mathematical model is translated into code that can be executed on a computer. However, to understand what the computer program actually does, it is important to also understand the underlying mathematics.

As mentioned above, the model (or computer program) is learnt based on the available training data.
This is accomplished by using a learning algorithm which is capable of automatically adjusting the settings, or parameters, of the model to agree with the data. In summary, the three cornerstones of machine learning are:

1. The data
2. The mathematical model
3. The learning algorithm

In this introductory chapter we will give a taste of the machine learning problem by illustrating these cornerstones with a few examples. They come from different application domains and have different properties, but nevertheless, they can all be addressed using similar techniques from machine learning. We also give some advice on how to proceed through the rest of the book and, at the end, provide references to good books on machine learning for the interested reader who wants to dig further into this topic.

1.1 Machine learning exemplified

Machine learning is a multifaceted subject. We gave a brief and high-level description of what it entails above, but this will become much more concrete as we proceed throughout this book and introduce specific methods and techniques for solving various machine learning problems. However, before digging into the details we will try to give an intuitive answer to the question “What is machine learning?”, by discussing a few application examples where it can be (and has been) used.

We start with an example related to medicine, more precisely cardiology.

Example 1.1: Automatically diagnosing heart abnormalities

The leading cause of death globally is conditions that affect the heart and blood vessels, collectively referred to as cardiovascular diseases. Heart problems often influence the electrical activity of the heart, which can be measured using electrodes attached to the body. The electrical signals are reported in an electrocardiogram (ECG). In Figure 1.1 we show examples of (parts of) the measured signals from three different hearts.
The measurements stem from a healthy heart (top), a heart suffering from atrial fibrillation (middle), and a heart suffering from right bundle branch block (bottom). Atrial fibrillation makes the heart beat without order, making it hard for the heart to pump blood in a normal way. Right bundle branch block corresponds to a delay or blockage in the electric pathways of the heart.

[Fig. 1.1]

By analyzing the ECG signal, a cardiologist gains valuable information about the condition of the heart, which can be used to diagnose the patient and plan the treatment.

To improve the diagnostic accuracy, as well as save time for the cardiologists, we can ask ourselves if this process can be automated to some extent. That is, can we construct a computer program which reads in the ECG signals, analyses the data, and returns a prediction regarding the normality or abnormality of the heart? Such models, capable of accurately interpreting an ECG exam in an automated fashion, will find applications globally, but the need is most acute in low- and middle-income countries. An important reason for this is that the population in these countries often does not have easy and direct access to highly skilled cardiologists capable of accurately carrying out ECG diagnosis. Furthermore, cardiovascular diseases in these countries are related to more than 75% of the deaths.

The key challenge in building such a computer program is that it is far from obvious which computations are needed for turning the raw ECG signal into a prediction about the heart condition. Even if an experienced cardiologist would try to explain to a software developer which patterns in the data to look for, translating the cardiologist's experience into a reliable computer program would be extremely challenging.

To tackle this difficulty, the machine learning approach is to instead learn the computer program by examples. Specifically, instead of asking the cardiologist to specify a set of rules for how to classify an ECG signal as normal or abnormal, we simply ask the cardiologist (or a group of cardiologists) to label a large number of recorded ECG signals with labels corresponding to the underlying heart condition.
This is a much easier (albeit possibly tedious) way for the cardiologists to communicate their experience and encode it in a way which is interpretable by a computer.

The task of the learning algorithm is then to automatically adapt the computer program so that its predictions agree with the cardiologists' labels on the labeled training data. The hope is that, if it succeeds on the training data (where we already know the answer), then it should be possible to use the predictions made by the program on previously unseen data (where we do not know the answer) as well.

This is the approach taken by Ribeiro et al. (2020) who developed a machine learning model for ECG prediction. In their study, the training data consists of more than 2 300 000 ECG records from almost 1 700 000 different patients in the state of Minas Gerais, Brazil. More specifically, each ECG corresponds to 12 time series (one from each of the twelve electrodes that were used in conducting the exam) of a duration between seven and ten seconds each, sampled at frequencies ranging from 300 Hz to 600 Hz. These ECGs can be used to provide a full evaluation of the electric activity of the heart, and it is indeed the most commonly used exam in evaluating the heart. Importantly, each ECG in the dataset also comes with an output sorting it into different classes (no abnormalities, atrial fibrillation, right bundle branch block, etc.) according to the status of the heart. Based on this data, a machine learning model is trained to automatically classify a new ECG recording without requiring a human doctor to be involved. The model used is a deep neural network, more specifically a so-called residual network that is commonly used for images. The researchers adapted this to also work for the ECG signals of relevance for this study. In Chapter 6 we introduce deep learning models and their training algorithms.

Evaluating how a model like this will perform in practice is not straightforward.
The approach taken in this study was to ask three different cardiologists with experience in electrocardiography to examine and classify 827 ECG recordings from distinct patients. This dataset was then evaluated by the algorithm, two 4th year cardiology residents, two 3rd year emergency residents, and two 5th year medical students. The average performance was then compared. The result was that the algorithm achieved better or the same result when compared to the human performance on classifying six types of abnormalities.

Before we move on, let us pause and reflect on the example introduced above. In fact, many concepts that are central to machine learning can be recognized in this example.

As we mentioned above, the first cornerstone of machine learning is the data. Taking a closer look at what the data actually is, we note that it comes in different forms. First, we have the training data which is used to learn the model. Each training data point consists of both the ECG signal, which we refer to as the input, and its label corresponding to the type of heart condition seen in this signal, which we refer to as the output. To train the model we need access to both the inputs and the outputs, where the latter had to be manually assigned by domain experts (or possibly some auxiliary examination). Training a model from labeled data points is therefore referred to as supervised learning. We think of the learning as being supervised by the domain expert, and the learning objective is to obtain a computer program that can mimic the labeling done by the expert. Second, we have the (unlabeled) ECG signals that will be fed to the program when it is used “in production”. It is important to remember that the ultimate goal of the model is to obtain accurate predictions in this second phase. We say that the predictions made by the model must generalize beyond the training data. How to learn models that are capable of generalizing, and how to evaluate to what extent they do, is a central theoretical question studied throughout this book (see in particular Chapter 4).

We illustrate the training of the ECG prediction model in Figure 1.2.
The general structure of the training procedure is however the same (or at least very similar) for all supervised machine learning problems.

[Figure 1.2 diagram. Left: training data with labels (e.g. healthy, atrial fibrillation, RBBB) is passed to the learning algorithm, which updates the model based on the model's predictions. Right: unseen data is passed to the trained model, which outputs a prediction.]

Figure 1.2: Illustrating the supervised machine learning process with training to the left and then the use of the trained model to the right. Left: Values for the unknown parameters of the model are set by the learning algorithm such that the model best describes the available training data. Right: The learned model is used on new, previously unseen data, where we hope to obtain a correct classification. It is thus essential that the model is able to generalize to new data that is not present in the training data.

Another key concept that we encountered in the ECG example is the notion of a classification problem. Classification is a supervised machine learning task which amounts to predicting a certain class, or label, for each data point. Specifically, for classification problems there are only a finite number of possible output values. In the ECG example, the classes correspond to the type of heart condition. For instance, the classes could be 'normal' or 'abnormal', in which case we refer to it as a binary classification problem (only two possible classes). More generally, we could design a model for classifying each signal as either 'normal', or assign it to one of a predetermined set of abnormalities. We then face a (more ambitious) multi-class classification problem.

Classification is however not the only application of supervised machine learning that we will encounter. Specifically, we will also study another type of problem, referred to as regression problems. Regression differs from classification in that the output (that is, the quantity that we want the model to predict) is a numerical value. We illustrate with an example from materials science.
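
Before turning to that example, the difference between the two problem types, and the train-then-predict structure of Figure 1.2, can be made concrete with a minimal sketch. It is not taken from the book: the toy data, the function names, and the two deliberately simple learning algorithms (a nearest-class-mean classifier and a least-squares line fit) are assumptions chosen only for illustration. In both cases a learning algorithm sets the parameters of a model from labeled training data, and the trained model is then applied to a previously unseen input. The only difference is whether the output is one of a finite set of classes or a numerical value.

```python
from statistics import mean


def fit_classifier(inputs, labels):
    """Learning algorithm (classification): store the mean input of each class."""
    grouped = {}
    for x, label in zip(inputs, labels):
        grouped.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in grouped.items()}


def predict_class(model, x):
    """Predict the class whose stored mean is closest to the input x."""
    return min(model, key=lambda label: abs(x - model[label]))


def fit_regressor(inputs, outputs):
    """Learning algorithm (regression): least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(inputs), mean(outputs)
    sxx = sum((x - x_bar) ** 2 for x in inputs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(inputs, outputs))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_value(model, x):
    """Predict a numerical output from the fitted slope and intercept."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: one numerical input per data point.
train_inputs = [0.8, 1.0, 1.2, 2.8, 3.0, 3.2]
# Classification outputs: a finite set of classes.
train_labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
# Regression outputs: numerical values (here simply twice the input).
train_values = [1.6, 2.0, 2.4, 5.6, 6.0, 6.4]

# Training phase (left part of Figure 1.2).
classifier = fit_classifier(train_inputs, train_labels)
regressor = fit_regressor(train_inputs, train_values)

# Prediction phase on an input that is not in the training data (right part).
unseen_input = 1.4
print("predicted class:", predict_class(classifier, unseen_input))
print("predicted value:", predict_value(regressor, unseen_input))
```

Note that both models are evaluated on an input that does not appear in the training data, which is exactly the situation in which generalization matters.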

Example 1.2: Formation energy of crystals

Much of our technological development is driven by the discovery of new materials with unique properties. Indeed, technologies such as touch screens and batteries for electric vehicles have emerged due to advances in materials science. Traditionally, materials discovery was largely done through experiments, but this is both time-consuming and costly, which limited the number of new materials that could be found. Over the past few decades, computational methods have therefore played an increasingly important role. The basic idea behind computational materials science is to screen a very large number of hypothetical materials, predict various properties of interest by computational methods, and then attempt to experimentally synthesize the most promising candidates.

Crystalline solids (or, simply, crystals) are a central type of inorganic materials. In a crystal, the atoms are arranged in a highly ordered microscopic structure. Hence, to understand the properties of such a material, it is not enough to know the proportion of each element in the material, but we also need to know how these elements (or atoms) are arranged into a crystal. A basic property of interest when considering a hypothetical material is therefore the formation energy of the crystal. The formation energy can be thought of as the energy that nature needs to spend to form the crystal from the individual elements. Nature strives for finding a minimum energy configuration. Hence, if a certain crystal structure is predicted to have a formation energy that is significantly larger than alternative crystals composed of the same elements, then it is unlikely that it can be synthesized in a stable way in practice.

A classical method (going back to the 1960s) that can be used for computing the formation energy is so-called density functional theory (DFT). The DFT method, which is based on quantum mechanical modeling, paved the way for the first breakthrough in computational materials science, enabling high-throughput screening for materials discovery. That being said, the DFT method is computationally very heavy and even with modern supercomputers, only a small fraction of all potentially interesting materials have been analyzed.

To handle this limitation, much recent interest has been paid to using machine learning for materials discovery, with the potential of resulting in a second computational revolution. By training a machine learning model to, for instance, predict the formation energy, but in a fraction of the computational time required by DFT, a much larger range of candidate materials can be investigated.

As a concrete example, Faber et al. (2016) used a machine learning method referred to as kernel ridge regression (see Chapter 8) to predict the formation energy of around 2 million so-called elpasolite crystals. The machine learning model is a computer program which takes a candidate crystal as input (essentially, a description of the positions and elemental types of the atoms in the crystal), and is asked to return a prediction of the formation energy. To train the model, 10 000 crystals were randomly selected and their formation energies were computed using DFT. The model was then trained to predict formation energies to agree as closely as possible with the DFT output on the training set. Once trained, the model was used to predict the energy on the remaining ∼99.5% of the potential elpasolites. Among these, 128 new crystal structures were found to have a favorable energy, thereby being potentially stable in nature.

Comparing the two examples discussed above, we can make a few interesting observations. As already pointed out, a difference is that the ECG model is asked to predict a certain class (say, normal or abnormal), whereas the materials discovery model is asked to predict a numerical value (the formation energy of a crystal). These are the two main types of prediction problems that we will study in this book, referred to as classification and regression, respectively. While conceptually similar, we often use slight variations of the underpinning mathematical models, depending on the problem type. It is therefore instructive to treat
