Ton J. Cleophas · Aeilko H. Zwinderman

Machine Learning in Medicine
Part Three

Ton J. Cleophas
Department Medicine
Albert Schweitzer Hospital
Sliedrecht, The Netherlands

Aeilko H. Zwinderman
Department Biostatistics and Epidemiology
Academic Medical Center
Amsterdam, The Netherlands

Additional material to this book can be downloaded from extras.springer.com

ISBN 978-94-007-7868-9
ISBN 978-94-007-7869-6 (eBook)
DOI 10.1007/978-94-007-7869-6
Springer Dordrecht Heidelberg New York London
Library of Congress Control Number: 2013955741
© Springer Science+Business Media Dordrecht 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Machine Learning in Medicine
Part Three

by

TON J. CLEOPHAS, MD, PhD, Professor,
Past-President American College of Angiology,
Co-Chair Module Statistics Applied to Clinical Trials,
European Interuniversity College of Pharmaceutical Medicine, Lyon, France,
Department Medicine, Albert Schweitzer Hospital, Dordrecht, Netherlands

AEILKO H. ZWINDERMAN, MathD, PhD, Professor,
President International Society of Biostatistics,
Co-Chair Module Statistics Applied to Clinical Trials,
European Interuniversity College of Pharmaceutical Medicine, Lyon, France,
Department Biostatistics and Epidemiology, Academic Medical Center, Amsterdam, Netherlands

With the help from
HENNY I. CLEOPHAS-ALLERS, BChem

Preface

Machine learning is a novel discipline for data analysis that started some 40 years ago with the advent of the computer. It is already widely implemented in socio-/econometry, but not so in medicine, probably due to the traditional belief of clinicians in clinical trials where the effects of multiple variables tend to even out by the randomization process and are not further taken into account. Machine learning is different from traditional data analysis, because, unlike means and standard deviations, it uses proximities between data, data patterns, pattern recognition, data thresholding and data trafficking. It is more flexible than traditional statistical methods, and it can process big data and hundreds of variables, and data, for which analysis without the help of a computer would be impossible.

This does not mean that the mathematics of machine learning is simple.
Often integrals, derivatives, complex matrices and other calculus methods, as developed in the seventeenth century by great western mathematicians like Newton and Leibniz, are applied. It is gratifying to observe that, three centuries later, mathematicians from the same schools are responsible for the founding of important machine learning methodologies.

The first two parts of this volume reviewed basic machine learning methods like cluster analysis, neural networks, factor analysis, Bayesian networks and support vector machines. The current Part Three of this volume assesses more recent and more advanced methods such as the ones described underneath:

Newton's methods, as the most advanced methods for fitting experimental data to a mathematical prediction function.
Stochastic methods, including advanced Markov modeling, adjusted for irreversible complications and death.
Complex sampling, for obtaining unbiased samples from big demographic data.
Optimal binning, for advanced classification learning of health and health-education parameters.
Evolutionary operations, for advanced search of optimal solutions to many scientific questions, including evolutionary algorithms, neuroevolution, genetic programming, and genetic algorithms.

Each chapter of the current book was written in a way much similar to that of the first two volumes, and

1. Will describe a novel method that has already been successfully applied in the authors' own research
2. Will provide the analysis of medical data examples
3. Will provide step-by-step analyses for the benefit of the readers
4. Will provide the commands to be given to the software programs applied (mostly SPSS)
5. Can be studied without the need to consult other chapters
6. Will be written in an explanatory way for a readership of mainly non-mathematicians
7. Will be provided together with a data file on the internet through extras.springer.com for the benefit of investigators who wish to perform their own analyses

We should add that the authors are well-qualified in their field.
Professor Zwinderman is president of the International Society of Biostatistics (2012–2015), and Prof. Cleophas is past-president of the American College of Angiology (2000–2002). From their expertise they should be able to make adequate selections of modern methods for clinical data analysis for the benefit of physicians, students, and investigators. The authors have been working and publishing together for 15 years, and their research can be characterized as a continued effort to demonstrate that clinical data analysis is not mathematics but rather a discipline at the interface of biology and mathematics.

The authors as professors and teachers in statistics for 20 years are convinced that the current part three of this three-volume book contains information unpublished so far and vital to the analysis of the complex medical data as commonly witnessed today.

Machine learning is a novel discipline concerned with the analysis of large data and multiple variables. It involves computationally intensive methods, and is currently mainly the domain of computer scientists, and is already commonly used in social sciences, marketing research, operational research and applied sciences.

Machine learning is virtually unused in clinical research. This is probably due to the traditional belief of clinicians in clinical trials where multiple variables are equally balanced by the randomization process and are not further taken into account. In contrast, modern computer data files often involve hundreds of variables like genes and other laboratory values, and computationally intensive methods are required for their analysis.

Lyon, France        Ton J. Cleophas
02-09-2013          Aeilko H. Zwinderman

Contents

1 Introduction to Machine Learning in Medicine Part Three
    1 Summary
        1.1 Objective
        1.2 Results and Conclusions
    2 Introduction
    3 Contents of Part One of This 3 Volume Book
    4 Contents of Part Two of This 3 Volume Book
    5 Contents of Part Three of This 3 Volume Book
    6 Conclusions
    References

2 Evolutionary Operations
    1 Summary
        1.1 Background
        1.2 Objective
        1.3 Methods
        1.4 Results
        1.5 Conclusions
    2 Introduction
    3 Example Data
    4 Methodological Background of Genetic Algorithms
    5 Results of a Genetic Algorithm to Variable Selection in Our Example Data
    6 Discussion
    7 Conclusions
    References

3 Multiple Treatments
    1 Summary
        1.1 Background
        1.2 Objective
        1.3 Methods