
Scientific Discovery using Genetic Programming PDF

160 Pages·2002·0.933 MB·English


Scientific Discovery using Genetic Programming
Maarten Keijzer
LYNGBY 2001
IMM-PHD-xx
IMM

Abstract

Genetic Programming is capable of automatically inducing symbolic computer programs on the basis of a set of examples or their performance in a simulation. Mathematical expressions are a well-defined subset of symbolic computer programs and are also suitable for optimization using the genetic programming paradigm. The induction of mathematical expressions based on data is called symbolic regression. In this work, genetic programming is extended to not just fit the data, i.e., get the numbers right, but also to get the dimensions right. For this, units of measurement are used. The main contribution in this work can be summarized as:

The symbolic expressions produced by genetic programming can be made suitable for analysis and interpretation by using units of measurement to guide or restrict the search.

To achieve this, the following has been accomplished:

• A standard genetic programming system is modified to be able to induce expressions that more-or-less abide by type constraints. This system is used to implement a preferential bias towards dimensionally correct solutions.
• A novel genetic programming system is introduced that is able to induce expressions in languages that need context-sensitive constraints. It is demonstrated that this system can be used to implement a declarative bias towards
  1. the exclusion of certain syntactical constructs;
  2. the induction of expressions that use units of measurement;
  3. the induction of expressions that use matrix algebra;
  4. the induction of expressions that are numerically stable and correct.
• A case study using four real-world problems in the induction of dimensionally correct empirical equations on data using the two different methods is presented to illustrate the use and limitations of these methods in a framework of scientific discovery.
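To make the idea of "getting the dimensions right" concrete, the following is a minimal sketch of a dimensional check on a symbolic expression. It is not the system described in the thesis. The encoding of units as exponent tuples over (length, mass, time), the nested-tuple expression format, and the names infer_unit, DIMENSIONLESS and units are all assumptions made purely for illustration.

```python
# Units are exponent tuples over (length, mass, time); e.g. velocity = (1, 0, -1).
# This encoding and every name below are illustrative assumptions.
DIMENSIONLESS = (0, 0, 0)


def infer_unit(expr, variable_units):
    """Return (unit, violations) for a nested-tuple expression.

    expr is a variable name, a number (treated as dimensionless),
    or a tuple (op, left, right) with op in '+', '-', '*', '/'.
    """
    if isinstance(expr, str):
        return variable_units[expr], 0
    if isinstance(expr, (int, float)):
        return DIMENSIONLESS, 0
    op, left, right = expr
    left_unit, left_bad = infer_unit(left, variable_units)
    right_unit, right_bad = infer_unit(right, variable_units)
    violations = left_bad + right_bad
    if op in ('+', '-'):
        # Addition and subtraction require identical units on both sides.
        if left_unit != right_unit:
            violations += 1
        return left_unit, violations
    if op == '*':
        # Multiplication adds unit exponents.
        return tuple(a + b for a, b in zip(left_unit, right_unit)), violations
    if op == '/':
        # Division subtracts unit exponents.
        return tuple(a - b for a, b in zip(left_unit, right_unit)), violations
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical variables: a distance d, a time t, an acceleration g.
units = {"d": (1, 0, 0), "t": (0, 0, 1), "g": (1, 0, -2)}

# d / t and g * t are both velocities, so their sum is consistent.
consistent = ('+', ('/', 'd', 't'), ('*', 'g', 't'))
# d + t adds a length to a time, which counts as one violation.
inconsistent = ('+', 'd', 't')

consistent_result = infer_unit(consistent, units)
inconsistent_result = infer_unit(inconsistent, units)
```

In a sketch like this, a search procedure could either add the violation count to an expression's error as a penalty, which loosely corresponds to guiding the search, or discard any expression with a non-zero count, which loosely corresponds to restricting it.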
Preface

This thesis has been submitted in partial fulfilment for the degree of Doctor of Philosophy. The work documented in this thesis has been carried out both at DHI - Water & Environment and the Department for Mathematical Modelling, Section for Digital Signal Processing at the Technical University of Denmark. The work was supervised by Professor Lars Kai Hansen of the DTU and Dr. Vladan Babovic of DHI - Water & Environment. During the Ph.D. study a number of conference papers and journal papers have been written.

Accepted Journal Papers and Book Chapters

• Maarten Keijzer and Vladan Babovic. Declarative and preferential bias in gp-based scientific discovery. Genetic Programming and Evolvable Machines, to appear 2002.
• Vladan Babovic and Maarten Keijzer. On the introduction of declarative bias in knowledge discovery computer systems. In P. Goodwin, editor, New paradigms in river and estuarine management. Kluwer, 2001.
• Vladan Babovic and Maarten Keijzer. Genetic programming as a model induction engine. Journal of Hydroinformatics, 2(1):35-61, 2000.
• Vladan Babovic and Maarten Keijzer. Forecasting of river discharges in the presence of chaos and noise. In J. Marsalek, editor, Coping with Floods: Lessons Learned from Recent Experiences. Kluwer, 1999.
• Vladan Babovic, Jean Philip Drecourt, Maarten Keijzer and Peter Friis Hansen. Modelling of water supply assets: a data mining approach. Urban Water, to appear 2002.

Conference Papers

• Maarten Keijzer, Vladan Babovic, Conor Ryan, Michael O'Neill, and Mike Cattolico. Adaptive logic programming. In Lee Spector et al., eds, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), 2001.
• Maarten Keijzer, Conor Ryan, Michael O'Neill, Mike Cattolico, and Vladan Babovic. Ripple crossover in genetic programming. In Julian Miller et al., Genetic Programming, Proceedings of EuroGP, 2001.
• Maarten Keijzer and Vladan Babovic. Genetic programming within a framework of computer-aided discovery of scientific knowledge. In Darell Whitley et al., Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000), 2000.
• Maarten Keijzer and Vladan Babovic. Genetic programming, ensemble methods and the bias/variance tradeoff - introductory investigations. In Riccardo Poli et al., Genetic Programming, Proceedings of EuroGP'2000, 2000.
• Maarten Keijzer and Vladan Babovic. Dimensionally aware genetic programming. In Wolfgang Banzhaf et al., Proceedings of the Genetic and Evolutionary Computation Conference, volume 2, 1999.
• Maarten Keijzer, J.J. Merelo, G. Romero, M. Schoenauer. Evolving Objects: a general purpose evolutionary computation library. In Pierre Collet, EA-01, Evolution Artificielle, 5th International Conference on Evolutionary Algorithms, 2001.
• Maarten Keijzer and Vladan Babovic. Error correction of a deterministic model in Venice lagoon by local linear models. In Modelli complessi e metodi computazionali intensivi per la stima e la previsione, 1999.
• Michael O'Neill, Conor Ryan, Maarten Keijzer and Mike Cattolico. Crossover in Grammatical Evolution: The Search Continues. In Julian Miller et al., Genetic Programming, Proceedings of EuroGP, 2001.
• Kim Jørgensen, Berry Elfering, Maarten Keijzer, and Vladan Babovic. Analysis of long term morphological changes: A data mining approach. In Proceedings of the International Conference on Coastal Engineering, Australia, 2000.
• Vladan Babovic, Maarten Keijzer, and Magnus Stefansson. Optimal embedding using evolutionary algorithms. In Proceedings of the Fourth International Conference on Hydroinformatics, Iowa City, USA, 2000.
• Vladan Babovic, Maarten Keijzer, and Marek Bundzel. From global to local modelling: A case study in error correction of deterministic models. In Proceedings of the Fourth International Conference on Hydroinformatics, Iowa City, USA, 2000.
• Vladan Babovic, Maarten Keijzer, David R. Aquilera, and Joe Harrington. An evolutionary approach to knowledge induction: Genetic programming in hydraulic engineering. In Proceedings of the World Water & Environmental Resources Congress, 2001.
• Vladan Babovic and Maarten Keijzer. A Gaussian process model applied to the prediction of water levels in Venice lagoon. In Proceedings of the XXIX Congress of the International Association for Hydraulic Research, 2001.
• Vladan Babovic and Maarten Keijzer. An evolutionary algorithm approach to the induction of differential equations. In Proceedings of the Fourth International Conference on Hydroinformatics, 2000.
• Vladan Babovic and Maarten Keijzer. Computer supported knowledge discovery - A case study in flow resistance induced by vegetation. In Proceedings of the XXVIII Congress of the International Association for Hydraulic Research, 1999.
• Vladan Babovic and Maarten Keijzer. Data to knowledge - the new scientific paradigm. In D. Savic and G. Walters, editors, Water Industry Systems, 1999.

Submitted Journal Papers

• Maarten Keijzer and Vladan Babovic. Knowledge fusion in data driven modeling. Machine Learning.
• Vladan Babovic and Maarten Keijzer. Rainfall runoff modelling based on genetic programming. Nordic Hydrology.

Acknowledgements

First and foremost I would like to thank Vladan, not only for convincing me to try to obtain a Ph.D. in Denmark by joining him in his Talent project, but also for his insistent enthusiasm and his many valuable contributions to this work. Although as a Ph.D. thesis, this work is necessarily authored by me alone, most of the views that are expressed in this work have been jointly developed.

Lars Kai and his group at the DTU have been very helpful.
Although right from the start I've taken an almost diametrically opposite path from the group's research by concentrating on the use of symbolic expressions rather than 'sound' numerical procedure, these 'numerics' did have a profound influence on the work. I have learned a lot from the group.

Conor Ryan, Michael O'Neill and Mike Cattolico deserve mentioning for the many intense and considerably less intense discussions we held during the various conferences and workshops in the past three years. One of the tangible results of these discussions is the ALP system, which is based on Michael and Conor's 'Grammatical Evolution' system. I hope we can continue to cooperate in the future.

The Ph.D. and this thesis were funded by the Danish Research Council under Talent Project 9800463 entitled "Data to Knowledge - D2K". This funding is greatly appreciated.

Deventer, May 1, 2002
Maarten Keijzer

Contents

Abstract
Resume (Abstract in Danish)
Preface
1 Introduction
2 Genetic Programming
  2.1 Evolution at work: Genetic & Evolutionary Computation
  2.2 Standard Genetic Programming
    2.2.1 The Primitives
    2.2.2 Initialization
    2.2.3 Variation
    2.2.4 Measuring Performance and Wrapping
    2.2.5 Auxiliary parameters and variables
  2.3 Multi-Objective Optimization
  2.4 Implementation Issues
  2.5 Summary
3 Symbolic Regression
  3.1 The Concentration of Suspended Sediment
  3.2 Symbolic Regression on the Sediment Transport Problem
  3.3 Summary
4 Induction of Empirical Equations
  4.1 Units of Measurement as a Type System
  4.2 Language, Bias and Search
  4.3 Typing in Genetic Programming
  4.4 Expressiveness of Type Systems
  4.5 Typed Variation Operators
    4.5.1 Broken ergodicity
    4.5.2 Loss of diversity
  4.6 Summary
5 Dimensionally Aware Genetic Programming
  5.1 Coerced Genetic Programming
    5.1.1 Calculating the Coercion Error for the uom system
    5.1.2 Wrapping
  5.2 Example: Sediment Transport
  5.3 Summary
6 An Adaptive Logic Programming System
  6.1 Logic Programming
  6.2 An Adaptive Logic Programming System
    6.2.1 Representation and the Mapping Process
    6.2.2 Backtracking
    6.2.3 Initialization
    6.2.4 Performance Evaluation
    6.2.5 Special Predicates
  6.3 Summary
  6.4 Related Work
  6.5 ALP, ILP and CLP
  6.6 Summary
7 Applications for the ALP System
  7.1 Applications
    7.1.1 A Sensible Ant on the Santa Fe Trail
    7.1.2 Interval Arithmetic
    7.1.3 Units of Measurement
    7.1.4 Matrix Algebra
  7.2 Discussion
  7.3 The Art of Genetic Programming
8 Experiments in Scientific Discovery
  8.1 Problem 1: Settling Velocity of Sand Particles
  8.2 Problem 2: Settling Velocity of Faecal Pellets
  8.3 Problem 3: Concentration of sediment near bed
  8.4 Problem 4: Roughness induced by flexible vegetation
  8.5 Experimental Setup
  8.6 Quantitative Results
    8.6.1 Bias/Variance Analysis
    8.6.2 Settling velocity of sand particles
    8.6.3 Settling velocity of faecal pellets
    8.6.4 Concentration of suspended sediment near bed
    8.6.5 Additional roughness induced by vegetation
    8.6.6 Summary of the quantitative analysis
  8.7 Qualitative Results
    8.7.1 Interpretability of unconstrained expressions
    8.7.2 Settling velocity of sand particles
    8.7.3 Settling velocity of faecal pellets
    8.7.4 Concentration of suspended sediment near bed
    8.7.5 Additional roughness induced by vegetation
    8.7.6 Summary and scope of GP-based scientific discovery
  8.8 Discussion
9 Conclusion
List of Tables
List of Figures
Bibliography

Chapter 1
Introduction

Physical concepts are free creations of the human mind, and are not, however it may seem, uniquely determined by the external world.
- Albert Einstein and Leopold Infeld, 1938

The formation of modern science occurred approximately in the period between the late 15th and the late 18th century. The new foundations were based on the utilization of a physical experiment and the application of a mathematical apparatus in order to describe these experiments. The works of Brahe, Kepler, Newton, Leibniz, Euler and Lagrange personify this approach. Prior to these developments, scientific work primarily consisted of collecting the observables, or recording the 'readings of the book of nature itself'.

This scientific approach is traditionally characterized by two stages: a first one in which a set of observations of the physical system is collected, and a second one in which an inductive assertion about the behaviour of the system, a hypothesis, is generated. Observations present specific knowledge, whereas hypotheses represent a generalization of these data which implies or describes observations. One may argue that through this process of hypothesis generation, one fundamentally economizes thought, as more compact ways of describing observations are proposed.

Although this view of the dispassionate scientist observing facts and producing equations is popular, it is not all there is to say about the process of scientific discovery. In the years that led to Kepler's famous laws of planetary motion, he introduced and abandoned various informal models of the solar system. These models initially took the form of a collection of embedded spheres (Holland et al., 1986, pp. 323-325).
It was only when he abandoned the idea of planets moving in circular orbits around the sun and replaced it with ellipses that he was able to postulate his laws. Kepler is not unique in this; the process of the formulation of scientific law or theory usually takes place in the context of a mental model of the phenomenon under study: using the right concept to explain the equation provides additional justification for these equations. Finding a proper conceptualization of the problem is as much a feat of scientific discovery as the formulation of a mathematical description or explanation of a phenomenon.

Today, in the beginning of the 21st century, we are experiencing yet another change in the scientific process as just outlined. This latest scientific approach is one
