Reinforcement Learning: Theory and Applications

Edited by Cornelius Weber, Mark Elshaw and Norbert Michael Mayer

I-Tech Education and Publishing, Vienna, Austria

Abstracting and non-profit use of the material is permitted with credit to the source. Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility or liability for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained inside. After this work has been published by Advanced Robotic Systems International, authors have the right to republish it, in whole or in part, in any publication of which they are an author or editor, and to make other personal use of the work.

© 2008 I-Tech Education and Publishing
www.i-techonline.com
Additional copies can be obtained from: [email protected]

First published January 2008
Printed in Croatia

A catalog record for this book is available from the Austrian Library.

Reinforcement Learning, Theory and Applications, Edited by Cornelius Weber, Mark Elshaw and Norbert Michael Mayer
p. cm.
ISBN 978-3-902613-14-1
1. Reinforcement Learning. 2. Theory. 3. Applications.

Preface

Brains rule the world, and brain-like computation is increasingly used in computers and electronic devices. Brain-like computation is about processing and interpreting data or directly putting forward and performing actions. Learning is a very important aspect. This book is on reinforcement learning, which involves performing actions to achieve a goal. Two other learning paradigms exist. Supervised learning has initially been successful in prediction and classification tasks, but is not brain-like.
Unsupervised learning is about understanding the world by passively mapping or clustering given data according to some ordering principles, and is associated with the cortex in the brain. In reinforcement learning, an agent learns by trial and error to perform an action to receive a reward, thereby yielding a powerful method to develop goal-directed action strategies. It is predominantly associated with the basal ganglia in the brain.

The first 11 chapters of this book, Theory, describe and extend the scope of reinforcement learning. The remaining 11 chapters, Applications, show that there is already wide usage in numerous fields. Reinforcement learning can tackle control tasks that are too complex for traditional, hand-designed, non-learning controllers. As learning computers can deal with technical complexities, the task of human operators remains to specify goals on increasingly higher levels. This book shows that reinforcement learning is a very dynamic area in terms of theory and applications, and it shall stimulate and encourage new research in this field. We would like to thank all contributors to this book for their research and effort.

Summary of Theory:

Chapters 1 and 2 create a link to supervised and unsupervised learning, respectively, by regarding reinforcement learning as a prediction problem, and chapter 3 looks at fuzzy control with a reinforcement-based genetic algorithm. Reinforcement algorithms are modified in chapter 4 for future parallel and quantum computing, and in chapter 5 for a more general class of state-action spaces, described by grammars. Biological views follow: chapter 6 shows how reinforcement learning occurs on the single-neuron level by considering the interaction between a spatio-temporal learning rule and Hebbian learning, and the global brain view of chapter 7 depicts unsupervised learning as a means of data pre-processing and arrangement for reinforcement algorithms.
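The trial-and-error learning described above can be illustrated with a minimal tabular Q-learning sketch, one of the standard algorithms the Theory chapters build on. The toy chain environment, the function name `q_learning` and all parameter values below are invented for illustration only and are not taken from any chapter of this book:

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical toy chain: action 1 moves
    right, action 0 moves left; entering the last state pays reward 1."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # epsilon-greedy: occasionally explore at random,
            # otherwise exploit the current value estimates
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in range(n_actions) if Q[s][i] == best])
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # core update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
```

After training, the greedy policy derived from Q moves right in every non-terminal state, illustrating how a reward received only at the goal propagates backwards through the value estimates.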
A table presents a ready-to-implement description of standard reinforcement learning algorithms. The following chapters consider multi-agent systems, where a single agent has only a partial view of the entire system. Multiple agents can work cooperatively on a common goal, as considered in chapter 8, or rewards can be individual but interdependent, such as in game play, as considered in chapters 9, 10 and 11.

Summary of Applications:

Chapter 12 continues with game applications, where a RoboCup middle-size league robot learns a strategic soccer move. A dialogue manager for man-machine dialogues in chapter 13 interacts with humans through communication and database queries, dependent on interaction strategies that govern the Markov decision processes. Chapters 14, 15, 16 and 17 tackle control problems that would typically be addressed with classical methods of control such as PID controllers and hand-set rules. However, traditional methods fail if the systems are too complex or time-varying, if knowledge of the state is imprecise, or if there are multiple objectives. These chapters report examples of computer applications that are tackled only with reinforcement learning, such as water allocation improvement, building environmental control, chemical processing and industrial process control. The reinforcement-controlled systems may continue learning during operation.

The next three chapters involve path optimization. In chapter 18, internet routers explore different links to find more optimal routes to a destination address. Chapter 19 deals with optimizing a travel sequence with respect to both time and distance. Chapter 20 proposes an atypical application of path optimization: a path from a given pattern to a target pattern provides a distance measure.
An unclassified medical image can thereby be classified depending on whether a path from it is shorter to an image of healthy or of unhealthy tissue; specifically, lung nodule classification is considered, using 3D geometric measures extracted from Computerized Tomography (CT) images of the lung lesions. Chapter 21 presents a physicians' decision support system for diagnosis and treatment, involving a knowledge-base server. In chapter 22, a reinforcement learning sub-module improves the efficiency of the exchange of messages in a decision support system for air traffic management.

January 2008

Cornelius Weber
Mark Elshaw
Norbert Michael Mayer

Contents

Preface ..... V

1. Neural Forecasting Systems ..... 1
Takashi Kuremoto, Masanao Obayashi and Kunikazu Kobayashi

2. Reinforcement Learning in System Identification ..... 21
Mariela Cerrada and Jose Aguilar

3. Reinforcement Evolutionary Learning for Neuro-Fuzzy Controller Design ..... 33
Cheng-Jian Lin

4. Superposition-Inspired Reinforcement Learning and Quantum Reinforcement Learning ..... 59
Chun-Lin Chen and Dao-Yi Dong

5. An Extension of Finite-state Markov Decision Process and an Application of Grammatical Inference ..... 85
Takeshi Shibata and Ryo Yoshinaka

6. Interaction between the Spatio-Temporal Learning Rule (non-Hebbian) and Hebbian Learning in Single Cells: A Cellular Mechanism of Reinforcement Learning ..... 105
Minoru Tsukada

7. Reinforcement Learning Embedded in Brains and Robots ..... 119
Cornelius Weber, Mark Elshaw, Stefan Wermter, Jochen Triesch and Christopher Willmot

8. Decentralized Reinforcement Learning for the Online Optimization of Distributed Systems ..... 143
Jim Dowling and Seif Haridi

9. Multi-Automata Learning ..... 167
Verbeeck Katja, Nowe Ann, Vrancx Peter and Peeters Maarten

10. Abstraction for Genetics-based Reinforcement Learning ..... 187
Will Browne, Dan Scott and Charalambos Ioannides

11. Dynamics of the Bush-Mosteller Learning Algorithm in 2x2 Games ..... 199
Luis R. Izquierdo and Segismundo S. Izquierdo

12. Modular Learning Systems for Behavior Acquisition in Multi-Agent Environment ..... 225
Yasutake Takahashi and Minoru Asada

13. Optimising Spoken Dialogue Strategies within the Reinforcement Learning Paradigm ..... 239
Olivier Pietquin

14. Water Allocation Improvement in River Basin Using Adaptive Neural Fuzzy Reinforcement Learning Approach ..... 257
Abolpour B., Javan M. and Karamouz M.

15. Reinforcement Learning for Building Environmental Control ..... 283
Konstantinos Dalamagkidis and Dionysia Kolokotsa

16. Model-Free Learning Control of Chemical Processes ..... 295
S. Syafiie, F. Tadeo and E. Martinez

17. Reinforcement Learning-Based Supervisory Control Strategy for a Rotary Kiln Process ..... 311
Xiaojie Zhou, Heng Yue and Tianyou Chai

18. Inductive Approaches based on Trial/Error Paradigm for Communications Network ..... 325
Abdelhamid Mellouk

19. The Allocation of Time and Location Information to Activity-Travel Sequence Data by means of Reinforcement Learning ..... 359
Wets Janssens

20. Application on Reinforcement Learning for Diagnosis based on Medical Image ..... 379
Stelmo Magalhaes Barros Netto, Vanessa Rodrigues Coelho Leite, Aristofanes Correa Silva, Anselmo Cardoso de Paiva and Areolino de Almeida Neto

21. RL based Decision Support System for u-Healthcare Environment ..... 399
Devinder Thapa, In-Sung Jung, and Gi-Nam Wang

22. Reinforcement Learning to Support Meta-Level Control in Air Traffic Management ..... 409
Daniela P. Alves, Li Weigang and Bueno B. Souza