Reinforcement Learning
Theory and Applications
Edited by
Cornelius Weber
Mark Elshaw
Norbert Michael Mayer
I-TECH Education and Publishing
Published by the I-Tech Education and Publishing, Vienna, Austria
Abstracting and non-profit use of the material is permitted with credit to the source. Statements and
opinions expressed in the chapters are those of the individual contributors and not necessarily those of
the editors or publisher. No responsibility is accepted for the accuracy of information contained in the
published articles. The publisher assumes no responsibility or liability for any damage or injury to persons
or property arising out of the use of any materials, instructions, methods or ideas contained inside. After
this work has been published by Advanced Robotic Systems International, authors have the right to
republish it, in whole or in part, in any publication of which they are an author or editor, and to make
other personal use of the work.
© 2008 I-Tech Education and Publishing
www.i-techonline.com
Additional copies can be obtained from:
publication@i-techonline.com
First published January 2008
Printed in Croatia
A catalog record for this book is available from the Austrian Library.
Reinforcement Learning, Theory and Applications, Edited by Cornelius Weber, Mark Elshaw and Norbert Michael Mayer
p. cm.
ISBN 978-3-902613-14-1
1. Reinforcement Learning. 2. Theory. 3. Applications.
Preface
Brains rule the world, and brain-like computation is increasingly used in computers and
electronic devices. Brain-like computation is about processing and interpreting data or
directly putting forward and performing actions. Learning is a very important aspect. This
book is about reinforcement learning, which involves performing actions to achieve a goal.
Two other learning paradigms exist. Supervised learning has initially been successful in
prediction and classification tasks, but is not brain-like. Unsupervised learning is about
understanding the world by passively mapping or clustering given data according to some
ordering principles, and is associated with the cortex in the brain. In reinforcement learning
an agent learns by trial and error to perform an action to receive a reward, thereby yielding
a powerful method to develop goal-directed action strategies. It is predominantly associated
with the basal ganglia in the brain.
The first 11 chapters of this book, Theory, describe and extend the scope of reinforcement
learning. The remaining 11 chapters, Applications, show that reinforcement learning is
already widely used in numerous fields. It can tackle control tasks that are too complex for
traditional, hand-designed, non-learning controllers. As learning computers can deal with
technical complexities, the task of human operators remains to specify goals at increasingly
higher levels.
This book shows that reinforcement learning is a very dynamic area in terms of theory and
applications, and we hope it will stimulate and encourage new research in this field. We
would like to thank all contributors to this book for their research and effort.
Summary of Theory:
Chapters 1 and 2 create a link to supervised and unsupervised learning, respectively, by
regarding reinforcement learning as a prediction problem, and chapter 3 looks at fuzzy
control with a reinforcement-based genetic algorithm. Reinforcement algorithms are
modified in chapter 4 for future parallel and quantum computing, and in chapter 5 for a
more general class of state-action spaces, described by grammars. Then follow biological
views: chapter 6 shows how reinforcement learning occurs at the level of a single neuron by
considering the interaction between a spatio-temporal learning rule and Hebbian learning,
and in the global brain view of chapter 7, unsupervised learning is depicted as a means of
data pre-processing and arrangement for reinforcement algorithms. A table presents a
ready-to-implement description of standard reinforcement learning algorithms. The
following chapters consider multi-agent systems, where a single agent has only a partial
view of the entire system. Multiple agents can work cooperatively on a common goal, as
considered in chapter 8, or rewards can be individual but interdependent, such as in game
play, as considered in chapters 9, 10 and 11.
Summary of Applications:
Chapter 12 continues with game applications, where a RoboCup middle-size league robot
learns a strategic soccer move. A dialogue manager for man-machine dialogues in chapter
13 interacts with humans through communication and database queries, dependent on
interaction strategies that govern the Markov decision processes. Chapters 14, 15, 16 and 17
tackle control problems that would typically be addressed with classical methods of control
such as PID controllers and hand-set rules. However, traditional methods fail if the systems
are too complex or time-varying, if knowledge of the state is imprecise, or if there are
multiple objectives. These chapters report examples of applications that are tackled only
with reinforcement learning, such as water allocation improvement, building environmental
control, chemical processing and industrial process control. The reinforcement-controlled
systems may continue learning during operation. The next three chapters involve path
optimization. In chapter 18, internet routers explore different links to find better routes to a
destination address. Chapter 19 deals with optimizing a travel sequence with respect to both
time and distance. Chapter 20 proposes an unusual application of path optimization: a path
from a given pattern to a target pattern provides a distance measure. An unclassified
medical image can thereby be classified depending on whether a path from it is shorter to
an image of healthy or of unhealthy tissue, specifically considering lung nodule
classification using 3D geometric measures extracted from Computerized Tomography (CT)
images of the lung lesions. Chapter 21 presents a physicians' decision support system for
diagnosis and treatment, involving a knowledge-base server. In chapter 22, a reinforcement
learning sub-module improves the efficiency of the exchange of messages in a decision
support system for air traffic management.
January 2008 Cornelius Weber
Mark Elshaw
Norbert Michael Mayer
Contents
Preface ..... V

1. Neural Forecasting Systems ..... 1
Takashi Kuremoto, Masanao Obayashi and Kunikazu Kobayashi

2. Reinforcement Learning in System Identification ..... 21
Mariela Cerrada and Jose Aguilar

3. Reinforcement Evolutionary Learning for Neuro-Fuzzy Controller Design ..... 33
Cheng-Jian Lin

4. Superposition-Inspired Reinforcement Learning and Quantum Reinforcement Learning ..... 59
Chun-Lin Chen and Dao-Yi Dong

5. An Extension of Finite-state Markov Decision Process and an Application of Grammatical Inference ..... 85
Takeshi Shibata and Ryo Yoshinaka

6. Interaction between the Spatio-Temporal Learning Rule (non-Hebbian) and Hebbian in Single Cells: A Cellular Mechanism of Reinforcement Learning ..... 105
Minoru Tsukada

7. Reinforcement Learning Embedded in Brains and Robots ..... 119
Cornelius Weber, Mark Elshaw, Stefan Wermter, Jochen Triesch and Christopher Willmot

8. Decentralized Reinforcement Learning for the Online Optimization of Distributed Systems ..... 143
Jim Dowling and Seif Haridi

9. Multi-Automata Learning ..... 167
Verbeeck Katja, Nowe Ann, Vrancx Peter and Peeters Maarten

10. Abstraction for Genetics-based Reinforcement Learning ..... 187
Will Browne, Dan Scott and Charalambos Ioannides

11. Dynamics of the Bush-Mosteller Learning Algorithm in 2x2 Games ..... 199
Luis R. Izquierdo and Segismundo S. Izquierdo

12. Modular Learning Systems for Behavior Acquisition in Multi-Agent Environment ..... 225
Yasutake Takahashi and Minoru Asada

13. Optimising Spoken Dialogue Strategies within the Reinforcement Learning Paradigm ..... 239
Olivier Pietquin

14. Water Allocation Improvement in River Basin Using Adaptive Neural Fuzzy Reinforcement Learning Approach ..... 257
Abolpour B., Javan M. and Karamouz M.

15. Reinforcement Learning for Building Environmental Control ..... 283
Konstantinos Dalamagkidis and Dionysia Kolokotsa

16. Model-Free Learning Control of Chemical Processes ..... 295
S. Syafiie, F. Tadeo and E. Martinez

17. Reinforcement Learning-Based Supervisory Control Strategy for a Rotary Kiln Process ..... 311
Xiaojie Zhou, Heng Yue and Tianyou Chai

18. Inductive Approaches based on Trial/Error Paradigm for Communications Network ..... 325
Abdelhamid Mellouk

19. The Allocation of Time and Location Information to Activity-Travel Sequence Data by means of Reinforcement Learning ..... 359
Wets Janssens

20. Application on Reinforcement Learning for Diagnosis based on Medical Image ..... 379
Stelmo Magalhaes Barros Netto, Vanessa Rodrigues Coelho Leite, Aristofanes Correa Silva, Anselmo Cardoso de Paiva and Areolino de Almeida Neto

21. RL-based Decision Support System for u-Healthcare Environment ..... 399
Devinder Thapa, In-Sung Jung and Gi-Nam Wang

22. Reinforcement Learning to Support Meta-Level Control in Air Traffic Management ..... 409
Daniela P. Alves, Li Weigang and Bueno B. Souza