Applications of Mathematics
22
Edited by
A. V. Balakrishnan
I. Karatzas
M. Yor
Applications of Mathematics
1 Fleming/Rishel, Deterministic and Stochastic Optimal Control (1975)
2 Marchuk, Methods of Numerical Mathematics, Second Ed. (1982)
3 Balakrishnan, Applied Functional Analysis, Second Ed. (1981)
4 Borovkov, Stochastic Processes in Queueing Theory (1976)
5 Liptser/Shiryayev, Statistics of Random Processes I: General Theory (1977)
6 Liptser/Shiryayev, Statistics of Random Processes II: Applications (1978)
7 Vorob'ev, Game Theory: Lectures for Economists and Systems Scientists
(1977)
8 Shiryayev, Optimal Stopping Rules (1978)
9 Ibragimov/Rozanov, Gaussian Random Processes (1978)
10 Wonham, Linear Multivariable Control: A Geometric Approach, Third Ed.
(1985)
11 Hida, Brownian Motion (1980)
12 Hestenes, Conjugate Direction Methods in Optimization (1980)
13 Kallianpur, Stochastic Filtering Theory (1980)
14 Krylov, Controlled Diffusion Processes (1980)
15 Prabhu, Stochastic Storage Processes: Queues, Insurance Risk, and Dams
(1980)
16 Ibragimov/Has'minskii, Statistical Estimation: Asymptotic Theory (1981)
17 Cesari, Optimization: Theory and Applications (1982)
18 Elliott, Stochastic Calculus and Applications (1982)
19 Marchuk/Shaidourov, Difference Methods and Their Extrapolations (1983)
20 Hijab, Stabilization of Control Systems (1986)
21 Protter, Stochastic Integration and Differential Equations (1990)
22 Benveniste/Metivier/Priouret, Adaptive Algorithms and Stochastic
Approximations (1990)
Albert Benveniste
Michel Métivier
Pierre Priouret
Adaptive Algorithms and
Stochastic Approximations
Translated from the French by Stephen S. Wilson
With 24 Figures
Springer-Verlag
Berlin Heidelberg New York
London Paris Tokyo
Hong Kong Barcelona
Albert Benveniste, Michel Métivier †
IRISA-INRIA
Campus de Beaulieu
35042 RENNES Cedex
France
Pierre Priouret
Laboratoire de Probabilités
Université Pierre et Marie Curie
4 Place Jussieu
75230 PARIS Cedex
France
Managing Editors
A. V. Balakrishnan
Systems Science Department
University of California
Los Angeles, CA 90024
USA

I. Karatzas
Department of Statistics
Columbia University
New York, NY 10027
USA
M. Yor
Laboratoire de Probabilités
Université Pierre et Marie Curie
4 Place Jussieu, Tour 56
75230 PARIS Cedex
France
Title of the Original French edition:
Algorithmes adaptatifs et approximations stochastiques
© Masson, Paris, 1987
Mathematics Subject Classification (1980): 62-XX, 62L20, 93-XX, 93C40, 93E12, 93E10
ISBN-13: 978-3-642-75896-6 e-ISBN-13: 978-3-642-75894-2
DOI: 10.1007/978-3-642-75894-2
This work is subject to copyright. All rights are reserved, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks.
Duplication of this publication or parts thereof is only permitted under the provisions of the
German Copyright Law of September 9, 1965, in its current version, and a copyright fee must
always be paid. Violations fall under the prosecution act of the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1990
Softcover reprint of the hardcover 1st edition 1990
2141/3140-543210 - Printed on acid-free paper
To our friend Michel
Albert, Pierre
Preface to the English Edition
The comments which we have received on the original French edition of this
book, and advances in our own work since the book was published, have led
us to make several modifications to the text prior to the publication of the
English edition. These modifications concern both the fields of application
and the presentation of the mathematical results.
As far as the fields of application are concerned, it seems that our claim
to cover the whole domain of pattern recognition was somewhat exaggerated,
given the examples chosen to illustrate the theory. We would now like to
put this to rights, without making the text too cumbersome. Thus we have
decided to introduce two new and very different categories of applications,
both of which are generally recognised as being relevant to pattern recognition.
These applications are introduced through long exercises in which the reader
is strictly directed to the solutions. The two new examples are borrowed,
respectively, from the domain of machine learning using neural networks and
from the domain of Gibbs fields or networks of random automata.
As far as the presentation of the mathematical results is concerned, we
have added an appendix containing details of a.s. convergence theorems for
stochastic approximations under Robbins-Monro type hypotheses. The new
appendix is intended to present results which are easily proved (using only
basic limit theorems about supermartingales) and which are brief, without
over-restrictive assumptions. The appendix is thus specifically written for
reference, unlike the more technical body of Part II of the book. We have,
in addition, corrected several minor errors in the original, and expanded the
bibliography to cover a broader area of research.
Finally, for this English version, we would like to thank Hans Walk for his
interesting suggestions which we have used to construct our list of references,
and Dr. Stephen S. Wilson for his outstanding work in translating and editing
this edition.
April 1990
Preface to the Original French Edition
The Story of a Wager
When, some three years ago, urged on by Didier Dacunha-Castelle and Robert
Azencott, we decided to write this book, our motives were, to say the least,
both simple and naive. Number 1 (in alphabetical order) dreamt of a
corpus of solid theorems to justify the practical everyday engineering usage of
adaptive algorithms and to act as an engineer's handbook. Numbers 2 and 3
wanted to show that the term "applied probability" should not necessarily
refer to probability with regard to applications, but rather to probability in
support of applications.
The unfolding dream produced a game rule, which we initially found quite
amusing: Number 1 has the material (examples of major applications) and
the specification (the theorems of the dream), Numbers 2 and 3 have the tools
(martingales, ... ), and the problem is to achieve the specification. We were
overwhelmed by this long and curious collaboration, which at the same time
brought home several harsh realities: not all the theorems of our dreams are
necessarily true, and the most elegant tools cannot necessarily be adapted to
the toughest applications.
The book owes a great deal to the highly active adaptive processing
community: Michèle Basseville, Bob Bitmead, Peter Kokotovic, Lennart
Ljung, Odile Macchi, Igor Nikiforov, Gabriel Ruget and Alan Willsky, to
name but a few. It also owes much to the ideas and publications of Harold
Kushner and his co-workers D. S. Clark, Hai Huang and Adam Shwartz. Proof
reading amongst authors is a little like being surrounded by familiar objects:
it blunts the critical spirit. We would thus like to thank Michèle Basseville,
Bernard Delyon and Georges Moustakides for their patient reading of the first
drafts.
Since this book was bound to evolve as it was written, we saw the need
to use a computer-based text-processing system; we were offered a promising
new package, MINT, which we adopted. The generous environment of IRISA,
much perseverance by Dominique Blaise, Philippe Louarn's great ingenuity
in tempering the quirks of the software, and Number 1's stamina of a long
distance runner in implementing the many successive corrections, all
contributed to the eventual birth of this book.
January 1987
Contents
Introduction 1
Part I. Adaptive Algorithms: Applications 7
1. General Adaptive Algorithm Form 9
1.1 Introduction ...................................................... 9
1.2 Two Basic Examples and Their Variants ......................... 10
1.3 General Adaptive Algorithm Form and Main Assumptions ........ 23
1.4 Problems Arising ................................................ 29
1.5 Summary of the Adaptive Algorithm Form: Assumptions (A) ..... 31
1.6 Conclusion ...................................................... 33
1.7 Exercises ........................................................ 34
1.8 Comments on the Literature ..................................... 38
2. Convergence: the ODE Method 40
2.1 Introduction .................................................... 40
2.2 Mathematical Tools: Informal Introduction ...................... 41
2.3 Guide to the Analysis of Adaptive Algorithms .................... 48
2.4 Guide to Adaptive Algorithm Design ............................. 55
2.5 The Transient Regime ........................................... 75
2.6 Conclusion ...................................................... 76
2.7 Exercises ........................................................ 76
2.8 Comments on the Literature .................................... 100
3. Rate of Convergence 103
3.1 Mathematical Tools: Informal Description ...................... 103
3.2 Applications to the Design of Adaptive Algorithms with
Decreasing Gain ................................................ 110
3.3 Conclusions from Section 3.2 ................................... 116
3.4 Exercises ....................................................... 116
3.5 Comments on the Literature .................................... 118
4. Tracking Non-Stationary Parameters 120
4.1 Tracking Ability of Algorithms with Constant Gain ............. 120
4.2 Multistep Algorithms ........................................... 142
4.3 Conclusions .................................................... 158
4.4 Exercises ....................................................... 158
4.5 Comments on the Literature .................................... 163
5. Sequential Detection; Model Validation 165
5.1 Introduction and Description of the Problem .................... 166
5.2 Two Elementary Problems and their Solution ................... 171
5.3 Central Limit Theorem and the Asymptotic Local Viewpoint .... 176
5.4 Local Methods of Change Detection ............................ 180
5.5 Model Validation by Local Methods ............................ 185
5.6 Conclusion ..................................................... 188
5.7 Annex: Proofs of Theorems 1 and 2 ............................. 188
5.8 Exercises ....................................................... 191
5.9 Comments on the Literature .................................... 197
6. Appendices to Part I 199
6.1 Rudiments of Systems Theory .................................. 199
6.2 Second Order Stationary Processes ............................. 205
6.3 Kalman Filters ................................................. 208
Part II. Stochastic Approximations: Theory 211
1. O.D.E. and Convergence A.S. for an Algorithm with
Locally Bounded Moments 213
1.1 Introduction of the General Algorithm .......................... 213
1.2 Assumptions Peculiar to Chapter 1 ............................. 219
1.3 Decomposition of the General Algorithm ........................ 220
1.4 L2 Estimates ................................................... 223
1.5 Approximation of the Algorithm by the Solution of the O.D.E. .. 230
1.6 Asymptotic Analysis of the Algorithm .......................... 233
1.7 An Extension of the Previous Results ........................... 236
1.8 Alternative Formulation of the Convergence Theorem ........... 238
1.9 A Global Convergence Theorem ................................ 239
1.10 Rate of L2 Convergence of Some Algorithms ................... 243
1.11 Comments on the Literature ................................... 249
2. Application to the Examples of Part I 251
2.1 Geometric Ergodicity of Certain Markov Chains ................ 251
2.2 Markov Chains Dependent on a Parameter θ ...................... 259
2.3 Linear Dynamical Processes .................................... 265
2.4 Examples ...................................................... 270
2.5 Decision-Feedback Algorithms with Quantisation ................ 276
2.6 Comments on the Literature .................................... 288
3. Analysis of the Algorithm in the General Case 289
3.1 New Assumptions and Control of the Moments .................. 289
3.2 Lq Estimates ................................................... 293
3.3 Convergence towards the Mean Trajectory ...................... 298
3.4 Asymptotic Analysis of the Algorithm .......................... 301
3.5 "Tube of Confidence" for an Infinite Horizon .................... 305
3.6 Final Remark. Connections with the Results of Chapter 1 ....... 306
3.7 Comments on the Literature .................................... 306
4. Gaussian Approximations to the Algorithms 307
4.1 Process Distributions and their Weak Convergence .............. 308
4.2 Diffusions. Gaussian Diffusions ................................. 312
4.3 The Process U^γ(t) for an Algorithm with Constant Step Size .... 314
4.4 Gaussian Approximation of the Processes U^γ(t) ................. 321
4.5 Gaussian Approximation for Algorithms with Decreasing
Step Size ...................................................... 327
4.6 Gaussian Approximation and Asymptotic Behaviour
of Algorithms with Constant Steps .............................. 334
4.7 Remark on Weak Convergence Techniques ...................... 341
4.8 Comments on the Literature .................................... 341
5. Appendix to Part II: A Simple Theorem in the
"Robbins-Monro" Case 343
5.1 The Algorithm, the Assumptions and the Theorem .............. 343
5.2 Proof of the Theorem .......................................... 344
5.3 Variants ....................................................... 345
Bibliography .................................................... 349
Subject Index to Part I ...................................... 361
Subject Index to Part II ........................................ 364