Probability,
Random Processes,
and Ergodic Properties
November 3, 2001
Robert M. Gray
Information Systems Laboratory
Department of Electrical Engineering
Stanford University
© 1987 by Springer Verlag, 2001 revision by Robert M. Gray.
This book is affectionately dedicated to
Elizabeth Dubois Jordan Gray
and to the memory of
R. Adm. Augustine Heard Gray, U.S.N.
1888-1981
Sara Jean Dubois
and
William "Billy" Gray
1750-1825
Preface
History and Goals
This book has been written for several reasons, not all of which are academic. This material was
for many years the first half of a book in progress on information and ergodic theory. The intent
was and is to provide a reasonably self-contained advanced treatment of measure theory, probability
theory, and the theory of discrete time random processes with an emphasis on general alphabets
and on ergodic and stationary properties of random processes that might be neither ergodic nor
stationary. The intended audience was mathematically inclined engineering graduate students and
visiting scholars who had not had formal courses in measure theoretic probability. Much of the
material is familiar stuff for mathematicians, but many of the topics and results have not previously
appeared in books.
The original project grew too large and the first part contained much that would likely bore
mathematicians and discourage them from the second part. Hence I finally followed a suggestion
to separate the material and split the project in two. The original justification for the present
manuscript was the pragmatic one that it would be a shame to waste all the effort thus far expended.
A more idealistic motivation was that the presentation had merit as filling a unique, albeit small,
hole in the literature. Personal experience indicates that the intended audience rarely has the time to
take a complete course in measure and probability theory in a mathematics or statistics department,
at least not before they need some of the material in their research. In addition, many of the existing
mathematical texts on the subject are hard for this audience to follow, and the emphasis is not well
matched to engineering applications. A notable exception is Ash’s excellent text [1], which was
likely influenced by his original training as an electrical engineer. Still, even that text devotes little
effort to ergodic theorems, perhaps the most fundamentally important family of results for applying
probability theory to real problems. In addition, there are many other special topics that are given
little space (or none at all) in most texts on advanced probability and random processes. Examples
of topics developed in more depth here than in most existing texts are the following:
Random processes with standard alphabets We develop the theory of standard spaces as a
model of quite general process alphabets. Although not as general (or abstract) as often
considered by probability theorists, standard spaces have useful structural properties that
simplify the proofs of some general results and yield additional results that may not hold
in the more general abstract case. Examples of results holding for standard alphabets that
have not been proved in the general abstract case are the Kolmogorov extension theorem, the
ergodic decomposition, and the existence of regular conditional probabilities. In fact, Blackwell
[6] introduced the notion of a Lusin space, a structure closely related to a standard space, in
order to avoid known examples of probability spaces where the Kolmogorov extension theorem
does not hold and regular conditional probabilities do not exist. Standard spaces include the
common models of finite alphabets (digital processes) and real alphabets as well as more general
complete separable metric spaces (Polish spaces). Thus they include many function spaces,
Euclidean vector spaces, two-dimensional image intensity rasters, etc. The basic theory of
standard Borel spaces may be found in the elegant text of Parthasarathy [55], and treatments
of standard spaces and the related Lusin and Suslin spaces may be found in Christensen [10],
Schwartz [62], Bourbaki [7], and Cohn [12]. We here provide a different and more coding
oriented development of the basic results and attempt to separate clearly the properties of
standard spaces, which are useful and easy to manipulate, from the demonstrations that certain
spaces are standard, which are more complicated and can be skipped. Thus, unlike in the
traditional treatments, we define and study standard spaces first from a purely probability
theory point of view and postpone the topological metric space considerations until later.
Nonstationary and nonergodic processes We develop the theory of asymptotically mean
stationary processes and the ergodic decomposition in order to model many physical processes
better than can traditional stationary and ergodic processes. Both topics are virtually absent
in all books on random processes, yet they are fundamental to understanding the limiting
behavior of nonergodic and nonstationary processes. Both topics are considered in Krengel’s
excellent book on ergodic theorems [41], but the treatment here is more detailed and in greater
depth. We consider both the common two-sided processes, which are considered to have been
producing outputs forever, and the more difficult one-sided processes, which better model
processes that are "turned on" at some specific time and which exhibit transient behavior.
Ergodic properties and theorems We develop the notion of time averages along with that of
probabilistic averages to emphasize their similarity and to demonstrate many of the
implications of the existence of limiting sample averages. We prove the ergodic theorem
for the general case of asymptotically mean stationary processes. In fact, it is shown that
asymptotic mean stationarity is both sufficient and necessary for the classical pointwise or
almost everywhere ergodic theorem to hold. We also prove the subadditive ergodic theorem
of Kingman [39], which is useful for studying the limiting behavior of certain measurements
on random processes that are not simple arithmetic averages. The proofs are based on
recent simple proofs of the ergodic theorem developed by Ornstein and Weiss [52], Katznelson
and Weiss [38], Jones [37], and Shields [64]. These proofs use coding arguments reminiscent
of information and communication theory rather than the traditional (and somewhat tricky)
maximal ergodic theorem. We consider the interrelations of stationary and ergodic
properties of processes that are stationary or ergodic with respect to block shifts, that is, processes
that produce stationary or ergodic vectors rather than scalars, a topic largely developed by
Nedoma [49] which plays an important role in the general versions of Shannon channel and
source coding theorems.
Process distance measures We develop measures of a "distance" between random processes.
Such results quantify how "close" one process is to another and are useful for considering spaces
of random processes. These in turn provide the means of proving the ergodic decomposition
of certain functionals of random processes and of characterizing how close or different the long
term behavior of distinct random processes can be expected to be.
Having described the topics treated here that are lacking in most texts, we admit to the omission
of many topics usually contained in advanced texts on random processes or second books on random
processes for engineers. The most obvious omission is that of continuous time random processes. A
variety of excuses explain this: The advent of digital systems and sampled-data systems has made
discrete time processes at least as important as continuous time processes in modeling real
world phenomena. The shift in emphasis from continuous time to discrete time in texts on electrical
engineering systems can be verified by simply perusing modern texts. The theory of continuous time
processes is inherently more difficult than that of discrete time processes. It is harder to construct
the models precisely and much harder to demonstrate the existence of measurements on the models,
e.g., it is usually harder to prove that limiting integrals exist than limiting sums. One can approach
continuous time models via discrete time models by letting the outputs be pieces of waveforms.
Thus, in a sense, discrete time systems can be used as a building block for continuous time systems.
Another topic clearly absent is that of spectral theory and its applications to estimation and
prediction. This omission is a matter of taste and there are many books on the subject.
A further topic not given the traditional emphasis is the detailed theory of the most popular
particular examples of random processes: Gaussian and Poisson processes. The emphasis of this
book is on general properties of random processes rather than the specific properties of special cases.
The final noticeably absent topic is martingale theory. Martingales are only briefly discussed in
the treatment of conditional expectation. My excuse is again that of personal taste. In addition,
this powerful theory is simply not required in the intended sequel to this book on information and
ergodic theory.
The book’s original goal of providing the needed machinery for a book on information and
ergodic theory remains. That book will rest heavily on this book and will only quote the needed
material, freeing it to focus on the information measures and their ergodic theorems and on source
and channel coding theorems. In hindsight, this manuscript also serves an alternative purpose. I
have been approached by engineering students who have taken a master’s level course in random
processes using my book with Lee Davisson [24] and who are interested in exploring more deeply
the underlying mathematics that is often referred to, but rarely exposed. This manuscript provides
such a sequel and fills in many details only hinted at in the lower level text.
As a final, and perhaps less idealistic, goal, I intended in this book to provide a catalogue of
many results that I have found need of in my own research together with proofs that I could follow.
This is one goal wherein I can judge the success; I often find myself consulting these notes to find the
conditions for some convergence result or the reasons for some required assumption or the generality
of the existence of some limit. If the manuscript provides similar service for others, it will have
succeeded in a more global sense.
Assumed Background
The book is aimed at graduate engineers and hence does not assume even an undergraduate math-
ematical background in functional analysis or measure theory. Hence topics from these areas are
developed from scratch, although the developments and discussions often diverge from traditional
treatments in mathematics texts. Some mathematical sophistication is assumed for the frequent
manipulation of deltas and epsilons, and hence some background in elementary real analysis or a
strong calculus knowledge is required.
Acknowledgments
The research in information theory that yielded many of the results and some of the new proofs for
old results in this book was supported by the National Science Foundation. Portions of the research
and much of the early writing were supported by a fellowship from the John Simon Guggenheim
Memorial Foundation.
The book benefited greatly from comments from numerous students and colleagues through many
years: most notably Paul Shields, Lee Davisson, John Kieffer, Dave Neuhoff, Don Ornstein, Bob
Fontana, Jim Dunham, Farivar Saadat, Mari Ostendorf, Michael Sabin, Paul Algoet, Wu Chou, Phil
Chou, and Tom Lookabaugh. They should not be blamed, however, for any mistakes I have made
in implementing their suggestions.
I would also like to acknowledge my debt to Al Drake for introducing me to elementary probability
theory and to Tom Pitcher for introducing me to measure theory. Both are extraordinary teachers.
Finally, I would like to apologize to Lolly, Tim, and Lori for all the time I did not spend with
them while writing this book.
The New Millennium Edition
After a decade and a half I am finally converting the ancient troff to LaTeX in order to post a
corrected and revised version of the book on the Web. I have received a few requests to do so
since the book went out of print, but the electronic manuscript was lost years ago during my many
migrations among computer systems and my less than thorough backup precautions. During summer
2001 a thorough search for something else in my Stanford office led to the discovery of an old data
cassette, with a promising inscription. Thanks to assistance from computer wizards Charlie Orgish
and Pat Burke, prehistoric equipment was found to read the cassette and the original troff files for
the book were read and converted into LaTeX with some assistance from Kamal Al-Yahya’s and
Christian Engel’s tr2latex program. I am still in the process of fixing conversion errors and slowly
making long planned improvements.