Applied Probability
Kenneth Lange

Springer Texts in Statistics
Advisors: George Casella, Stephen Fienberg, Ingram Olkin

Springer
New York, Berlin, Heidelberg, Hong Kong, London, Milan, Paris, Tokyo

Kenneth Lange
Department of Biomathematics
UCLA School of Medicine
Los Angeles, CA 90095-1766
USA

Editorial Board
George Casella, Department of Statistics, University of Florida, Gainesville, FL 32611-8545, USA
Stephen Fienberg, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
Ingram Olkin, Department of Statistics, Stanford University, Stanford, CA 94305, USA

Library of Congress Cataloging-in-Publication Data
Lange, Kenneth.
Applied probability / Kenneth Lange.
p. cm. (Springer texts in statistics)
Includes bibliographical references and index.
ISBN 0-387-00425-4 (alk. paper)
1. Probabilities. 2. Stochastic processes. I. Title. II. Series.
519.2-dc21 2003042436

ISBN 0-387-00425-4
Printed on acid-free paper.

© 2003 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.

Typesetting: Pages created by the author using a Springer TeX macro package.

www.springer-ny.com

Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH

Preface

Despite the fears of university mathematics departments, mathematics education is growing rather than declining. But the truth of the matter is that the increases are occurring outside departments of mathematics. Engineers, computer scientists, physicists, chemists, economists, statisticians, biologists, and even philosophers teach and learn a great deal of mathematics. The teaching is not always terribly rigorous, but it tends to be better motivated and better adapted to the needs of students. In my own experience teaching students of biostatistics and mathematical biology, I attempt to convey both the beauty and utility of probability. This is a tall order, partially because probability theory has its own vocabulary and habits of thought. The axiomatic presentation of advanced probability typically proceeds via measure theory.
This approach has the advantage of rigor, but it inevitably misses most of the interesting applications, and many applied scientists rebel against the onslaught of technicalities. In the current book, I endeavor to achieve a balance between theory and applications in a rather short compass. While the combination of brevity and balance sacrifices many of the proofs of a rigorous course, it is still consistent with supplying students with many of the relevant theoretical tools. In my opinion, it is better to present the mathematical facts without proof rather than omit them altogether.

In the preface to his lovely recent textbook [153], David Williams writes, “Probability and Statistics used to be married; then they separated, then they got divorced; now they hardly see each other.” Although this split is doubtless irreversible, at least we ought to be concerned with properly bringing up their children, applied probability and computational statistics. If we fail, then science as a whole will suffer. You see before you my attempt to give applied probability the attention it deserves. My other recent book [95] covers computational statistics and aspects of computational probability glossed over here.

This graduate-level textbook presupposes knowledge of multivariate calculus, linear algebra, and ordinary differential equations. In probability theory, students should be comfortable with elementary combinatorics, generating functions, probability densities and distributions, expectations, and conditioning arguments. My intended audience includes graduate students in applied mathematics, biostatistics, computational biology, computer science, physics, and statistics. Because of the diversity of needs, instructors are encouraged to exercise their own judgment in deciding what chapters and topics to cover.

Chapter 1 reviews elementary probability while striving to give a brief survey of relevant results from measure theory.
Poorly prepared students should supplement this material with outside reading. Well-prepared students can skim Chapter 1 until they reach the less well-known material of the final two sections. Section 1.8 develops properties of the multivariate normal distribution of special interest to students in biostatistics and statistics. This material is applied to optimization theory in Section 3.3 and to diffusion processes in Chapter 11.

We get down to serious business in Chapter 2, which is an extended essay on calculating expectations. Students often complain that probability is nothing more than a bag of tricks. For better or worse, they are confronted here with some of those tricks. Readers may want to skip the final two sections of the chapter on surface area distributions on a first pass through the book.

Chapter 3 touches on advanced topics from convexity, inequalities, and optimization. Besides the obvious applications to computational statistics, part of the motivation for this material is its applicability in calculating bounds on probabilities and moments.

Combinatorics has the odd reputation of being difficult in spite of relying on elementary methods. Chapters 4 and 5 are my stab at making the subject accessible and interesting. There is no doubt in my mind of combinatorics' practical importance. More and more we live in a world dominated by discrete bits of information. The stress on algorithms in Chapter 5 is intended to appeal to computer scientists.

Chapters 6 through 11 cover core material on stochastic processes that I have taught to students in mathematical biology over a span of many years. If supplemented with appropriate sections from Chapters 1 and 2, there is sufficient material here for a traditional semester-long course in stochastic processes. Although my examples are weighted toward biology, particularly genetics, I have tried to achieve variety. The fortunes of this book doubtless will hinge on how compelling readers find these examples.
You can leaf through the Table of Contents to get a better idea of the topics covered in these chapters.

In the final two chapters on Poisson approximation and number theory, the applications of probability to other branches of mathematics come to the fore. These chapters are hardly in the mainstream of stochastic processes and are meant for independent reading as much as for classroom presentation.

All chapters come with exercises. These are not graded by difficulty, but hints are provided for some of the more difficult ones. My own practice is to require one problem for each hour and a half of lecture. Students are allowed to choose among the problems within each chapter and are graded on the best of the solutions they present. This strategy provides incentive for the students to attempt more than the minimum number of problems.

I would like to thank my former and current UCLA and University of Michigan students for their help in debugging this text. In retrospect, there were far more contributing students than I can possibly credit. At the risk of offending the many, let me single out Brian Dolan, Ruzong Fan, David Hunter, Wei-hsun Liao, Ben Redelings, Eric Schadt, Marc Suchard, Janet Sinsheimer, and Andy Ming-Ham Yip. I also thank John Kimmel of Springer-Verlag for his editorial assistance.

Finally, I dedicate this book to my mother, Alma Lange, on the occasion of her 80th birthday. Thanks, Mom, for your cheerfulness and generosity in raising me. You were, and always will be, an inspiration to the whole family.

Preface to the First Edition

When I was a postdoctoral fellow at UCLA more than two decades ago, I learned genetic modeling from the delightful texts of Elandt-Johnson [2] and Cavalli-Sforza and Bodmer [1]. In teaching my own genetics course over the past few years, first at UCLA and later at the University of Michigan, I longed for an updated version of these books. Neither appeared and I was left to my own devices.
As my hastily assembled notes gradually acquired more polish, it occurred to me that they might fill a useful niche. Research in mathematical and statistical genetics has been proceeding at such a breathless pace that the best minds in the field would rather create new theories than take time to codify the old. It is also far more profitable to write another grant proposal. Needless to say, this state of affairs is not ideal for students, who are forced to learn by wading unguided into the confusing swamp of the current scientific literature.

Having set the stage for nobly rescuing a generation of students, let me inject a note of honesty. This book is not the monumental synthesis of population genetics and genetic epidemiology achieved by Cavalli-Sforza and Bodmer. It is also not the sustained integration of statistics and genetics achieved by Elandt-Johnson. It is not even a compendium of recommendations for carrying out a genetic study, useful as that may be. My goal is different and more modest. I simply wish to equip students already sophisticated in mathematics and statistics to engage in genetic modeling. These are the individuals capable of creating new models and methods for analyzing genetic data. No amount of expertise in genetics can overcome mathematical and statistical deficits. Conversely, no mathematician or statistician ignorant of the basic principles of genetics can ever hope to identify worthy problems. Collaborations between geneticists on one side and mathematicians and statisticians on the other can work, but it takes patience and a willingness to learn a foreign vocabulary.

So what are my expectations of readers and students? This is a hard question to answer, in part because the level of the mathematics required builds as the book progresses. At a minimum, readers should be familiar with notions of theoretical statistics such as likelihood and Bayes’ theorem. Calculus and linear algebra are used throughout.
The last few chapters make fairly heavy demands on skills in theoretical probability and combinatorics. For a few subjects such as continuous time Markov chains and Poisson approximation, I sketch enough of the theory to make the exposition of applications self-contained. Exposure to interesting applications should whet students’ appetites for self-study of the underlying mathematics. Everything considered, I recommend that instructors cover the chapters in the order indicated and determine the speed of the course by the mathematical sophistication of the students. There is more than ample material here for a full semester, so it is pointless to rush through basic theory if students encounter difficulty early on. Later chapters can be covered at the discretion of the instructor.

The matter of biological requirements is also problematic. Neither the brief review of population genetics in Chapter 1 nor the primer of molecular genetics in Appendix A is a substitute for a rigorous course in modern genetics. Although many of my classroom students have had little prior exposure to genetics, I have always insisted that those intending to do research fill in the gaps in their knowledge. Students in the mathematical sciences occasionally complain to me that learning genetics is hopeless because the field is in such rapid flux. While I am sympathetic to the difficult intellectual hurdles ahead of them, this attitude is a prescription for failure. Although genetics lacks the theoretical coherence of mathematics, there are fundamental principles and crucial facts that will never change. My advice is to follow your curiosity and learn as much genetics as you can. In scientific research chance always favors the well prepared.

The incredible flowering of mathematical and statistical genetics over the past two decades makes it impossible to summarize the field in one book. I am acutely aware of my failings in this regard, and it pains me to exclude most of the history of the subject and to leave unmentioned so many important ideas.
I apologize to my colleagues. My own work receives too much attention; my only excuse is that I understand it best. Fortunately, the recent book of Michael Waterman delves into many of the important topics in molecular genetics missing here [4].

I have many people to thank for helping me in this endeavor. Carol Newton nurtured my early career in mathematical biology and encouraged me to write a book in the first place. Daniel Weeks and Eric Sobel deserve special credit for their many helpful suggestions for improving the text. My genetics colleagues David Burke, Richard Gatti, and Miriam Meisler read and corrected my first draft of Appendix A. David Cox, Richard Gatti, and James Lake kindly contributed data. Janet Sinsheimer and Hongyu Zhao provided numerical examples for Chapters 10 and 12, respectively. Many students at UCLA and Michigan checked the problems and proofread the text. Let me single out Ruzong Fan, Ethan Lange, Laura Lazzeroni, Eric Schadt, Janet Sinsheimer, Heather Stringham, and Wynn Walker for their diligence. David Hunter kindly prepared the index. Doubtless a few errors remain, and I would be grateful to readers for their corrections. Finally, I thank my wife, Genie, to whom I dedicate this book, for her patience and love.