Formalized Probability Theory and Applications Using Theorem Proving

Osman Hasan
National University of Sciences and Technology, Pakistan

Sofiène Tahar
Concordia University, Canada

Managing Director: Lindsay Johnston
Managing Editor: Austin DeMarco
Director of Intellectual Property & Contracts: Jan Travers
Acquisitions Editor: Kayla Wolfe
Production Editor: Christina Henning
Typesetter: Amanda Smith
Cover Design: Jason Mull

Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com

Copyright © 2015 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Hasan, Osman, 1975-
Formalized probability theory and applications using theorem proving / by Osman Hasan and Sofiène Tahar.
pages cm
Includes bibliographical references and index.
ISBN 978-1-4666-8315-0 (hardcover) -- ISBN 978-1-4666-8316-7 (ebook)
1. Computer systems--Evaluation. 2. Automatic theorem proving. 3. Stochastic analysis--Data processing. I. Tahar, Sofiène, 1966- II. Title.
QA76.9.E95H37 2015
004.029--dc23
2015006750

British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher.

To our Families.

Table of Contents

Preface
Acknowledgment

Chapter 1: Probabilistic Analysis
  1.1 MOTIVATION
  1.2 RANDOMIZED MODELS
  1.3 PROBABILISTIC PROPERTIES
  1.4 STATISTICAL PROPERTIES
  1.5 TRADITIONAL PROBABILISTIC ANALYSIS METHODS
  1.6 CONCLUSION

Chapter 2: Formal Verification Methods
  2.1 INTRODUCTION
  2.2 MODEL CHECKING
  2.3 THEOREM PROVING
  2.4 CONCLUSION

Chapter 3: Probabilistic Analysis Using Theorem Proving
  3.1 METHODOLOGY
  3.2 HOL4 THEOREM PROVER
  3.3 CONCLUSION

Chapter 4: Measure Theory and Lebesgue Integration Theories
  4.1 FORMALIZATION OF EXTENDED REAL NUMBERS
  4.2 FORMALIZATION OF MEASURE THEORY
  4.3 FORMALIZATION OF LEBESGUE INTEGRATION IN HOL
  4.4 CONCLUSION

Chapter 5: Probability Theory
  5.1 FORMALIZATION OF PROBABILITY THEORY
  5.2 FORMALIZATION OF STATISTICAL PROPERTIES
  5.3 HEAVY HITTER PROBLEM
  5.4 FORMALIZATION OF CONDITIONAL PROBABILITIES
  5.5 CONCLUSION

Chapter 6: Discrete-Time Markov Chains in HOL
  6.1 FORMALIZATION OF DISCRETE-TIME MARKOV CHAIN
  6.2 FORMAL VERIFICATION DTMC PROPERTIES
  6.3 FORMALIZATION OF STATIONARY DISTRIBUTIONS
  6.4 FORMALIZATION OF STATIONARY PROCESS
  6.5 BINARY COMMUNICATION MODEL
  6.6 AMQM PROTOCOL
  6.7 CONCLUSION

Chapter 7: Classified Discrete-Time Markov Chains
  7.1 FORMALIZATION OF CLASSIFIED STATES
  7.2 FORMALIZATION OF CLASSIFIED DTMCs
  7.3 FORMAL VERIFICATION OF LONG-TERM PROPERTIES
  7.4 APPLICATIONS
  7.5 CONCLUSION

Chapter 8: Formalization of Hidden Markov Model
  8.1 DEFINITION OF HMM
  8.2 HMM PROPERTIES
  8.3 PROOF AUTOMATION
  8.4 APPLICATION: DNA SEQUENCE ANALYSIS
  8.5 CONCLUSION

Chapter 9: Information Measures
  9.1 FORMALIZATION OF RADON-NIKODYM DERIVATIVE
  9.2 FORMALIZATION OF KULLBACK-LEIBLER DIVERGENCE
  9.3 FORMALIZATION OF MUTUAL INFORMATION
  9.4 ENTROPY
  9.5 FORMALIZATION OF CONDITIONAL MUTUAL INFORMATION
  9.6 FORMALIZATION OF QUANTITATIVE ANALYSIS OF INFORMATION
  9.7 CONCLUSION

Chapter 10: Formal Analysis of Information Flow Using Min-Entropy and Belief Min-Entropy
  10.1 INFORMATION FLOW ANALYSIS
  10.2 FORMALIZATION OF MIN-ENTROPY AND BELIEF MIN-ENTROPY
  10.3 FORMAL ANALYSIS OF INFORMATION FLOW
  10.4 APPLICATION: CHANNELS IN CASCADE
  10.5 CONCLUSION

Chapter 11: Applications of Formalized Information Theory
  11.1 DATA COMPRESSION
  11.2 ANONYMITY-BASED SINGLE MIX
  11.3 ONE-TIME PAD
  11.4 CONCLUSION

Chapter 12: Reliability Theory
  12.1 LIFETIME DISTRIBUTIONS
  12.2 CUMULATIVE DISTRIBUTION FUNCTION
  12.3 SURVIVAL FUNCTION
  12.4 RELIABILITY BLOCK DIAGRAMS
  12.5 APPLICATIONS
  12.6 CONCLUSION

Chapter 13: Scheduling Algorithm for Wireless Sensor Networks
  13.1 COVERAGE-BASED RANDOMIZED SCHEDULING ALGORITHM
  13.2 FORMAL ANALYSIS OF THE K-SET RANDOMIZED SCHEDULING
  13.3 FORMAL ANALYSIS OF WSN FOR FOREST FIRE DETECTION
  13.4 CONCLUSION

Chapter 14: Formal Probabilistic Analysis of Detection Properties in Wireless Sensor Networks
  14.1 DETECTION OF A WIRELESS SENSOR NETWORK
  14.2 DETECTION PROPERTIES
  14.3 WSN FOR BORDER SURVEILLANCE
  14.4 CONCLUSION

Conclusion
Related References
Compilation of References
About the Authors
Index

Preface

Probabilistic analysis is a tool of fundamental importance for virtually all scientists and engineers, as they often have to deal with systems that exhibit random or unpredictable elements. Traditionally, computer simulation techniques are used to perform probabilistic analysis. However, they provide less accurate results and cannot handle large-scale problems due to their enormous computer processing time requirements. To overcome these limitations, this book presents a rather novel idea: performing probabilistic analysis by formally specifying the behavior of random systems in higher-order logic and using these formal models to verify the intended probabilistic and statistical properties in a computer-based theorem prover.
The analysis carried out in this way is free from any approximation or precision issues due to the mathematical nature of the models and the inherent soundness of the theorem proving approach.

The book presents the higher-order-logic formalizations of foundational mathematical theories for conducting probabilistic analysis. These foundations mainly include measure, Lebesgue integration, probability, Markov chain, information and reliability theories. The most important notions in these theories have been defined in higher-order logic and most of their commonly used characteristics are then formally verified within the sound core of the HOL4 theorem prover. This formalization can be utilized to conduct accurate probabilistic analysis of real-world systems and for illustration purposes the book presents several examples.

The book starts with a brief introduction to the foundations (Chapters 1-3). We can divide the contents of the rest of the book into five main formalizations: Probability Theory (Chapters 4-5), Discrete-Time Markov Chains (DTMCs) (Chapters 6-8), Information Theory (Chapters 9-11), Reliability Theory (Chapter 12), and Wireless Sensor Network (WSN) Analysis (Chapters 13-14). The last four formalizations do not have any interdependency and thus can be read in any order after reading the first five chapters of the book. More details of the chapters are as follows:

Chapter 1 provides some background information related to the domains of probabilistic analysis and traditional analysis methods, like paper-and-pencil methods, simulation, and computer algebra systems. The intent is to introduce the foundations that we build upon in the rest of the manuscript.

Chapter 2 provides a general overview of formal verification methods. In particular, the two most commonly used formal methods (i.e., model checking and theorem proving) are introduced along with some examples.
The chapter also includes some arguments for using higher-order-logic theorem proving to conduct probabilistic analysis.

Chapter 3 presents the methodology followed throughout the book for conducting probabilistic analysis, including an overview of the HOL4 theorem prover, which is the main tool of focus in this book. The main reasons for this choice include the availability of foundational probabilistic analysis formalizations in HOL4 along with very comprehensive support for real and set theoretic reasoning. The chapter also lists some of the HOL4 symbols frequently used in this manuscript.

Next, Chapter 4 provides the higher-order-logic formalization of the foundational theories of measure and Lebesgue integration. These theories are based on extended-real numbers (real numbers with ±∞). This allows us to define sigma-finite and even infinite measures and handle extended-real-valued measurable functions. It also allows us to verify the properties of the Lebesgue integral and its convergence theorems for arbitrary functions.

In Chapter 5, we build upon the higher-order-logic foundations presented in the previous chapter to formalize probability theory in higher-order logic. This chapter also includes the formalizations of statistical properties, like expectation and variance, as well as conditional probability, and provides our first example of formal probabilistic analysis: the Heavy Hitter problem. Here the Heavy Hitter problem is formalized in higher-order logic and, based on this formalization, some of its commonly used properties are formally verified.

In Chapter 6, we build upon the formalizations of the last two chapters and provide the higher-order-logic formalizations of Discrete-Time Markov Chains (DTMCs) and stationary distributions. These results are then used to conduct the formal probabilistic analysis of a binary communication channel and the Automatic Mail Quality Measurement (AMQM) protocol.
These examples illustrate how to construct formal Markovian models of the given system and how to analyze it within a theorem prover. A comprehensive comparison of model checking and theorem proving for formal probabilistic analysis is also included in this chapter.

Chapter 7 extends the DTMC formalization of Chapter 6 and presents the formalizations of classified states and classified DTMCs. We then use these formalizations to formally verify long-term properties, such as positive transition probability and