
Fault-Tolerant Design PDF

194 Pages·2013·2.309 MB·English

Preview Fault-Tolerant Design

Fault-Tolerant Design

Elena Dubrova
KTH Royal Institute of Technology
Kista, Sweden

ISBN 978-1-4614-2112-2
ISBN 978-1-4614-2113-9 (eBook)
DOI 10.1007/978-1-4614-2113-9
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2013932644
© Springer Science+Business Media New York 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To my mother

Preface

This textbook serves as an introduction to fault tolerance, intended for upper division undergraduate students, graduate-level students, and practicing engineers in need of an overview of the field. Readers will develop skills in modeling and evaluating fault-tolerant architectures in terms of reliability, availability, and safety. They will gain a thorough understanding of fault-tolerant computing, including both the theory of how to achieve fault tolerance through hardware, software, information, and time redundancy and the practical knowledge of designing fault-tolerant hardware and software systems.

The book contains eight chapters covering the following topics. Chapter 1 is an introduction, discussing the importance of fault tolerance in developing a dependable system. Chapter 2 describes three fundamental characteristics of dependability: attributes, impairment, and means. Chapter 3 introduces dependability evaluation techniques and dependability models such as reliability block diagrams and Markov chains. Chapter 4 presents commonly used approaches for the design of fault-tolerant hardware systems, such as triple modular redundancy, standby redundancy, and self-purging redundancy and evaluates their effect on system dependability. Chapter 5 shows how fault tolerance can be achieved by means of coding. It covers many important families of codes, including parity, linear, cyclic, unordered, and arithmetic codes. Chapter 6 presents time redundancy techniques which can be used for detecting and correcting transient and permanent faults. Chapter 7 describes the main approaches for the design of fault-tolerant software systems, including checkpoint and restart, recovery blocks, N-version programming, and N self-checking programming. Chapter 8 concludes the book.
The content is designed to be highly accessible, including numerous examples and problems to reinforce the material learned. Solutions to problems and PowerPoint slides are available from the author upon request.

Stockholm, Sweden, December 2012
Elena Dubrova

Acknowledgments

This book has evolved from the lecture notes of the course "Design of Fault-Tolerant Systems" that I have taught at the Royal Institute of Technology (KTH), since 2000. Throughout the years, many students and colleagues have helped me polish the text. I am grateful to all who gave me feedback. In particular, I would like to thank Nan Li, Shohreh Sharif Mansouri, Nasim Farahini, Bayron Navas, Jonian Grazhdani, Xavier Lowagie, Pieter Nuyts, Henrik Kirkeby, Chen Fu, Kareem Refaat, Sergej Koziner, Julia Kuznetsova, Zhonghai Lu, and Roman Morawek for finding multiple mistakes and suggesting valuable improvements in the manuscript. I am grateful to Hannu Tenhunen who inspired me to teach this course and constantly supported me during my academic career. I am also indebted to Charles Glaser from Springer for his encouragement and assistance in publishing the book and to Sandra Brunsberg for her very careful proofreading of the final draft. Special thanks to a friend of mine who advised me to use the MetaPost tool for drawing pictures. Indeed, MetaPost gives perfect results.

Finally, I am grateful to the Swedish Foundation for International Cooperation in Research and Higher Education (STINT), for the scholarship KU2002-4044 which supported my trip to the University of New South Wales, Sydney, Australia, where the first draft of this book was written during October–December 2002.

Contents

1 Introduction
    1.1 Definition of Fault Tolerance
    1.2 Fault Tolerance and Redundancy
    1.3 Applications of Fault Tolerance
    References

2 Fundamentals of Dependability
    2.1 Notation
    2.2 Dependability Attributes
        2.2.1 Reliability
        2.2.2 Availability
        2.2.3 Safety
    2.3 Dependability Impairments
        2.3.1 Faults, Errors, and Failures
        2.3.2 Origins of Faults
        2.3.3 Common-Mode Faults
        2.3.4 Hardware Faults
        2.3.5 Software Faults
    2.4 Dependability Means
        2.4.1 Fault Tolerance
        2.4.2 Fault Prevention
        2.4.3 Fault Removal
        2.4.4 Fault Forecasting
    2.5 Summary
    References

3 Dependability Evaluation Techniques
    3.1 Basics of Probability Theory
    3.2 Common Measures of Dependability
        3.2.1 Failure Rate
        3.2.2 Mean Time to Failure
        3.2.3 Mean Time to Repair
        3.2.4 Mean Time Between Failures
        3.2.5 Fault Coverage
    3.3 Dependability Modeling
        3.3.1 Reliability Block Diagrams
        3.3.2 Fault Trees
        3.3.3 Reliability Graphs
        3.3.4 Markov Processes
    3.4 Dependability Evaluation
        3.4.1 Reliability Evaluation Using Reliability Block Diagrams
        3.4.2 Dependability Evaluation Using Markov Processes
    3.5 Summary
    References

4 Hardware Redundancy
    4.1 Redundancy Allocation
    4.2 Passive Redundancy
        4.2.1 Triple Modular Redundancy
        4.2.2 N-Modular Redundancy
    4.3 Active Redundancy
        4.3.1 Duplication with Comparison
        4.3.2 Standby Redundancy
        4.3.3 Pair-And-A-Spare
    4.4 Hybrid Redundancy
        4.4.1 Self-Purging Redundancy
        4.4.2 N-Modular Redundancy with Spares
    4.5 Summary
    References

5 Information Redundancy
    5.1 History
    5.2 Fundamental Notions
        5.2.1 Code
        5.2.2 Encoding
        5.2.3 Information Rate
        5.2.4 Decoding
        5.2.5 Hamming Distance
        5.2.6 Code Distance
    5.3 Parity Codes
        5.3.1 Definition and Properties
        5.3.2 Applications
        5.3.3 Horizontal and Vertical Parity Codes
    5.4 Linear Codes
        5.4.1 Basic Notions
        5.4.2 Definition
        5.4.3 Generator Matrix
        5.4.4 Parity Check Matrix
        5.4.5 Syndrome
        5.4.6 Construction of Linear Codes
        5.4.7 Hamming Codes
        5.4.8 Lexicographic Parity Check Matrix
        5.4.9 Applications of Hamming Codes
        5.4.10 Extended Hamming Codes
    5.5 Cyclic Codes
        5.5.1 Definition
        5.5.2 Polynomial Manipulation
        5.5.3 Generator Polynomial
        5.5.4 Parity Check Polynomial
        5.5.5 Syndrome Polynomial
        5.5.6 Implementation of Encoding and Decoding
        5.5.7 Separable Cyclic Codes
        5.5.8 Cyclic Redundancy Check Codes
        5.5.9 Reed-Solomon Codes
    5.6 Unordered Codes
        5.6.1 M-of-N Codes
        5.6.2 Berger Codes
    5.7 Arithmetic Codes
        5.7.1 AN Codes
        5.7.2 Residue Codes
    5.8 Summary
    References

6 Time Redundancy
    6.1 Transient Faults
    6.2 Permanent Faults
        6.2.1 Alternating Logic
        6.2.2 Recomputing with Modified Operands
    6.3 Summary
    References

7 Software Redundancy
    7.1 Software Versus Hardware
    7.2 Single-Version Techniques
        7.2.1 Fault Detection Techniques
        7.2.2 Fault Containment Techniques
        7.2.3 Fault Recovery Techniques
