Alessandro Birolini

Reliability Engineering
Theory and Practice

Seventh Edition

With 190 Figures, 60 Tables, 140 Examples, and 70 Problems for Homework

Prof. Dr. Alessandro Birolini*
Centro Storico - Bargello
I-50122 Firenze
Tuscany, Italy
www.ethz.ch/people/whoiswho, www.birolini.ch

*Ingénieur et penseur (engineer and thinker), Ph.D., Professor Emeritus of Reliability Eng. at the Swiss Federal Institute of Technology (ETH), Zurich

ISBN 978-3-642-39534-5
ISBN 978-3-642-39535-2 (eBook)
DOI 10.1007/978-3-642-39535-2
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013945800
© Springer-Verlag Berlin Heidelberg 1994, 1997, 1999, 2004, 2007, 2010, 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

"La chance vient à l'esprit qui est prêt à la recevoir." 1)
Louis Pasteur

"Quand on aperçoit combien la somme de nos ignorances dépasse celle de nos connaissances, on se sent peu porté à conclure trop vite." 2)
Louis De Broglie

"One has to learn to consider causes rather than symptoms of undesirable events and avoid hypocritical attitudes."
Alessandro Birolini

1) "Opportunity comes to the intellect which is ready to receive it."
2) "When one recognizes how much the sum of our ignorance exceeds that of our knowledge, one is less ready to draw rapid conclusions."

Preface to the 7th Edition

The strong interest in the 6th edition (over 2000 online requests per year) prompted me to prepare a 7th and last edition of this book (11 editions in total, including the 4 German editions, 1985-97). The book shows how to build in, evaluate, and demonstrate reliability, maintainability, and availability of components, equipment, and systems. It presents the state of the art of reliability engineering, in both theory and practice, and is based on the author's more than 30 years of experience in this field, half in industry (including setting up the Swiss Test Lab. for VLSI, 1979-83, in Neuchâtel) and half as Professor of Reliability Engineering at the Swiss Federal Institute of Technology (ETH), Zurich.
Because performance, dependability, cost, and time to market are key factors for today's products and services, and because failures of complex systems can have major safety consequences, reliability engineering has become a necessary support in developing and producing complex equipment and systems. The structure of the book has been preserved through all editions, with main Chapters 1 to 8 and Appendices A1 to A11 (A10 & A11 since the 5th Edition 2007). Chapters 2, 4, and 6 deal carefully with analytical investigations, Chapter 5 with design guidelines, Chapters 3 and 7 with tests, and Chapter 8 with activities during production. Appendix A1 defines and comments on the terms commonly used in reliability engineering. Appendices A2-A5 have been added to support managers in answering the question of how to specify and achieve high reliability (RAMS) targets for complex equipment and systems. Appendices A6-A8 are a compendium of probability theory, stochastic processes, and mathematical statistics, as needed for Chapters 2, 4, 6, and 7; they are mathematically consistent but written with reliability engineering applications in mind (proofs of established theorems are referenced; for all other propositions or equations, sufficient detail for a complete proof is given). Appendix A9 includes statistical tables, Laplace transforms, and probability charts. Appendix A10 summarizes the basic technological properties of components, and Appendix A11 gives a set of 70 problems for homework. This structure makes the book self-contained as a textbook for postgraduate students or courses in industry (Fig. 1.9 on p. 24), allows rapid access to practical results (as a desktop reference), and offers theoretically oriented readers all the mathematical tools needed to continue research in this field. The book covers many aspects of reliability engineering using a common language, and has been improved step by step.
Methods and tools are presented so that they can be tailored to different levels of reliability requirements and also be used for safety analysis. A large number of tables (60), figures (190), and examples (210, of which 70 are problems for homework), as well as a comprehensive reference list and index, amply support the text. This last edition reviews, refines, and extends all previous editions. New material includes in particular:
• A strategy to mitigate incomplete coverage (p. 255), yielding new models (Table 6.12 c & d, p. 256).
• A comprehensive introduction to human reliability, with a set of design guidelines to avoid human errors (pp. 158-159) and new models, based on semi-Markov processes, that combine human error probability and the time to accomplish a task (pp. 294-298).
• Improved design guidelines for maintainability (pp. 154-158).
• An improved reliability allocation using the Lagrange multiplier method to account for cost (p. 67).
• A comparison of four repair strategies (Table 4.4, p. 141).
• A comparison of basic models for imperfect switching (Table 6.11, p. 248).
• A refinement of approximate expressions, of concepts related to regenerative processes, and of the use and limitations of stochastic processes in modeling reliability problems (e.g. Table 6.1, p. 171).
• Relevant statements and rules are now set in italics and centered in the text.
Furthermore:
• Particular importance has been given to the selection of design guidelines and rules, the development of approximate expressions for large series-parallel systems, the careful simplification of exact results to allow in-depth trade-off studies, and the investigation of systems with complex structure (preventive maintenance, imperfect switching, incomplete coverage, elements with more than one failure mode, fault tolerant reconfigurable systems, common cause failures).
• The central role of software quality assurance for complex equipment and systems is highlighted.
• The use of interarrival times starting at x = 0 at each occurrence of the event considered, instead of the variable t, which gives a meaning to MTBF and allows a failure rate λ(x) and a mean time to failure MTTF to be introduced for repairable systems as well, is carefully discussed (pp. 5-6, 41, 175, 316, 341, 378, 380) and applied consistently. The same holds for the basic difference between failure rate, (probability) density, and renewal density or intensity of a point process (pp. 7, 378, 426, 466, 524). In this context, the as-good-as-new assumption after repair is critically discussed wherever necessary, and the historical distinction between nonrepairable and repairable items is scaled down (removed for reliability function, failure rate, MTTF, and MTBF); national and international standards should take this into account and avoid definitions that are intrinsically valid only for constant (time-independent) failure rates.
• As in all editions since the 1st, indices Si are used for reliability figures at system level (e.g. $MTTF_{Si}$), where S stands for system and i is the state entered at t = 0 (system referring to the highest integration level of the item considered, t = 0 being the beginning of observations, and x = 0 the origin for interarrival times). This is mandatory for judicious investigations at system level.
• In agreement with practical applications, the term MTBF is reserved for MTBF = 1/λ.
• Important prerequisites for accelerated tests are carefully discussed (pp. 329-334), in particular for transferring an acceleration factor A from the MTTF ($MTTF_1 = A \cdot MTTF_2$) to the (random) failure-free time τ ($\tau_1 = A \cdot \tau_2$).
• Asymptotic and steady-state are used for stationary (assuming irreducible embedded chains), repair for restoration (neglecting administrative, logistical, and technical delays), and mean for expected value.
For reliability applications, pairwise independence in general ensures total (mutual, statistical, stochastic) independence; independent is thus used for totally independent.

The book has grown from about 400 to 600 pages, with the main improvements in the 4th to 7th Editions:
• 4th Edition: Complete review and general refinements.
• 5th Edition: Introduction to phased-mission systems, common cause failures, Petri nets, dynamic FTA, nonhomogeneous Poisson processes, and trend tests; problems for homework.
• 6th Edition: Proof of Eqs. (6.88) & (6.94), introduction to network reliability, event trees & binary decision diagrams, extensions of maintenance strategies and incomplete coverage, refinements for large complex systems and approximate expressions.

The launch of the 6th Edition of this book coincided with my 70th birthday, which was celebrated with a special session at the 12th Int. Conf. on Quality and Dependability CCF2010, held in Sinaia (RO), 22-24 September 2010. My answer to the last question of the interview [1.0] given to Prof. Dr. Ioan C. Bacivarov, Chairman of the International Scientific Committee of CCF2010, may help to explain the acceptance of this book:
"Besides more than 15 years experience in the industry, and a predisposition to be a self-taught man, my attitude to life was surely an important key for the success of my book. This is best expressed in the three sentences given on the first page of this book. These sentences, insisting on generosity, modesty and responsibility apply quite general to a wide class of situations and people, from engineers to politicians, and it is to hope that the third sentence, in particular, will be considered by a growing number of humans, now, in front of the ecological problems we are faced and in front of the necessity to create a federal world wide confederation of democratic states in which freedom is primarily respect for the other."
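The two conventions on MTBF and on accelerated tests stated in the preface can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it assumes a constant failure rate λ₂ under stress conditions, and the values λ₂ = 10⁻³ per hour and A = 20 are hypothetical (not taken from the book). It computes MTTF as the integral of the reliability function R(t) with a simple trapezoidal rule and shows that (i) MTTF₂ ≈ 1/λ₂, the quantity for which the term MTBF is reserved, and (ii) if τ₁ = A·τ₂, then R₁(t) = R₂(t/A) and MTTF₁ = A·MTTF₂.

```python
import math
from typing import Callable

# Hypothetical values, chosen only for illustration (not taken from the book):
LAMBDA_2 = 1e-3  # assumed constant failure rate under stress conditions, per hour
A = 20.0         # assumed acceleration factor between normal (1) and stress (2) conditions


def r_stress(t: float) -> float:
    """R_2(t) = exp(-LAMBDA_2 * t): reliability function for a constant failure rate."""
    return math.exp(-LAMBDA_2 * t)


def r_normal(t: float) -> float:
    """If tau_1 = A * tau_2, then R_1(t) = Pr{tau_1 > t} = Pr{tau_2 > t/A} = R_2(t/A)."""
    return r_stress(t / A)


def mttf(reliability: Callable[[float], float], upper: float, steps: int = 200_000) -> float:
    """MTTF = integral of R(t) over [0, inf), approximated by the trapezoidal rule
    on [0, upper]; upper must be large enough that R(upper) is negligible."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for k in range(1, steps):
        total += reliability(k * h)
    return total * h


# Integrate over 50 mean lifetimes, where R(t) = exp(-50) is negligible.
mttf_2 = mttf(r_stress, upper=50.0 / LAMBDA_2)      # should be close to 1/LAMBDA_2
mttf_1 = mttf(r_normal, upper=50.0 * A / LAMBDA_2)  # should be close to A/LAMBDA_2

print(f"MTTF_2 = {mttf_2:.2f} h, 1/lambda_2 = {1.0 / LAMBDA_2:.2f} h")
print(f"MTTF_1 = {mttf_1:.2f} h, MTTF_1/MTTF_2 = {mttf_1 / mttf_2:.4f}")
```

The trapezoidal rule and the truncation at 50 mean lifetimes are arbitrary choices for this sketch. For the exponential case, the closed forms MTTF₂ = 1/λ₂ and MTTF₁ = A/λ₂ make the integration unnecessary, but the relation R₁(t) = R₂(t/A) follows from τ₁ = A·τ₂ for any distribution, so the same numerical route also applies to other reliability functions R₂(t).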
The comments of many friends and the pleasant cooperation with Springer-Verlag are gratefully acknowledged. Looking back over all editions (1st German edition 1985), thanks are due in particular to K.P. LaSala for reviewing the 4th & 6th Editions [1.17], I.C. Bacivarov for reviewing the 6th Edition [1.0], the reviewers of the German editions, P. Franken and I. Kovalenko for commenting on Appendices A6-A8, A. Bobbio, F. Bonzanigo, and M. Held for supporting numerical evaluations, J. Thalhammer for supporting the editing of all figures, and L. Lambert for reading the final manuscripts.

Zurich and Florence, September 13, 2013
Alessandro Birolini

Contents

1 Basic Concepts, Quality & Reliability (RAMS) Assurance of Complex Equip. & Systems ... 1
  1.1 Introduction ... 1
  1.2 Basic Concepts ... 2
    1.2.1 Reliability ... 2
    1.2.2 Failure ... 3
    1.2.3 Failure Rate, MTTF, MTBF ... 4
    1.2.4 Maintenance, Maintainability ... 8
    1.2.5 Logistic Support ... 8
    1.2.6 Availability ... 9
    1.2.7 Safety, Risk, and Risk Acceptance ... 9
    1.2.8 Quality ... 11
    1.2.9 Cost and System Effectiveness ... 11
    1.2.10 Product Liability ... 15
    1.2.11 Historical Development ... 16
  1.3 Basic Tasks & Rules for Quality & Rel. (RAMS) Assurance of Complex Eq. & Systems ... 17
    1.3.1 Quality and Reliability (RAMS) Assurance Tasks ... 17
    1.3.2 Basic Quality and Reliability (RAMS) Assurance Rules ... 19
    1.3.3 Elements of a Quality Assurance System ... 21
    1.3.4 Motivation and Training ... 24
2 Reliability Analysis During the Design Phase (Nonrepairable Elements up to System Failure) ... 25
  2.1 Introduction ... 25
  2.2 Predicted Reliability of Equipment and Systems with Simple Structure ... 28
    2.2.1 Required Function ... 28
    2.2.2 Reliability Block Diagram ... 28
    2.2.3 Operating Conditions at Component Level, Stress Factors ... 33
    2.2.4 Failure Rate of Electronic Components ... 35
    2.2.5 Reliability of One-Item Structures ... 39
    2.2.6 Reliability of Series-Parallel Structures ... 41
      2.2.6.1 Systems without Redundancy ... 41
      2.2.6.2 Concept of Redundancy ... 42
      2.2.6.3 Parallel Models ... 43
      2.2.6.4 Series - Parallel Structures ... 45
      2.2.6.5 Majority Redundancy ... 49
    2.2.7 Part Count Method ... 51
  2.3 Reliability of Systems with Complex Structure ... 52
    2.3.1 Key Item Method ... 52
      2.3.1.1 Bridge Structure ... 53
      2.3.1.2 Rel. Block Diagram in which Elements Appear More than Once ... 54
    2.3.2 Successful Path Method ... 55
    2.3.3 State Space Method ... 56
    2.3.4 Boolean Function Method ... 57
    2.3.5 Parallel Models with Constant Failure Rates and Load Sharing ... 61
    2.3.6 Elements with more than one Failure Mechanism or one Failure Mode ... 64
    2.3.7 Basic Considerations on Fault Tolerant Structures ... 66
  2.4 Reliability Allocation and Optimization ... 67
  2.5 Mechanical Reliability, Drift Failures ... 68
  2.6 Failure Modes Analyses ... 72
  2.7 Reliability Aspects in Design Reviews ... 77
3 Qualification Tests for Components and Assemblies ... 81
  3.1 Basic Selection Criteria for Electronic Components ... 81
    3.1.1 Environment ... 82
    3.1.2 Performance Parameters ... 84
    3.1.3 Technology ... 84
    3.1.4 Manufacturing Quality ... 86
    3.1.5 Long-Term Behavior of Performance Parameters ... 86
    3.1.6 Reliability ... 86
  3.2 Qualification Tests for Complex Electronic Components ... 87
    3.2.1 Electrical Test of Complex ICs ... 88
    3.2.2 Characterization of Complex ICs ... 90
    3.2.3 Environmental and Special Tests of Complex ICs ... 92
    3.2.4 Reliability Tests ... 101
  3.3 Failure Modes, Mechanisms, and Analysis of Electronic Components ... 101
    3.3.1 Failure Modes of Electronic Components ... 101
    3.3.2 Failure Mechanisms of Electronic Components ... 102
    3.3.3 Failure Analysis of Electronic Components ... 102
    3.3.4 Present VLSI Production-Related Reliability Problems ... 106
  3.4 Qualification Tests for Electronic Assemblies ... 107
4 Maintainability Analysis ... 112
  4.1 Maintenance, Maintainability ... 112
  4.2 Maintenance Concept ... 115
    4.2.1 Fault Detection (Recognition) and Localization ... 116
    4.2.2 Equipment and Systems Partitioning ... 118
    4.2.3 User Documentation ... 118
    4.2.4 Training of Operation and Maintenance Personnel ... 119
    4.2.5 User Logistic Support ... 119
  4.3 Maintainability Aspects in Design Reviews ... 121
  4.4 Predicted Maintainability ... 121
    4.4.1 Calculation of $MTTR_S$ ... 121
    4.4.2 Calculation of $MTTPM_S$ ... 125
  4.5 Basic Models for Spare Parts Provisioning ... 125
    4.5.1 Centralized Logistic Support, Nonrepairable Spare Parts ... 125
    4.5.2 Decentralized Logistic Support, Nonrepairable Spare Parts ... 129
    4.5.3 Repairable Spare Parts ... 130
  4.6 Maintenance Strategies ... 134
    4.6.1 Complete renewal at each maintenance action ... 134
    4.6.2 Block replacement with minimal repair at failure ... 138
    4.6.3 Further considerations on maintenance strategies ... 139
  4.7 Basic Cost Considerations ... 142
5 Design Guidelines for Reliability, Maintainability, and Software Quality ... 144
  5.1 Design Guidelines for Reliability ... 144
    5.1.1 Derating ... 144
    5.1.2 Cooling ... 145
    5.1.3 Moisture ... 147
    5.1.4 Electromagnetic Compatibility, ESD Protection ... 148
    5.1.5 Components and Assemblies ... 150
      5.1.5.1 Component Selection ... 150
      5.1.5.2 Component Use ... 150
      5.1.5.3 PCB and Assembly Design ... 151
      5.1.5.4 PCB and Assembly Manufacturing ... 152
      5.1.5.5 Storage and Transportation ... 153
    5.1.6 Particular Guidelines for IC Design and Manufacturing ... 153
  5.2 Design Guidelines for Maintainability ... 154
    5.2.1 General Guidelines ... 154
    5.2.2 Testability ... 155
    5.2.3 Connections, Accessibility, Exchangeability ... 157
    5.2.4 Adjustment ... 158
    5.2.5 Human, Ergonomic, and Safety Aspects ... 158
  5.3 Design Guidelines for Software Quality ... 159
    5.3.1 Guidelines for Software Defect Prevention ... 162
    5.3.2 Configuration Management ... 165
    5.3.3 Guidelines for Software Testing ... 166
    5.3.4 Software Quality Growth Models ... 166
6 Reliability and Availability of Repairable Systems ... 169
  6.1 Introduction, General Assumptions, Conclusions ... 169
  6.2 One-Item Structure ... 175
    6.2.1 One-Item Structure New at Time t=0 ... 176
      6.2.1.1 Reliability Function ... 176
      6.2.1.2 Point Availability ... 177
      6.2.1.3 Average Availability ... 178
      6.2.1.4 Interval Reliability ... 179
      6.2.1.5 Special Kinds of Availability ... 180
    6.2.2 One-Item Structure New at Time t=0 and with Constant Failure Rate λ ... 183
    6.2.3 One-Item Structure with Arbitrary Conditions at t=0 ... 184
    6.2.4 Asymptotic Behavior ... 185
    6.2.5 Steady-State Behavior ... 187
  6.3 Systems without Redundancy ... 189
    6.3.1 Series Structure with Constant Failure and Repair Rates ... 189
    6.3.2 Series Structure with Constant Failure and Arbitrary Repair Rates ... 192
    6.3.3 Series Structure with Arbitrary Failure and Repair Rates ... 193
  6.4 1-out-of-2 Redundancy (Warm, one Repair Crew) ... 196
    6.4.1 1-out-of-2 Redundancy with Constant Failure and Repair Rates ... 196
    6.4.2 1-out-of-2 Redundancy with Constant Failure and Arbitrary Rep. Rates ... 204
    6.4.3 1-out-of-2 Red. with Const. Failure Rate in Reserve State & Arbitr. Rep. Rates ... 207
  6.5 k-out-of-n Redundancy (Warm, Identical Elements, one Repair Crew) ... 213
    6.5.1 k-out-of-n Redundancy with Constant Failure and Repair Rates ... 214
    6.5.2 k-out-of-n Redundancy with Constant Failure and Arbitrary Repair Rates ... 218
  6.6 Simple Series - Parallel Structures (one Repair Crew) ... 220
  6.7 Approximate Expressions for Large Series - Parallel Structures ... 226
    6.7.1 Introduction ... 226
    6.7.2 Application to a Practical Example ... 230