
Performance, Reliability, and Availability Evaluation of Computational Systems, Volume 2: Reliability, Availability Modeling PDF

748 Pages·2023·27.882 MB·English

Preview Performance, Reliability, and Availability Evaluation of Computational Systems, Volume 2: Reliability, Availability Modeling

Performance, Reliability, and Availability Evaluation of Computational Systems, Volume 2

This textbook intends to be a comprehensive and substantially self-contained two-volume book covering performance, reliability, and availability evaluation subjects. The volumes focus on computing systems, although the methods may also be applied to other systems. The first volume covers Chapter 1 to Chapter 14 and has the subtitle “Performance Modeling and Background”. The second volume encompasses Chapter 15 to Chapter 25 and has the subtitle “Reliability and Availability Modeling, Measuring and Workload, and Lifetime Data Analysis”.

This text is helpful to computer performance professionals in supporting the planning, design, configuration, and tuning of the performance, reliability, and availability of computing systems. Such professionals may use these volumes to get acquainted with specific subjects by looking at the particular chapters. Many examples in the textbook on computing systems will help them understand the concepts covered in each chapter. The text may also be helpful for the instructor who teaches performance, reliability, and availability evaluation subjects. Many possible threads could be configured according to the interest of the audience and the duration of the course. Chapter 1 presents a good number of possible course programs that could be organized using this text.

Volume 2 is composed of the last two parts. Part III examines reliability and availability modeling by covering a set of fundamental notions, definitions, redundancy procedures, and modeling methods such as Reliability Block Diagrams (RBD) and Fault Trees (FT) with the respective evaluation methods; it then adopts Markov chains, stochastic Petri nets, and even hierarchical and heterogeneous modeling to represent more complex systems. Part IV discusses performance measurements and reliability data analysis. It first depicts some basic measuring mechanisms applied in computer systems, then discusses workload generation. Afterward, we examine failure monitoring and fault injection, and finally, we discuss a set of techniques for reliability and maintainability data analysis.

Performance, Reliability, and Availability Evaluation of Computational Systems, Volume 2
Reliability, Availability Modeling, Measuring, and Data Analysis
Paulo Romero Martins Maciel

First edition published 2023
by CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press, 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2023 Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-30640-7 (hbk)
ISBN: 978-1-032-30642-1 (pbk)
ISBN: 978-1-003-30603-0 (ebk)
DOI: 10.1201/9781003306030

Typeset in Nimbus Roman by KnowledgeWorks Global Ltd.
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.

Dedication
To the One and Triune God, the Holy Mystery that is Wholly Love.

Contents

Preface
Acknowledgement
Chapter 15 Introduction

PART III Reliability and Availability Modeling

Chapter 16 Fundamentals of Dependability
  16.1 A Brief History
  16.2 Fundamental Concepts
  16.3 Some Important Probability Distributions
Chapter 17 Redundancy
  17.1 Hardware Redundancy
  17.2 Software Redundancy
Chapter 18 Reliability Block Diagram
  18.1 Models Classification
  18.2 Basic Components
  18.3 Logical and Structure Functions
  18.4 Coherent System
  18.5 Compositions
  18.6 System Redundancy and Component Redundancy
  18.7 Common Cause Failure
  18.8 Paths and Cuts
  18.9 Importance Indices
Chapter 19 Fault Tree
  19.1 Components of a Fault Tree
  19.2 Basic Compositions
  19.3 Compositions
  19.4 Common Cause Failure
Chapter 20 Combinatorial Model Analysis
  20.1 Structure Function Method
  20.2 Enumeration Method
  20.3 Factoring Method
  20.4 Reductions
  20.5 Inclusion-Exclusion Method
  20.6 Sum of Disjoint Products Method
  20.7 Methods for Estimating Bounds
    20.7.1 Method Based on Inclusion and Exclusion
    20.7.2 Method Based on the Sum of Disjoint Products
    20.7.3 Min-Max Bound Method
    20.7.4 Esary-Proschan Method
    20.7.5 Decomposition
Chapter 21 Modeling Availability, Reliability, and Capacity with CTMC
  21.1 Single Component
  21.2 Hot-Standby Redundancy
  21.3 Hot-Standby with Non-Zero Delay Switching
  21.4 Imperfect Coverage
  21.5 Cold-Standby Redundancy
  21.6 Warm-Standby Redundancy
  21.7 Active-Active Redundancy
  21.8 Many Similar Machines with Repair Facilities
  21.9 Many Similar Machines with Shared Repair Facility
  21.10 Phase-Type Distribution and Preventive Maintenance
  21.11 Two-States Availability Equivalent Model
  21.12 Common Cause Failure
Chapter 22 Modeling Availability, Reliability, and Capacity with SPN
  22.1 Single Component
  22.2 Modeling TTF and TTR with Phase-Type Distribution
  22.3 Hot-Standby Redundancy
  22.4 Imperfect Coverage
  22.5 Cold-Standby Redundancy
  22.6 Warm-Standby Redundancy
  22.7 Active-Active Redundancy
  22.8 KooN Redundancy
    22.8.1 Modeling Multiple Resources on Multiple Servers
  22.9 Corrective Maintenance
  22.10 Preventive Maintenance
  22.11 Common Cause Failure
  22.12 Some Additional Models
    22.12.1 Data Center Disaster Recovery
    22.12.2 Disaster Tolerant Cloud Systems
    22.12.3 MHealth System Infrastructure

PART IV Measuring and Data Analysis

Chapter 23 Performance Measuring
  23.1 Basic Concepts
  23.2 Measurement Strategies
  23.3 Basic Performance Metrics
  23.4 Counters and Timers
  23.5 Measuring Short Time Intervals
  23.6 Profiling
    23.6.1 Deterministic Profiling
    23.6.2 Statistical Profiling
  23.7 Counters and Basic Performance Tools in Linux
    23.7.1 System Information
    23.7.2 Process Information
  23.8 Final Comments
Chapter 24 Workload Characterization
  24.1 Types of Workloads
  24.2 Workload Generation
    24.2.1 Benchmarks
    24.2.2 Synthetic Operational Workload Generation
  24.3 Workload Modeling
    24.3.1 Modeling Workload Impact
    24.3.2 Modeling Intended Workload
Chapter 25 Lifetime Data Analysis
  25.1 Introduction
    25.1.1 Reliability Data Sources
    25.1.2 Censoring
  25.2 Non-Parametric Methods
    25.2.1 Ungrouped Complete Data Method
    25.2.2 Grouped Complete Data Method
    25.2.3 Ungrouped Multiply Censored Data Method
    25.2.4 Kaplan-Meier Method
  25.3 Parametric Methods
    25.3.1 Graphical Methods
    25.3.2 Method of Moments
    25.3.3 Maximum Likelihood Estimation
    25.3.4 Confidence Intervals
Chapter 26 Fault Injection and Failure Monitoring
  26.1 Fault Acceleration
  26.2 Some Notable Fault Injection Tools
  26.3 Software-Based Fault Injection

Bibliography
Appendix A MTTF 2oo5
Appendix B Whetstone
Appendix C Linpack Bench
Appendix D Livermore Loops
Appendix E MMP-CTMC Trace Generator
