Performance,
Reliability, and
Availability Evaluation
of Computational
Systems, Volume 2
This textbook is intended as a comprehensive and substantially self-contained two-volume
work covering performance, reliability, and availability evaluation. The volumes
focus on computing systems, although the methods may also be applied to other systems.
The first volume, subtitled “Performance Modeling and Background”, covers Chapter 1 to
Chapter 14. The second volume, subtitled “Reliability and Availability Modeling,
Measuring and Workload, and Lifetime Data Analysis”, covers Chapter 15 to Chapter 25.
This text helps computer performance professionals plan, design, configure, and tune
the performance, reliability, and availability of computing systems. Such professionals
may use these volumes to get acquainted with specific subjects by consulting the
relevant chapters. The many computing-system examples in the textbook will help them
understand the concepts covered in each chapter. The text may also help instructors
who teach performance, reliability, and availability evaluation. Many possible threads
can be configured according to the interests of the audience and the duration of the
course. Chapter 1 presents several possible course programs that could be organized
using this text.
Volume 2 is composed of the last two parts. Part III examines reliability and availability
modeling by covering a set of fundamental notions, definitions, redundancy procedures,
and modeling methods such as Reliability Block Diagrams (RBD) and Fault Trees (FT)
with their respective evaluation methods. It then adopts Markov chains, stochastic Petri
nets, and even hierarchical and heterogeneous modeling to represent more complex systems.
Part IV discusses performance measurements and reliability data analysis. It first
describes some basic measuring mechanisms applied in computer systems, then discusses
workload generation. Next, we examine failure monitoring and fault injection, and finally,
we discuss a set of techniques for reliability and maintainability data analysis.
Reliability, Availability Modeling,
Measuring, and Data Analysis
Paulo Romero
Martins Maciel
First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2023 Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com
or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
ISBN: 978-1-032-30640-7 (hbk)
ISBN: 978-1-032-30642-1 (pbk)
ISBN: 978-1-003-30603-0 (ebk)
DOI: 10.1201/9781003306030
Typeset in Nimbus Roman
by KnowledgeWorks Global Ltd.
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.
Dedication
To the One and Triune God, the Holy Mystery that is Wholly Love.
Contents
Preface
Acknowledgement
Chapter 15 Introduction
PART III Reliability and Availability Modeling
Chapter 16 Fundamentals of Dependability
16.1 A Brief History
16.2 Fundamental Concepts
16.3 Some Important Probability Distributions
Chapter 17 Redundancy
17.1 Hardware Redundancy
17.2 Software Redundancy
Chapter 18 Reliability Block Diagram
18.1 Models Classification
18.2 Basic Components
18.3 Logical and Structure Functions
18.4 Coherent System
18.5 Compositions
18.6 System Redundancy and Component Redundancy
18.7 Common Cause Failure
18.8 Paths and Cuts
18.9 Importance Indices
Chapter 19 Fault Tree
19.1 Components of a Fault Tree
19.2 Basic Compositions
19.3 Compositions
19.4 Common Cause Failure
Chapter 20 Combinatorial Model Analysis
20.1 Structure Function Method
20.2 Enumeration Method
20.3 Factoring Method
20.4 Reductions
20.5 Inclusion-Exclusion Method
20.6 Sum of Disjoint Products Method
20.7 Methods for Estimating Bounds
20.7.1 Method Based on Inclusion and Exclusion
20.7.2 Method Based on the Sum of Disjoint Products
20.7.3 Min-Max Bound Method
20.7.4 Esary-Proschan Method
20.7.5 Decomposition
Chapter 21 Modeling Availability, Reliability, and Capacity with CTMC
21.1 Single Component
21.2 Hot-Standby Redundancy
21.3 Hot-Standby with Non-Zero Delay Switching
21.4 Imperfect Coverage
21.5 Cold-Standby Redundancy
21.6 Warm-Standby Redundancy
21.7 Active-Active Redundancy
21.8 Many Similar Machines with Repair Facilities
21.9 Many Similar Machines with Shared Repair Facility
21.10 Phase-Type Distribution and Preventive Maintenance
21.11 Two-States Availability Equivalent Model
21.12 Common Cause Failure
Chapter 22 Modeling Availability, Reliability, and Capacity with SPN
22.1 Single Component
22.2 Modeling TTF and TTR with Phase-Type Distribution
22.3 Hot-Standby Redundancy
22.4 Imperfect Coverage
22.5 Cold-Standby Redundancy
22.6 Warm-Standby Redundancy
22.7 Active-Active Redundancy
22.8 KooN Redundancy
22.8.1 Modeling Multiple Resources on Multiple Servers
22.9 Corrective Maintenance
22.10 Preventive Maintenance
22.11 Common Cause Failure
22.12 Some Additional Models
22.12.1 Data Center Disaster Recovery
22.12.2 Disaster Tolerant Cloud Systems
22.12.3 MHealth System Infrastructure
PART IV Measuring and Data Analysis
Chapter 23 Performance Measuring
23.1 Basic Concepts
23.2 Measurement Strategies
23.3 Basic Performance Metrics
23.4 Counters and Timers
23.5 Measuring Short Time Intervals
23.6 Profiling
23.6.1 Deterministic Profiling
23.6.2 Statistical Profiling
23.7 Counters and Basic Performance Tools in Linux
23.7.1 System Information
23.7.2 Process Information
23.8 Final Comments
Chapter 24 Workload Characterization
24.1 Types of Workloads
24.2 Workload Generation
24.2.1 Benchmarks
24.2.2 Synthetic Operational Workload Generation
24.3 Workload Modeling
24.3.1 Modeling Workload Impact
24.3.2 Modeling Intended Workload
Chapter 25 Lifetime Data Analysis
25.1 Introduction
25.1.1 Reliability Data Sources
25.1.2 Censoring
25.2 Non-Parametric Methods
25.2.1 Ungrouped Complete Data Method
25.2.2 Grouped Complete Data Method
25.2.3 Ungrouped Multiply Censored Data Method
25.2.4 Kaplan-Meier Method
25.3 Parametric Methods
25.3.1 Graphical Methods
25.3.2 Method of Moments
25.3.3 Maximum Likelihood Estimation
25.3.4 Confidence Intervals
Chapter 26 Fault Injection and Failure Monitoring
26.1 Fault Acceleration
26.2 Some Notable Fault Injection Tools
26.3 Software-Based Fault Injection
Bibliography
Appendix A MTTF 2oo5
Appendix B Whetstone
Appendix C Linpack Bench
Appendix D Livermore Loops
Appendix E MMP-CTMC Trace Generator