Software Fault Tolerance Techniques
and Implementation
Limits of Liability and Disclaimer of Warranty
Every reasonable attempt has been made to ensure the accuracy, completeness,
and correctness of the information contained in this book at the time of
writing. However, neither the author nor the publisher, Artech House, Inc.,
shall be responsible or liable in negligence or otherwise, in respect to any
inaccuracy or omission herein. The author and the publisher make no
representation that this information is suitable for every application to which a
reader may attempt to apply the information. Many of the techniques and
theories are still subject to academic debate. The author and Artech House
make no warranty of any kind, expressed or implied, including warranties of
fitness for a particular purpose, with regard to the information contained in
this book, all of which is provided "as is." Without derogating from the
generality of the foregoing, neither the author nor the publisher shall be liable
for any direct, indirect, incidental, or consequential damages or loss caused
by or arising from any information or advice, inaccuracy, or omission herein.
This work is published with the understanding that the author and Artech
House are supplying information, but are not attempting to render engineering
judgment or other professional services.
For a complete listing of the Artech House Computing Library,
turn to the back of this book.
Software Fault Tolerance Techniques
and Implementation
Laura L. Pullum
Artech House
Boston London
www.artechhouse.com
Library of Congress Cataloging-in-Publication Data
Pullum, Laura.
Software fault tolerance techniques and implementation / Laura Pullum.
p. cm. - (Artech House computing library)
Includes bibliographical references and index.
ISBN 1-58053-137-7 (alk. paper)
1. Fault-tolerant computing. 2. Computer software-Reliability.
I. Title. II. Series.
QA76.9.F38 P85 2001
005.1-dc21 2001035915
British Library Cataloguing in Publication Data
Pullum, Laura
Software fault tolerance techniques and implementation. -
(Artech House computing library)
1. Computer software-Development 2. Software failures
I. Title
005.1’2
ISBN 1-58053-470-8
Cover design by Igor Valdman
© 2001 ARTECH HOUSE, INC.
685 Canton Street
Norwood, MA 02062
All rights reserved. Printed and bound in the United States of America. No part of this book
may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, recording, or by any information storage and retrieval system,
without permission in writing from the publisher.
All terms mentioned in this book that are known to be trademarks or service marks have
been appropriately capitalized. Artech House cannot attest to the accuracy of this
information. Use of a term in this book should not be regarded as affecting the validity of
any trademark or service mark.
International Standard Book Number: 1-58053-137-7
Library of Congress Catalog Card Number: 2001035915
10 9 8 7 6 5 4 3 2 1
Contents
Preface xi
Acknowledgments xiii
1 Introduction 1
1.1 A Few Definitions 3
1.2 Organization and Intended Use 4
1.3 Means to Achieve Dependable Software 6
1.3.1 Fault Avoidance or Prevention 7
1.3.2 Fault Removal 9
1.3.3 Fault/Failure Forecasting 11
1.3.4 Fault Tolerance 12
1.4 Types of Recovery 13
1.4.1 Backward Recovery 14
1.4.2 Forward Recovery 16
1.5 Types of Redundancy for Software Fault Tolerance 18
1.5.1 Software Redundancy 18
1.5.2 Information or Data Redundancy 19
1.5.3 Temporal Redundancy 21
1.6 Summary 21
References 23
2 Structuring Redundancy for Software Fault Tolerance 25
2.1 Robust Software 27
2.2 Design Diversity 29
2.2.1 Case Studies and Experiments in Design Diversity 31
2.2.2 Levels of Diversity and Fault Tolerance Application 33
2.2.3 Factors Influencing Diversity 34
2.3 Data Diversity 35
2.3.1 Overview of Data Re-expression 37
2.3.2 Output Types and Related Data Re-expression 38
2.3.3 Example Data Re-expression Algorithms 40
2.4 Temporal Diversity 42
2.5 Architectural Structure for Diverse Software 44
2.6 Structure for Development of Diverse Software 44
2.6.1 Xu and Randell Framework 45
2.6.2 Daniels, Kim, and Vouk Framework 51
2.7 Summary 53
References 53
3 Design Methods, Programming Techniques, and Issues 59
3.1 Problems and Issues 59
3.1.1 Similar Errors and a Lack of Diversity 60
3.1.2 Consistent Comparison Problem 62
3.1.3 Domino Effect 68
3.1.4 Overhead 70
3.2 Programming Techniques 76
3.2.1 Assertions 78
3.2.2 Checkpointing 80
3.2.3 Atomic Actions 84
3.3 Dependable System Development Model and N-Version Software Paradigm 88
3.3.1 Design Considerations 88
3.3.2 Dependable System Development Model 91
3.3.3 Design Paradigm for N-Version Programming 93
3.4 Summary 94
References 97
4 Design Diverse Software Fault Tolerance Techniques 105
4.1 Recovery Blocks 106
4.1.1 Recovery Block Operation 107
4.1.2 Recovery Block Example 113
4.1.3 Recovery Block Issues and Discussion 115
4.2 N-Version Programming 120
4.2.1 N-Version Programming Operation 121
4.2.2 N-Version Programming Example 125
4.2.3 N-Version Programming Issues and Discussion 127
4.3 Distributed Recovery Blocks 132
4.3.1 Distributed Recovery Block Operation 132
4.3.2 Distributed Recovery Block Example 137
4.3.3 Distributed Recovery Block Issues and Discussion 139
4.4 N Self-Checking Programming 144
4.4.1 N Self-Checking Programming Operation 144
4.4.2 N Self-Checking Programming Example 145
4.4.3 N Self-Checking Programming Issues and Discussion 149
4.5 Consensus Recovery Block 152
4.5.1 Consensus Recovery Block Operation 152
4.5.2 Consensus Recovery Block Example 155
4.5.3 Consensus Recovery Block Issues and Discussion 159
4.6 Acceptance Voting 162
4.6.1 Acceptance Voting Operation 162
4.6.2 Acceptance Voting Example 166
4.6.3 Acceptance Voting Issues and Discussion 169
4.7 Technique Comparisons 172
4.7.1 N-Version Programming and Recovery Block Technique Comparisons 176
4.7.2 Recovery Block and Distributed Recovery Block Technique Comparisons 180
4.7.3 Consensus Recovery Block, Recovery Block Technique, and N-Version Programming Comparisons 181
4.7.4 Acceptance Voting, Consensus Recovery Block, Recovery Block Technique, and N-Version Programming Comparisons 182
References 183
5 Data Diverse Software Fault Tolerance Techniques 191
5.1 Retry Blocks 192
5.1.1 Retry Block Operation 193
5.1.2 Retry Block Example 202
5.1.3 Retry Block Issues and Discussion 204
5.2 N-Copy Programming 207
5.2.1 N-Copy Programming Operation 208
5.2.2 N-Copy Programming Example 212
5.2.3 N-Copy Programming Issues and Discussion 214
5.3 Two-Pass Adjudicators 218
5.3.1 Two-Pass Adjudicator Operation 218
5.3.2 Two-Pass Adjudicators and Multiple Correct Results 223
5.3.3 Two-Pass Adjudicator Example 227
5.3.4 Two-Pass Adjudicator Issues and Discussion 229
5.4 Summary 232
References 233
6 Other Software Fault Tolerance Techniques 235
6.1 N-Version Programming Variants 235
6.1.1 N-Version Programming with Tie-Breaker and Acceptance Test Operation 236
6.1.2 N-Version Programming with Tie-Breaker and Acceptance Test Example 241
6.2 Resourceful Systems 244
6.3 Data-Driven Dependability Assurance Scheme 247
6.4 Self-Configuring Optimal Programming 253
6.4.1 Self-Configuring Optimal Programming Operation 253
6.4.2 Self-Configuring Optimal Programming Example 257
6.4.3 Self-Configuring Optimal Programming Issues and Discussion 260
6.5 Other Techniques 262
6.6 Summary 262
References 265