Software Fault Tolerance Techniques and Implementation

Limits of Liability and Disclaimer of Warranty

Every reasonable attempt has been made to ensure the accuracy, completeness, and correctness of the information contained in this book at the time of writing. However, neither the author nor the publisher, Artech House, Inc., shall be responsible or liable in negligence or otherwise, in respect to any inaccuracy or omission herein. The author and the publisher make no representation that this information is suitable for every application to which a reader may attempt to apply the information. Many of the techniques and theories are still subject to academic debate. The author and Artech House make no warranty of any kind, expressed or implied, including warranties of fitness for a particular purpose, with regard to the information contained in this book, all of which is provided "as is." Without derogating from the generality of the foregoing, neither the author nor the publisher shall be liable for any direct, indirect, incidental, or consequential damages or loss caused by or arising from any information or advice, inaccuracy, or omission herein. This work is published with the understanding that the author and Artech House are supplying information, but are not attempting to render engineering judgment or other professional services.

For a complete listing of the Artech House Computing Library, turn to the back of this book.

Laura L. Pullum
Artech House
Boston London
www.artechhouse.com

Library of Congress Cataloging-in-Publication Data
Pullum, Laura.
Software fault tolerance techniques and implementation / Laura Pullum.
p. cm. - (Artech House computing library)
Includes bibliographical references and index.
ISBN 1-58053-137-7 (alk. paper)
1. Fault-tolerant computing. 2. Computer software-Reliability. I. Title. II. Series.
QA76.9.F38 P85 2001
005.1-dc21 2001035915

British Library Cataloguing in Publication Data
Pullum, Laura
Software fault tolerance techniques and implementation.
- (Artech House computing library)
1. Computer software-Development 2. Software failures
I. Title
005.1’2
ISBN 1-58053-470-8

Cover design by Igor Valdman

© 2001 ARTECH HOUSE, INC.
685 Canton Street
Norwood, MA 02062

All rights reserved. Printed and bound in the United States of America. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

International Standard Book Number: 1-58053-137-7
Library of Congress Catalog Card Number: 2001035915

10 9 8 7 6 5 4 3 2 1

Contents

Preface xi
Acknowledgments xiii

1 Introduction 1
1.1 A Few Definitions 3
1.2 Organization and Intended Use 4
1.3 Means to Achieve Dependable Software 6
1.3.1 Fault Avoidance or Prevention 7
1.3.2 Fault Removal 9
1.3.3 Fault/Failure Forecasting 11
1.3.4 Fault Tolerance 12
1.4 Types of Recovery 13
1.4.1 Backward Recovery 14
1.4.2 Forward Recovery 16
1.5 Types of Redundancy for Software Fault Tolerance 18
1.5.1 Software Redundancy 18
1.5.2 Information or Data Redundancy 19
1.5.3 Temporal Redundancy 21
1.6 Summary 21
References 23

2 Structuring Redundancy for Software Fault Tolerance 25
2.1 Robust Software 27
2.2 Design Diversity 29
2.2.1 Case Studies and Experiments in Design Diversity 31
2.2.2 Levels of Diversity and Fault Tolerance Application 33
2.2.3 Factors Influencing Diversity 34
2.3 Data Diversity 35
2.3.1 Overview of Data Re-expression 37
2.3.2 Output Types and Related Data Re-expression 38
2.3.3 Example Data Re-expression Algorithms 40
2.4 Temporal Diversity 42
2.5 Architectural Structure for Diverse Software 44
2.6 Structure for Development of Diverse Software 44
2.6.1 Xu and Randell Framework 45
2.6.2 Daniels, Kim, and Vouk Framework 51
2.7 Summary 53
References 53

3 Design Methods, Programming Techniques, and Issues 59
3.1 Problems and Issues 59
3.1.1 Similar Errors and a Lack of Diversity 60
3.1.2 Consistent Comparison Problem 62
3.1.3 Domino Effect 68
3.1.4 Overhead 70
3.2 Programming Techniques 76
3.2.1 Assertions 78
3.2.2 Checkpointing 80
3.2.3 Atomic Actions 84
3.3 Dependable System Development Model and N-Version Software Paradigm 88
3.3.1 Design Considerations 88
3.3.2 Dependable System Development Model 91
3.3.3 Design Paradigm for N-Version Programming 93
3.4 Summary 94
References 97

4 Design Diverse Software Fault Tolerance Techniques 105
4.1 Recovery Blocks 106
4.1.1 Recovery Block Operation 107
4.1.2 Recovery Block Example 113
4.1.3 Recovery Block Issues and Discussion 115
4.2 N-Version Programming 120
4.2.1 N-Version Programming Operation 121
4.2.2 N-Version Programming Example 125
4.2.3 N-Version Programming Issues and Discussion 127
4.3 Distributed Recovery Blocks 132
4.3.1 Distributed Recovery Block Operation 132
4.3.2 Distributed Recovery Block Example 137
4.3.3 Distributed Recovery Block Issues and Discussion 139
4.4 N Self-Checking Programming 144
4.4.1 N Self-Checking Programming Operation 144
4.4.2 N Self-Checking Programming Example 145
4.4.3 N Self-Checking Programming Issues and Discussion 149
4.5 Consensus Recovery Block 152
4.5.1 Consensus Recovery Block Operation 152
4.5.2 Consensus Recovery Block Example 155
4.5.3 Consensus Recovery Block Issues and Discussion 159
4.6 Acceptance Voting 162
4.6.1 Acceptance Voting Operation 162
4.6.2 Acceptance Voting Example 166
4.6.3 Acceptance Voting Issues and Discussion 169
4.7 Technique Comparisons 172
4.7.1 N-Version Programming and Recovery Block Technique Comparisons 176
4.7.2 Recovery Block and Distributed Recovery Block Technique Comparisons 180
4.7.3 Consensus Recovery Block, Recovery Block Technique, and N-Version Programming Comparisons 181
4.7.4 Acceptance Voting, Consensus Recovery Block, Recovery Block Technique, and N-Version Programming Comparisons 182
References 183

5 Data Diverse Software Fault Tolerance Techniques 191
5.1 Retry Blocks 192
5.1.1 Retry Block Operation 193
5.1.2 Retry Block Example 202
5.1.3 Retry Block Issues and Discussion 204
5.2 N-Copy Programming 207
5.2.1 N-Copy Programming Operation 208
5.2.2 N-Copy Programming Example 212
5.2.3 N-Copy Programming Issues and Discussion 214
5.3 Two-Pass Adjudicators 218
5.3.1 Two-Pass Adjudicator Operation 218
5.3.2 Two-Pass Adjudicators and Multiple Correct Results 223
5.3.3 Two-Pass Adjudicator Example 227
5.3.4 Two-Pass Adjudicator Issues and Discussion 229
5.4 Summary 232
References 233

6 Other Software Fault Tolerance Techniques 235
6.1 N-Version Programming Variants 235
6.1.1 N-Version Programming with Tie-Breaker and Acceptance Test Operation 236
6.1.2 N-Version Programming with Tie-Breaker and Acceptance Test Example 241
6.2 Resourceful Systems 244
6.3 Data-Driven Dependability Assurance Scheme 247
6.4 Self-Configuring Optimal Programming 253
6.4.1 Self-Configuring Optimal Programming Operation 253
6.4.2 Self-Configuring Optimal Programming Example 257
6.4.3 Self-Configuring Optimal Programming Issues and Discussion 260
6.5 Other Techniques 262
6.6 Summary 262
References 265