
Igor Schagaev · Eugene Zouev · Kaegi Thomas

Software Design for Resilient Computer Systems

Second Edition

Igor Schagaev
IT-ACS Ltd
Stevenage, UK

Eugene Zouev
Department of Informatics
Technopolis Innopolis
Kazan, Russia

Kaegi Thomas
IT-ACS Ltd
Stevenage, UK

ISBN 978-3-030-21243-8    ISBN 978-3-030-21244-5 (eBook)
https://doi.org/10.1007/978-3-030-21244-5

1st edition: © Springer International Publishing Switzerland 2016
2nd edition: © Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

What I consider to be the strongest point of this work, indeed its main advantage, is the extension of the winning strategy of military pilots to a multimodal complex: "If your action leads to the unexpected, step back and play anew." To recognize that there is a hierarchy of response options and then to choose the least obvious sequence, yet the one leading to survival, is a human miracle. To apply this consistently to an array of systems run by different algorithms is new engineering. Contrary to our experiences with autonomous and semi-autonomous multimodular systems, we can avoid creating irreconcilable paradoxes (shutdowns or total failures). In fact, they can be resisted if our causational design logic is augmented (or replaced) with an interactive-transformation-interactive approach. This changes our thinking from compensating for some top-down hierarchy of (event) causes to enabling response "negotiation" across and between all system modules. One way to view the Resilient System Theory is to recognize that it can resist unacceptable outcomes by negotiating multiple options to resolve multiple conflicts at multiple levels.

The introduction of "system resilience" requires nonlinear logic and redistributed capacity for flexible coordination and re-coordination of internal regime conditions and parameters. From this perspective, the survival of a multimodal complex is achieved not by insisting on the maximum recovery from losses in or of its key component(s) but by achieving a total system response and behavior with at least minimally optimal integration and recovery under a variety of disorienting, disabling, or dysfunctional conditions. The resilience theory of Prof. Schagaev and his colleagues promises to integrate this conceptual framework into a radically purposeful engineering-design framework.

Boris Gorbis
Los Angeles
Igor Schagaev, Stevenage, UK
Eugene Zouev, Kazan, Russia
Kaegi Thomas

Introduction for Second Edition

When in 1989 an anonymous reviewer commented on my short paper that "this classification should be extended to description of distributed systems" (Yet another approach to classification of redundancy, CIM IMEKO Symposium 1990, Helsinki, pp. 117–124), I was really excited, because people in the research community were thinking much deeper and wider than I was (I had just defended my Ph.D.).

Further, fault tolerance was migrating to dependability (Jean Claude Laprie was an indisputable authority and expert in this domain; see www.springer.com/gb/book/9783709191729), which later emerged as the concept of resilience.

In principle, all these new properties had concrete reasoning and meaning behind them: when something erroneous happens, any system of our design should be able to cope with the problem. Options vary, as do circumstances and areas of application:

– If it stops the error from propagating and freezes in a safe state, it is fail-stop, or fail-safe;
– If it can cope with permanent faults inside the system, it is a fault-tolerant system;
– If it continues with reduced functionality, it exhibits graceful degradation;
– If it is designed with attention paid to reliability, availability, and maintainability or serviceability, it is a dependable system;
– If it is capable of tolerating obstacles caused by internal and external factors and can spring back, recover, and continue, then it can be considered resilient.

There are two major ways to achieve any of the properties mentioned above: at the system level or at the local (technological) level. Any reasonable combination of both levels is also welcome. We do not want to repeat our papers and books (https://www.springer.com/gb/book/9783319150680, https://www.springer.com/gb/book/9783319468129) but rather to incorporate into the second edition any significant progress that has emerged.
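As an illustration only, the classification of how a system may respond to an error can be read as a set of independent behaviour checks rather than a single ladder, because one system may satisfy several properties at once (for example, a resilient system may also degrade gracefully on its way to recovery). The Python sketch below is a minimal, hypothetical encoding under that assumption: the `ErrorResponse` flags and the `classify` function are our own illustrative names, not an API or method from this book, and each boolean flag is a simplification of the corresponding informal definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorResponse:
    """Hypothetical flags describing how a system behaves when an error occurs."""
    freezes_in_safe_state: bool = False                # stops error propagation, halts safely
    tolerates_permanent_faults: bool = False           # copes with permanent internal faults
    continues_with_reduced_function: bool = False      # keeps running with less functionality
    designed_for_ras: bool = False                     # reliability, availability, serviceability
    recovers_from_internal_and_external: bool = False  # springs back, recovers, continues


def classify(r: ErrorResponse) -> List[str]:
    """Return every property from the list above that the behaviour satisfies.

    The properties are treated as non-exclusive, so a list is returned
    instead of a single label.
    """
    properties: List[str] = []
    if r.freezes_in_safe_state:
        properties.append("fail-stop/fail-safe")
    if r.tolerates_permanent_faults:
        properties.append("fault-tolerant")
    if r.continues_with_reduced_function:
        properties.append("graceful degradation")
    if r.designed_for_ras:
        properties.append("dependable")
    if r.recovers_from_internal_and_external:
        properties.append("resilient")
    return properties


if __name__ == "__main__":
    # A system that first degrades gracefully and then recovers fully.
    print(classify(ErrorResponse(continues_with_reduced_function=True,
                                 recovers_from_internal_and_external=True)))
    # prints ['graceful degradation', 'resilient']
```

Returning a list is a deliberate choice in this sketch: the definitions are not presented as mutually exclusive, so forcing a single answer would discard information about the system.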
Speaking about ICT systems, especially safety-critical and real-time ones, we might think about implementing resilience from the system level down through to hardware and system software. In addition, we need to consider that the parts both interact with and support one another. The Non-Functional Requirements (NFRs) of each part of the system were considered, such as:
– Performance;
– Reliability;
– Efficiency (mostly energy efficiency).
Therefore, the systems that we design should be PRE-smart and provide these properties throughout the life cycle.

Neither of our books to date has appeared complete. These books have been used in China, Switzerland, Russia, and the USA (mostly by Masters and Ph.D. students), and we have received substantial feedback, such as:
• While hardware reliability and system-level availability are explained adequately, there are no sections or chapters about performance, especially where parallel and distributed systems are concerned;
• How to apply (as mentioned in the review above) the classification and properties of resilience for and within distributed systems;
• How real-time and safety-critical applications should be treated with regard to system resilience: have the rules for systems and for packages changed?

It was especially satisfying to discover that these segments were being updated by researchers around the globe, providing excellent contributions to the content. Thus, our book became an evolving system in itself, aggregating our further efforts with the efforts and results of our colleagues from China, Switzerland, the UK, and Russia. Our book has therefore itself become resilient, benefiting from the following contributions.

The Performance chapter (including element-level performance and parallel design) was prepared using materials and contributions from:
– Professor Hao Kai, Shantou University, China;
– Simon Monkman, IT-ACS Ltd researcher.
The system software chapters resulted from substantial efforts by:
– Professor Eugeny Zuev and his team in Technopolis, Kazan, Russia.

In turn, the consideration of system-level resilience for distributed systems, requested back in 1989, was developed into two chapters, one on the system level and one on algorithmic implementation, prepared by me and Stephen Farrell. In these chapters, we introduce the concept of desperation (for transactions within distributed systems) and show that our existing and new results, some of them patented (https://www.ipo.gov.uk/p-ipsum/Case/PublicationNumber/GB2448351), can be extremely useful in making the whole network truly resilient and in achieving far better service for applications, especially when a critical level of use is assumed.

The structure of the book is illustrated in the figure below:

[Figure: Structure of the book. Parts: Fault Tolerance; Proposed System Software and Hardware; System Software for Fault Tolerance; Future: Resilience of Distributed Systems.]

Contents

1 Introduction
References
2 Hardware Faults
2.1 Introduction
2.2 Single Event Effects and Other Deviations
2.3 Conclusion
References
3 Fault Tolerance: Theory and Concepts
3.1 Introduction to Reliability Theory
3.2 Connection Between Reliability and Fault Tolerance
3.3 Models for Fault Tolerance
3.4 Chapter Conclusion
References
4 Generalized Algorithm of Fault Tolerance (GAFT)
4.1 The Generalized Algorithm of Fault Tolerance
4.2 Definition of Fault Tolerance by GAFT
4.3 Example of Possible GAFT Implementation
4.4 GAFT Properties: Performance, Reliability, Coverage
4.5 Reliability Evaluation for Fault Tolerance
4.6 Hardware Redundancy and Reliability
4.6.1 Hardware Redundancy: Reliability Analysis
4.7 Conceptual Summary
References
5 GAFT Generalization: A Principle and Model of Active System Safety
5.1 GAFT Extension: The Method of Active System Safety
5.2 GAFT Derivation: A Principle of Active System Safety
5.3 Dependency Matrix
5.4 Recovery Matrix
5.5 PASS Tracing Algorithm
5.5.1 Forward Tracing Algorithm
5.5.2 Backward Tracing Algorithm
5.6 Chapter Summary
References
6 System Software Support for Hardware Deficiency: Functions and Features
6.1 System Software Life Cycle Versus Fault Tolerance
6.2 System Software Phases
References
7 Testing, Checking, and Hardware Syndrome
7.1 Hardware-Checking Process
7.2 Analysis of Checking Process
7.2.1 The System Model
7.2.2 Diagnostic Process Algorithm
7.2.3 Procedure T1
7.2.4 Extension of the Diagnostic Procedure
7.2.5 Testing of Time-Sharing Systems
7.2.6 FT Scheduling
7.3 System Monitoring of Checking Process: A Syndrome
7.3.1 Access and Location of the Syndrome
7.3.2 Memory Configuration
7.3.3 Interfacing Zone: The Syndrome as Memory Configuration Mechanism
7.3.4 Graceful Degradation Approach and Implementation
7.3.5 Reconfiguration of Other Hardware Devices
7.4 Software Support for Hardware Reconfiguration
7.4.1 Software Support for Degradation
7.4.2 Hardware Condition Monitor
7.4.3 Hardware Condition Monitor: System Software Support
7.5 Hardware Reconfiguration Outlook
7.6 Summary
References
8 Recovery Preparation
8.1 Runtime System Support for Fault Tolerance and Reconfigurability
8.2 Overview of Existing Backward Recovery Techniques

Software Design for Resilient Computer Systems PDF

315 Pages·2020·11.44 MB·English

by Igor Schagaev, Eugene Zouev, Kaegi Thomas
