DTIC ADA469650: Component-Based Analysis of Fault-Tolerant Real-Time Programs PDF


Component-Based Analysis of Fault-Tolerant Real-Time Programs*

Borzoo Bonakdarpour†, Sandeep S. Kulkarni†, Anish Arora§

† Department of Computer Science and Engineering, Michigan State University, East Lansing MI 48824 USA. Email: {borzoo,[email protected]. http://www.cse.msu.edu/~{borzoo,sandeep}

§ Department of Computer Science and Engineering, Ohio State University, Columbus Ohio 43210 USA. Email: [email protected]. http://www.cse.ohio-state.edu/~anish

Abstract

We focus on decomposition of fault-tolerant real-time programs that are designed from their fault-intolerant versions. Towards this end, motivated by the concepts of state predicate detection and state predicate correction [1] for untimed systems, we identify three types of components, namely, detectors, weak $\delta$-correctors, and strong $\delta$-correctors. We also consider different levels of fault-tolerance, namely, soft-failsafe, hard-failsafe, nonmasking, soft-masking, and hard-masking, depending upon the satisfaction of safety, liveness, and timing constraints in the presence of faults. We show that depending upon the level of tolerance, fault-tolerant real-time programs contain one or more detectors and/or weak/strong $\delta$-correctors.

Keywords: Fault-tolerance, Real-time, Component-based design, Decomposition, Bounded-time recovery, Formal methods.

1 Introduction

We analyze real-time fault-tolerant programs that are designed from their fault-intolerant versions. Such fault-tolerant programs may be designed for maintenance to deal with a previously unanticipated class of faults or for separating functionality of the program from its tolerance properties. In both these cases, "reuse" of the existing program is desirable, and possibly mandatory. An important concern for reuse-based techniques for design of fault-tolerance is their completeness. Intuitively, completeness of a reuse-based technique captures the ability of that technique to produce any fault-tolerant program from a fault-intolerant program, say p, assuming that there is some reuse-based design of a fault-tolerant version of p.
Said another way, if p can be made fault-tolerant by any reuse design, then the technique should be able to yield one such fault-tolerant program. Our focus in this paper is on the completeness of reuse-based techniques in the context of component-based design of fault-tolerant real-time programs.

* This work was partially sponsored by NSF CAREER CCR-0092724, DARPA Grant OSURS01-C-1901, ONR Grant N00014-01-1-0744, NSF grant EIA-0130724, and a grant from Michigan State University.

Report Documentation Page (Standard Form 298, Rev. 8-98). Report date: 2007. Dates covered: 00-00-2007 to 00-00-2007. Title: Component-Based Analysis of Fault-Tolerant Real-Time Programs. Performing organization: Michigan State University, Department of Computer Science and Engineering, East Lansing, MI 48824. Distribution/availability: Approved for public release; distribution unlimited. Supplementary notes: http://www.cse.msu.edu/publications/tech/TR/MSU-CSE-07-24.pdf. Security classification (report, abstract, this page): unclassified. Limitation of abstract: Same as Report (SAR). Number of pages: 25.

Regarding completeness, there are two main issues in such component-based design: (1) Design method: where one focuses on transforming the given program into a fault-tolerant program, and (2) Containment question: where we want to determine whether such components exist in fault-tolerant programs irrespective of how they are designed. Regarding the first issue, previously, in [1], Arora and Kulkarni have presented a sound and complete method for component-based design of fault-tolerant untimed programs. Their method is based on the principle of state detection and state correction. In particular, using this principle, they identify two components, namely, detectors and correctors, which respectively focus on detecting whether the execution of a given program action is safe in the given state and on restoring the execution of a program to a state where a certain state predicate is true. They subsequently show that (1) detectors are the necessary and sufficient building blocks for designing failsafe fault-tolerant programs, i.e., programs that satisfy their safety specification in the presence of faults, (2) correctors are necessary and sufficient building blocks for designing nonmasking programs, i.e., programs that recover to legitimate states after occurrence of faults, and (3) both are necessary and sufficient for masking programs, i.e., programs where both the safety and liveness specifications are satisfied in the presence of faults.
In this paper, we focus on the second question. In particular, we investigate whether the idea of state detection and correction can be applied to real-time fault-tolerant programs. Towards this end, we define three types of components, namely, detectors, weak $\delta$-correctors, and strong $\delta$-correctors. Similar to [1], detectors are based on the concept of detecting state predicates. However, based on the closure properties of the correction state predicate, we introduce weak and strong $\delta$-correctors.

We illustrate that depending upon the level of fault-tolerance, existing real-time fault-tolerant programs that reuse their fault-intolerant version contain one or more of the above components. We show the existence of the components by investigating the necessary conditions under which fault-tolerance can be provided in the context of real-time programs. Towards this end, we refine the levels of fault-tolerance considered in [1] based on the satisfaction of safety, liveness, and timing requirements (e.g., deadlines) in the presence of faults. In particular, we consider soft-failsafe, hard-failsafe, nonmasking, soft-masking, and hard-masking fault-tolerance (cf. Section 2 for precise definitions). Intuitively, a soft fault-tolerant program is not required to meet its timing constraints in the presence of faults. However, in the absence of faults a soft fault-tolerant program behaves like its fault-intolerant version, i.e., it satisfies its timing constraints in the absence of faults. On the other hand, a hard fault-tolerant program must satisfy its timing constraints even in the presence of faults. In other words, in hard fault-tolerant programs, the demand for hard real-time processing merges with catastrophic consequences of systems, whereas in soft fault-tolerance the catastrophic consequences are not related to the program's timing constraints. Furthermore, for nonmasking, soft-masking, and hard-masking levels of fault-tolerance, we distinguish two cases where state correction is achieved in a bounded or unbounded amount of time. In other words, we consider unbounded and bounded-time recovery in the presence of faults.
In order to formally express timing constraints of a real-time program, we focus on a standard property in real-time systems called bounded response. Intuitively, a bounded response property specifies that if a state predicate, say P, becomes true then another state predicate, say Q, must become true within a bounded amount of time, say $\theta$. Observe that according to [2,3], such properties fall in the category of safety properties. Thus, in this paper the notion of safety specification consists of a timed part (i.e., a set of bounded response properties) and an untimed part (e.g., state perturbations), which is modeled by a set of bad transitions.

Contributions of the paper. In this paper, we concentrate on the necessity of state predicate detection and state predicate correction within a pre-specified bounded amount of time. That is, we show that every fault-tolerant real-time program that reuses its fault-intolerant version must contain detectors or (weak/strong) $\delta$-correctors. The main results from this paper are as follows:

• We precisely define what it means for a fault-tolerant real-time program to contain a component. Our definition of containment for (weak/strong) $\delta$-correctors is similar to that in [1]. However, for detectors, our definition of containment is more rigorous than that in [1] in that it precisely identifies how predicates being detected are related to the fault-intolerant program and how the output of the detector (called witness predicate) is being used by the fault-tolerant program.

• A nonmasking fault-tolerant program with recovery time $\theta$ is one that recovers to a state from where its subsequent computation satisfies both (timed and untimed) safety and liveness within $\theta$ time units. However, safety may be violated before the program reaches such a state. We show that if a program satisfies both the safety and the liveness specifications within $\theta$, then there exists $\delta$, where $\delta$ is a function of $\theta$, such that the program contains strong $\delta$-correctors.
• A hard-failsafe fault-tolerant program is one that always satisfies both untimed and timed parts of its safety specification. Regarding hard-failsafe fault-tolerance, we show that hard-failsafe fault-tolerant programs contain detectors to satisfy the untimed part and weak $\delta$-correctors to satisfy the timed part of their safety specification for some $\delta$.

• A soft-masking fault-tolerant program with recovery time $\theta$ is one that recovers to a state from where its subsequent computation satisfies both safety and liveness within $\theta$ time units. Moreover, until such a state is reached, only the untimed part of its safety specification is not violated. We show that soft-masking fault-tolerant programs contain detectors and strong $\delta$-correctors for some $\delta$, where $\delta$ is a function of $\theta$.

• A hard-masking fault-tolerant program with recovery time $\theta$ is a soft-masking program with recovery time $\theta$ that satisfies both untimed and timed parts of its safety specification during recovery. We show that hard-masking fault-tolerant programs contain detectors, weak $\delta_1$-correctors, and strong $\delta_2$-correctors for some $\delta_1$ and $\delta_2$, where $\delta_2$ is a function of $\theta$.

Organization of the paper. The rest of the paper is organized as follows: In Section 2, we formally define the notions of real-time programs, faults, and fault-tolerance. Then, in Section 3, we present the formal definition of fault-tolerance components for real-time programs, namely, detectors and weak/strong $\delta$-correctors. In Section 4, we develop the theory of the components and we show the necessity of their existence in real-time programs with respect to different levels of fault-tolerance. Finally, in Section 5, we make concluding remarks and discuss future work.

2 Real-Time Programs, Specifications, and Fault-Tolerance

In this section, we give formal definitions of real-time programs, specifications, faults, and fault-tolerance. The definition of real-time programs is adapted from Alur and Henzinger [4] and Alur and Dill [5].
The definition of specifications is based on the work by Alpern and Schneider [2] and Henzinger [3]. Finally, the notion of faults and fault-tolerance is due to Arora and Gouda [6], and Bonakdarpour and Kulkarni [7].

2.1 Real-Time Programs

Let V be a finite set of discrete variables and W be a finite set of clock variables. Each discrete variable is associated with a finite domain D of values. A location is a function that maps each discrete variable to a value from its respective domain. A clock constraint over the set W of clock variables is a Boolean combination of formulas of the form $x \preceq c$ or $x - y \preceq c$, where $x, y \in W$, $c \in \mathbb{Z}_{\geq 0}$, and $\preceq$ is either $<$ or $\leq$. We denote the set of all clock constraints over W by $\Phi(W)$. A clock valuation is a function $\nu : W \to \mathbb{R}_{\geq 0}$ that assigns a real value to each clock variable. Furthermore, for $\tau \in \mathbb{R}_{\geq 0}$, $\nu + \tau$ denotes the clock valuation with $(\nu + \tau)(x) = \nu(x) + \tau$ for every clock x. Also, for $\lambda \subseteq W$, $\nu[\lambda := 0]$ denotes the clock valuation for W which assigns 0 to each $x \in \lambda$ and agrees with $\nu$ over the rest of the clock variables in W.

A state (denoted $\sigma$) is a pair $\langle s, \nu \rangle$, such that s is a location and $\nu$ is a clock valuation for W at location s. The set of all possible states is called the state space obtained from the associated variables.

Definition 2.1 (computations) A computation is a finite or infinite timed state sequence of the form:

$\overline{\sigma} = (\sigma_0, \tau_0) \to (\sigma_1, \tau_1) \to \cdots$

where $\sigma_i$ is a state in the state space obtained from the associated variables, for all $i \in \mathbb{Z}_{\geq 0}$, and the sequence $\tau_0 \tau_1 \tau_2 \cdots$ (called global time) satisfies the following constraints:

• Initialization: $\tau_0 \geq 0$,
• Monotonicity: $\tau_i \leq \tau_{i+1}$ for all $i \in \mathbb{Z}_{\geq 0}$, and
• Divergence: if $\overline{\sigma}$ is infinite, for all $t \in \mathbb{R}_{\geq 0}$, there exists j such that $\tau_j \geq t$.

Notice that in Definition 2.1, we do not specify an initial value for global time.
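The clock-valuation operations and the global-time constraints of Definition 2.1 can be sketched in Python. This is only an illustrative sketch under stated assumptions: valuations are represented as dictionaries from clock names to floats, and the function names `shift`, `reset`, and `is_valid_global_time` are hypothetical, not taken from the paper. Divergence concerns infinite computations only, so it cannot be checked on a finite prefix and is left out.

```python
from typing import Dict, Iterable, List

# Assumption: a clock valuation nu is a dict from clock names to non-negative reals.
Valuation = Dict[str, float]


def shift(nu: Valuation, tau: float) -> Valuation:
    """Return nu + tau: every clock advances by the same amount tau >= 0."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    return {x: v + tau for x, v in nu.items()}


def reset(nu: Valuation, lam: Iterable[str]) -> Valuation:
    """Return nu[lam := 0]: clocks in lam are set to 0, the rest are unchanged."""
    lam = set(lam)
    return {x: (0.0 if x in lam else v) for x, v in nu.items()}


def is_valid_global_time(taus: List[float]) -> bool:
    """Check Initialization and Monotonicity on a finite global-time sequence."""
    if not taus:
        return True
    initialization = taus[0] >= 0
    monotonicity = all(a <= b for a, b in zip(taus, taus[1:]))
    return initialization and monotonicity
```

For example, `reset(shift({"x": 1.0, "y": 0.5}, 2.0), {"x"})` advances both clocks by 2.0 and then resets only `x`.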
It follows that in a computation $\overline{\sigma}$, $\tau_0$ can be assigned any value from $\mathbb{R}_{\geq 0}$. Let $\overline{\sigma} + t$ denote the computation with time shift $t \in \mathbb{R}$, such that $\tau_0 \geq 0$ (i.e., $\tau_i$ becomes $\tau_i + t$ for all $i \geq 0$). Also, let $\Sigma$ be any set of computations. For all $\overline{\sigma} \in \Sigma$, we require that for all time shifts $t \in \mathbb{R}$, $\overline{\sigma} + t$ be in $\Sigma$ as well.

Definition 2.2 (suffix and fusion closure) Suffix closure of a set of computations means that if a computation $\overline{\sigma}$ is in that set then so are all the suffixes of $\overline{\sigma}$. Fusion closure of a set of computations means that if computations $\langle \alpha, (\sigma, \tau), \gamma \rangle$ and $\langle \beta, (\sigma, \tau), \psi \rangle$ are in that set then so are the computations $\langle \alpha, (\sigma, \tau), \psi \rangle$ and $\langle \beta, (\sigma, \tau), \gamma \rangle$, where $\alpha$ and $\beta$ are finite prefixes of computations, $\gamma$ and $\psi$ are suffixes of computations, and $\sigma$ is a state at global time $\tau$.

Notation. Let $\overline{\sigma}_i$ denote the pair $(\sigma_i, \tau_i)$ in computation $\overline{\sigma}$. Also, let $\alpha$ and $\beta$ be finite computations, where the length of $\alpha$ is n. The concatenation of $\alpha$ and $\beta$ (denoted $\alpha\beta$) is a computation, where either clock variables (except possibly a subset that is reset) and global time of $\alpha_{n-1}$ and $\beta_0$ are equal, but their locations are different, or their locations are equal, but the clock variables and global time are advanced equally. If $\Gamma$ and $\Psi$ are two sets of finite computations, then $\Gamma\Psi = \{\alpha\beta : \alpha \in \Gamma \text{ and } \beta \in \Psi\}$.

Definition 2.3 (real-time programs) A real-time program p is specified by a set of discrete variables, a set of clock variables, and a suffix closed and fusion closed set of maximal computations in the state space of p (denoted $S_p$). By maximal, we mean that if $\overline{\sigma} = \alpha\beta$, where the prefix $\alpha = (\sigma_0, \tau_0) \to (\sigma_1, \tau_1) \to \cdots (\sigma_n, \tau_n)$ and the infinite suffix $\beta = (\langle s_{n+1}, \nu_{n+1} \rangle, \tau_{n+1}) \to (\langle s_{n+1}, \nu_{n+2} \rangle, \tau_{n+2}) \to (\langle s_{n+1}, \nu_{n+3} \rangle, \tau_{n+3}) \cdots$, is a computation of p such that $\nu_{j+1} = \nu_j + (\tau_{j+1} - \tau_j)$, for all $j > n$, then there is no other computation of p that has a prefix of $\alpha$. In other words, given a finite computation prefix $\alpha$ of p, p does not contain the computation that stutters $\sigma_{n+1}$ infinitely if there is any other computation of p that extends $\alpha$.

Notation. For simplicity, we use pseudo-arithmetic expressions to denote timing constraints over finite computations. For instance, $\overline{\sigma}_{\leq \delta}$, where $\delta \in \mathbb{Z}_{\geq 0}$, denotes a finite computation $(\sigma_0, \tau_0) \to (\sigma_1, \tau_1) \to \cdots (\sigma_n, \tau_n)$ that satisfies the timing constraint $\tau_n - \tau_0 \leq \delta$.

Definition 2.4 (state predicates) A state predicate of p is a subset of $S_p$. We say that a state predicate S is closed in p iff in every computation, $(\sigma_0, \tau_0) \to (\sigma_1, \tau_1) \to \cdots$, of p, if S is true in state $\sigma_j$, $j \in \mathbb{Z}_{\geq 0}$, (denoted $\sigma_j \models S$) then S remains true for all states $\sigma_k$, where $k \geq j$ (i.e., $\sigma_k \models S$).

Definition 2.5 (S-computations) Let S be a state predicate and p be a program. The S-computations of p, denoted as $p|S$, is the set of computations of p that start in a state where S is true.

Notice that since the set of computations of a program is suffix closed and fusion closed, the program can be written in terms of transitions that it can execute [1]. Hence, we can view a program p as a set of transitions.
To concisely write the transitions of a program, we use timed guarded commands [4]. A timed guarded command (also called timed action) is of the form $L :: Guard \xrightarrow{\lambda} statement$, where L is a label, Guard is a state predicate, statement is a statement that describes how the program state is updated, and $\lambda$ is a set of clock variables that are reset by execution of L. Thus, L denotes the set of transitions $\{(\sigma_0, \sigma_1) \mid Guard \text{ is true in state } \sigma_0,\ \sigma_1 \text{ is obtained by resetting the clock variables in } \lambda \text{ and changing } \sigma_0 \text{ as prescribed by } statement\}$. A guarded wait command (also called delay action) is of the form $Guard \longrightarrow \mathbf{wait}$, where Guard identifies the states where a delay transition is allowed to be taken (i.e., the program stutters in a location and lets time advance). A guarded wait command delays the program by an arbitrary amount of time, as long as Guard remains true. We present examples of timed guarded commands in Section 4.

2.2 Specifications

Similar to a program, a specification (also called property) is specified by sets of discrete and clock variables (respectively, state space) and a suffix closed and fusion closed set of (finite or infinite) computations over the state space obtained from those variables. In order to capture the real-time properties of programs (e.g., deadlines and recovery time), in this paper, we focus on a standard property of real-time systems called stable bounded response.

Definition 2.6 (stable bounded response) Let P and Q be two state predicates. A stable bounded response property (denoted $P \mapsto_{\leq \delta} Q$) is the set of all computations $(\sigma_0, \tau_0) \to (\sigma_1, \tau_1) \to \cdots$ in which if $\sigma_i \models P$, for some $i \geq 0$, then there exists j, $j \geq i$, such that (1) $\sigma_j \models Q$, (2) $\tau_j - \tau_i \leq \delta$, and (3) for all k, $i \leq k < j$, $\sigma_k \models P$. In other words, it is always the case that a state in P is followed by a state in Q within $\delta$ time units and P continuously remains true until Q becomes true. We call P the event predicate, Q the response predicate, and $\delta$ the response time.
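The three conditions of Definition 2.6 can be checked mechanically on a finite timed trace. The following Python sketch is one possible reading, not the paper's formulation: it treats the finite trace as the whole computation (so an event with no response inside the trace counts as a violation), it assumes global time is non-decreasing, and the names `satisfies_stable_bounded_response`, `trace`, `P`, and `Q` are hypothetical.

```python
from typing import Any, Callable, List, Tuple

# Assumption: a trace is a finite list of (state, global_time) pairs with
# non-decreasing global time; predicates are Boolean functions on states.
TimedState = Tuple[Any, float]
Predicate = Callable[[Any], bool]


def satisfies_stable_bounded_response(
    trace: List[TimedState], P: Predicate, Q: Predicate, delta: float
) -> bool:
    """Check P |->_{<= delta} Q on a finite trace treated as a complete computation."""
    for i, (s_i, t_i) in enumerate(trace):
        if not P(s_i):
            continue
        answered = False
        for j in range(i, len(trace)):
            s_j, t_j = trace[j]
            if t_j - t_i > delta:
                break  # condition (2) can no longer hold for this or any later j
            if Q(s_j):
                answered = True  # conditions (1) and (2) hold; (3) held for all k < j
                break
            if not P(s_j):
                break  # condition (3) fails: P did not stay true until Q
        if not answered:
            return False
    return True
```

As a usage illustration with made-up data, take states that are dictionaries with a `gate` field, `P = lambda s: s["gate"] == "closed"`, `Q = lambda s: s["gate"] == "open"`, and `delta = 5`: a trace in which the gate closes at time 1 and reopens at time 5.5 satisfies the property, while one in which it reopens only at time 7 does not.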
Assumption 2.7 We assume that the set of clock variables of any stable bounded response property $P \mapsto_{\leq \delta} Q$ contains a special clock variable, which is reset whenever P becomes true. This assumption is necessary to ensure that stable bounded response properties are fusion closed.

The specifications considered in this paper (denoted SPEC) are an intersection of the safety specification and the liveness specification, defined next.

Definition 2.8 (safety specification) We define the safety specification by a set of computations based on (1) a set $SPEC_{bt}$ of instantaneous bad transitions of the form $\langle s_0, \nu \rangle \to \langle s_1, \nu[\lambda := 0] \rangle$ where $s_0$ and $s_1$ are two locations and $\lambda$ is a set of clock variables, and (2) a set $SPEC_{br}$ of m bounded response properties of the form $(P_1 \mapsto_{\leq \delta_1} Q_1) \wedge (P_2 \mapsto_{\leq \delta_2} Q_2) \wedge \ldots \wedge (P_m \mapsto_{\leq \delta_m} Q_m)$, for some m, $m \geq 1$. Precisely, the safety specification is the intersection of the following sets:

1. the set of computations where no prefix contains a transition in $SPEC_{bt}$, and
2. the intersection of sets of computations corresponding to each stable bounded response property $P_i \mapsto_{\leq \delta_i} Q_i$ in $SPEC_{br}$, where $1 \leq i \leq m$.

Notation. With abuse of notation for simplicity, throughout the paper, whenever we refer to $SPEC_{bt}$, we mean the corresponding set of computations that do not contain a transition in $SPEC_{bt}$.

Definition 2.9 (liveness specification) A liveness specification is a set of computations that meets the following condition: for each finite computation $\alpha$ there exists a computation $\beta$ such that $\alpha\beta$ is in that set.

Notation. We use $S^*$ to denote a finite computation $(\sigma_0, \tau_0) \to (\sigma_1, \tau_1) \to \cdots (\sigma_n, \tau_n)$ such that $\sigma_i \models S$ for all i, where $0 \leq i \leq n$. Thus, $(true)^*$ denotes an arbitrary finite computation.
Now, we define what it means for program p to refine a specification SPEC, and what it means for program $p'$ (typically, a fault-tolerant program) to refine program p (typically, a fault-intolerant program). Essentially, we would like to define '$p'$ refines p' iff the computations of $p'$ are a subset of those of p. However, if $p'$ is obtained by adding fault-tolerance to p then $p'$ may contain additional variables that are not in p. Hence, it will be necessary to project the computations of $p'$ on (the variables of) p and then check if the projected computation is a computation of p. More precisely, the projection of a state of $p'$ on p (respectively, SPEC) is a state obtained by considering only the (discrete and clock) variables of p (respectively, SPEC). Extending this definition for computations, we say that the projection of a computation of $p'$ on p (respectively, SPEC) is a computation obtained by projecting each state in that computation on p (respectively, SPEC).

Definition 2.10 (refines) We say that $p'$ refines p (respectively, SPEC) from S iff the following two conditions hold: (1) S is closed in $p'$, and (2) for every computation of $p'$ that starts in a state where S is true, the projection of that computation on p (respectively, SPEC) is a computation of p (respectively, SPEC).

We also define the notion of maintains for finite computations. Specifically, given a finite prefix $\alpha$, '$\alpha$ maintains SPEC' captures that the specification is not yet violated in $\alpha$.

Definition 2.11 (maintains) Let $\alpha$ be a prefix of a computation of p. We say that $\alpha$ maintains SPEC iff there exists a computation $\beta$ such that the projection of $\alpha\beta$ on SPEC is in SPEC.

Informally speaking, proving the correctness of p with respect to SPEC involves showing that p refines SPEC from some state predicate S, where $S \neq \{\}$. We call such a state predicate S an invariant of p.

Definition 2.12 (invariant) Let S be a state predicate. We say that S is an invariant of p for SPEC iff p refines SPEC from S and $S \neq \{\}$.
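The closure condition that underlies Definitions 2.10 and 2.12 can be illustrated on a small example. The Python sketch below makes a simplifying assumption that is not in the paper: the program is given as a finite set of transitions over hashable, untimed states (the paper's state spaces include real-valued clocks and are therefore infinite). Under that assumption, a predicate is closed exactly when no transition leaves it. The name `is_closed` is hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

# Assumption: a finite, untimed abstraction in which states are hashable values
# and the program is viewed as a finite set of (source, target) transitions.
State = Hashable
Transition = Tuple[State, State]


def is_closed(predicate: Set[State], transitions: Iterable[Transition]) -> bool:
    """Return True iff every transition that starts inside the predicate ends inside it."""
    return all(dst in predicate for src, dst in transitions if src in predicate)
```

With transitions `[(0, 1), (1, 2), (2, 2), (3, 0)]`, the predicate `{0, 1, 2}` is closed, whereas `{0, 1}` is not because the transition `(1, 2)` leaves it. Closure alone does not make a predicate an invariant: Definition 2.12 additionally requires the predicate to be non-empty and the computations starting in it to be in SPEC.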
2.3 Faults and Fault-Tolerance in Real-Time Programs

Intuitively, the faults that a program is subject to are systematically represented by the union of transitions whose execution perturbs the program state and transitions that unexpectedly advance time. While state corruption faults may indirectly cause waste of time, delay faults directly cause waste of time in the sense that they defer the occurrence of some desirable event by some amount of time. For instance, a processor crash may require a scheduler to assign another processor to a set of jobs. It is natural to model the delay in the start time of such jobs by delay faults that only advance the value of clock variables. Formally, we model faults as a set of transitions over (discrete and clock) variables of p.

Definition 2.13 (faults) For a program p with state space $S_p$, the set f of faults is specified by the union of the following two sets:

1. the set $f_s$ of immediate faults of the form $\langle s_0, \nu \rangle \to \langle s_1, \nu[\lambda := 0] \rangle$ where $s_0$ and $s_1$ are two locations and $\lambda$ is a (possibly empty) set of clock variables, and
2. a set $f_t$ of delay faults of the form $\langle s, \nu \rangle \to \langle s, \nu + \tau \rangle$, which keeps the program in a location for some time $\tau \in \mathbb{R}_{\geq 0}$.

We now define what we mean by computations of a program in the presence of faults. Given a program p and faults f, we define the computations of p in the presence of f by finite fusion-closure of the computations of $p \cup f$ as follows. Let Z be a set of computations and $\bullet$ be the operator that fuses two (finite or infinite) computations of Z such that $\bullet(Z) = \{\alpha(\sigma, \tau)\beta \mid \exists \gamma, \psi : (\alpha(\sigma, \tau)\gamma \in Z) \wedge (\psi(\sigma, \tau)\beta \in Z)\}$. Also, let $FFC(Z)$ be the smallest fixpoint of $\bigcup_{i=0}^{\infty} \bullet^i(Z)$. Now, we define the computations of p in the presence of f (denoted $p[]f$) as $FFC(p \cup f)$.

Assumption 2.14 Observe that the above formulation of program computations in the presence of faults guarantees that the number of occurrences of faults in a computation is finite.
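The two kinds of fault transitions in Definition 2.13 can be sketched as functions on states. This is a minimal Python illustration under stated assumptions: a state is a pair of dictionaries (a location mapping discrete variables to values, and a clock valuation), and the names `immediate_fault` and `delay_fault`, as well as the `cpu` variable in the example, are hypothetical.

```python
from typing import Any, Dict, Iterable, Tuple

# Assumption: a state <s, nu> is a pair (location, valuation) of dictionaries.
Location = Dict[str, Any]
Valuation = Dict[str, float]
State = Tuple[Location, Valuation]


def immediate_fault(
    state: State, new_location: Location, resets: Iterable[str] = ()
) -> State:
    """<s0, nu> -> <s1, nu[lambda := 0]>: the location changes and no time passes."""
    _, nu = state
    lam = set(resets)
    return dict(new_location), {x: (0.0 if x in lam else v) for x, v in nu.items()}


def delay_fault(state: State, tau: float) -> State:
    """<s, nu> -> <s, nu + tau>: the location is kept and every clock advances by tau."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    s, nu = state
    return dict(s), {x: v + tau for x, v in nu.items()}
```

For instance, starting from `({"cpu": "up"}, {"x": 1.0})`, `delay_fault(state, 2.5)` keeps the location and yields clock value 3.5, which mirrors the delayed job start described above, while `immediate_fault(state, {"cpu": "down"}, ["x"])` changes the location and resets `x`.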
In this paper, however, since we deal with real-time programs and our goal is to identify components that provide "bounded-time" recovery in the presence of faults, we assume that the number of occurrences of faults in all computations is bounded by some number $k \in \mathbb{Z}_{\geq 0}$. This assumption is reasonable in many commonly considered fault-tolerant real-time programs. In fact, it can be shown that providing bounded-time recovery in the presence of an unbounded number of faults is not possible.

Just as we use invariants to show program correctness in the absence of faults, we use fault-spans to show the correctness of programs in the presence of faults.

Definition 2.15 (fault-span) Let S and T be state predicates and f be a set of fault transitions. We say that T is an f-span of p from S iff $S \subseteq T$, and T is closed in $p[]f$.

Notation. Henceforth, if the specification or projection of a program on SPEC is clear from the context, we omit it. For example, "S is an invariant of p" abbreviates "S is an invariant of p for SPEC". Likewise, "a computation of p is in SPEC" abbreviates "the projection of that computation on SPEC is in SPEC". Similarly, "T is an f-span of p" abbreviates "T is an f-span of p from S".

We now define what we mean by levels of fault-tolerance in the context of real-time programs. Obviously, in the absence of faults, a program should refine its specification. In the presence of faults, however, it may not refine its specification and, hence, it may refine some (possibly) weaker 'tolerance specification'. These specifications are based on satisfaction of a combination of safety, liveness, timing constraints, and a desirable bounded-time recovery mechanism in the presence of faults. Intuitively, we consider three levels of fault-tolerance, namely failsafe, nonmasking, and masking, based on satisfaction of safety and liveness properties in the presence of faults. For failsafe and masking fault-tolerance, we, furthermore, consider two additional levels, namely soft and hard, based on satisfaction of timing constraints in the presence of faults.
Moreover, nonmasking and masking fault-tolerant programs are associated with a required recovery time, which can be bounded or unbounded.

To motivate the idea of soft and hard fault-tolerance, let us consider the railroad crossing problem. Suppose that a train is approaching a railroad crossing. The safety specification requires that "if the train is crossing, the gate must be closed". Also, a stable bounded response property requires that "once the gate is closed, it should reopen within 5 minutes". In this example, it may be catastrophic if the train is crossing while the gate is open due to occurrence of faults. On the other hand, if the gate remains closed for more than 5 minutes due to occurrence of faults, the outcome is not disastrous. Thus, depending upon the outcome of violation of a safety specification, the desired level of fault-tolerance changes. Hence, in the railroad crossing problem the system must tolerate faults that cause the gate to remain open while the train is crossing. We call such a system soft fault-tolerant. Intuitively, a soft fault-tolerant real-time program is not required to satisfy its timing constraints in the presence of faults.

Now, consider a system that controls the internal pressure of a boiler. Suppose that in this system, the safety specification requires that once a pressure gauge reads 30 pounds per square inch, the controller must issue a command to open a valve within 20 seconds. In such a system, if occurrence of faults causes the controller not to respond within the required time, the outcome may be disastrous. Thus, our boiler controller must satisfy its timing constraints even in the presence of faults. In other words, the boiler controller must be hard fault-tolerant. Intuitively, a hard fault-tolerant real-time program must satisfy its timing constraints even in the presence of faults. In fact, in hard fault-tolerant programs, the demand for hard real-time processing merges with catastrophic consequences of systems, whereas in soft fault-tolerance the catastrophic consequences are not related to the program's timing constraints.
Below, we define tolerance specifications that often occur in practice. Let SPEC be the specification obtained by the intersection of $SPEC_{bt}$, $SPEC_{br}$, and the liveness specification as defined in Subsection 2.2.

Definition 2.16 (soft and hard-failsafe tolerance specification) The soft-failsafe tolerance specification of SPEC is the smallest safety specification containing $SPEC_{bt}$ (denoted $SSPEC_{bt}$). The hard-failsafe tolerance specification of SPEC is the intersection of $SSPEC_{bt}$ and the smallest specification containing $SPEC_{br}$ (denoted $SSPEC_{br}$). I.e., the hard-failsafe tolerance specification of SPEC is $SSPEC_{bt} \cap SSPEC_{br}$.

Definition 2.17 (nonmasking tolerance specification) Since in nonmasking fault-tolerance it is not required to satisfy the safety specification in the presence of faults, the nonmasking tolerance specification of SPEC with recovery time $\theta$ is $(true)^*_{\leq \theta}\, SPEC$.

Definition 2.18 (soft and hard-masking tolerance specification) The soft-masking tolerance specification of SPEC with recovery time $\theta$ is $(SPEC_{bt})_{\leq \theta}\, SPEC$. The hard-masking tolerance specification of SPEC with recovery time $\theta$ is SPEC.

Remark. Notice that the hard-masking tolerance specification is independent of the recovery time $\theta$. This is because unlike nonmasking and soft-masking, in hard-masking fault-tolerance the entire specification (i.e., SPEC) must always be satisfied and, hence, recovery time becomes only a matter of the amount of time that the program is in states outside its invariant.

Using these definitions, we are now ready to define what it means for a program to tolerate a fault-class f. With the intuition that a program is f-tolerant to SPEC if it refines SPEC in the absence of faults and it refines a tolerance specification of SPEC in the presence of f, we define 'f-tolerant to SPEC from S' as follows.
Definition 2.19 (fault-tolerant programs) We say that p is soft/hard-failsafe f-tolerant to SPEC from S (respectively, nonmasking or soft/hard-masking f-tolerant to SPEC with recovery time $\theta$ from S) iff the following two conditions hold:

• p refines SPEC from S, and
• there exists T such that $T \supseteq S$ and $p[]f$ refines the soft/hard-failsafe tolerance specification of SPEC from T (respectively, the nonmasking or soft/hard-masking tolerance specification of SPEC with recovery time $\theta$ from T).

Note that for nonmasking and masking levels of fault-tolerance one can choose to have unbounded-time recovery. We address the effect of the choice of recovery time in Section 4.

Notation. In the sequel, whenever the specification SPEC and the invariant S are clear from the context, we omit them; thus, "nonmasking f-tolerant with recovery time $\theta$" abbreviates "nonmasking f-tolerant to SPEC with recovery time $\theta$ from S", and so on.

3 Real-Time Fault-Tolerance Components

In this section, we present real-time fault-tolerance components, namely, detectors, weak $\delta$-correctors, and strong $\delta$-correctors. Once we define these components, in Section 4, we present the relevance of each component to the levels of fault-tolerance introduced in Subsection 2.3.

3.1 Detectors

In this subsection, we formally introduce the first of the three tolerance components, detectors. We will develop their theory and present a simple altitude switch example to illustrate an instance of detectors in Subsection 4.2.
