Improving the Software Development Process Using Testability Research

Jeffrey M. Voas
Reliable Software Technologies Corp.
Penthouse Suite 101
1001 North Highland Blvd.
Arlington, VA 22201

Keith W. Miller
Department of Computer Science
Jones Hall
College of William & Mary
Williamsburg, VA 23185

Abstract

Software testability is the tendency of code to reveal existing faults during random testing. This paper proposes to take software testability predictions into account throughout the development process. These predictions can be made from formal specifications, design documents, and the code itself. The insight provided by software testability is valuable during design, coding, testing, and quality assurance. We further believe that software testability analysis can play a crucial role in quantifying the likelihood that faults are not hiding when testing of the current version produces no failures.

1 Introduction

Software development processes typically focus on avoiding errors, detecting and correcting software faults that do occur, and predicting reliability after development. In this paper we emphasize the third focus, the analysis of software to determine its reliability.

We contend that software testability, the tendency of code to reveal its faults during random testing, is a significant factor in software quality. As such, we propose to take testability into account throughout the development process. The insight provided by software testability is valuable during design, coding, testing, and quality assurance.

In the first section below we give an overview of software testability and of a model used to analyze code for testability.
In the following four sections we describe how testability can be taken into account during design, coding, testing, and quality assurance. The final section summarizes and describes areas for future research.

2 Preliminaries

Software testability analysis is a process for measuring the "value" provided by a particular software testing scheme, where the value of a scheme can be assessed in different ways. For instance, software testability has sometimes been assessed via the ease with which inputs can be selected to satisfy some structural testing criterion, e.g., branch coverage. With this view of testability, if it turned out to be an extremely difficult task to find inputs that satisfied a particular structural coverage criterion of a program, then the testability of the program would be reduced. A different view of software testability defines it to be a prediction of the probability that existing faults will be revealed during testing given an arbitrary input selection criterion C [17]. Here, software testability is not purely regarded as an assessment of the difficulty of selecting inputs that cover software structure, but instead as a way of predicting whether a program would reveal existing faults during testing when C is the method for generating inputs.

To compare these two viewpoints, we must first understand the underlying assumption on which the first definition operates. It implicitly assumes that the more software structure is exercised during testing, the greater the likelihood that existing faults will be revealed. With this definition, a straight-line program without any conditional expressions or loops would be assigned a higher software testability than any program with a more complex flow of control. However, when testability is based on the probability of fault detection, a straight-line program without any conditional expressions or loops could potentially be assigned a lower testability than a program with more complex flow of control. This could not occur with the coverage-based definition, because there are conditions other than coverage that determine whether or not software will fail during testing. The advantage of our definition is that it incorporates factors other than coverage that play an important role in whether faults will hide during testing. These factors will be described later.

In either definition, software testability analysis is a function of a (program, input selection criterion) pair. The means by which inputs are selected is a parameter of the testing strategy: inputs can be selected randomly, they can be based upon the structure of the program, or they may be based on the tester's human intuition. Testability analysis is therefore impacted heavily by the choice of input selection criterion. Testability analysis is more than an assertion about a program; rather, it is an assertion about the ability of an input selection criterion (in combination with the program) to satisfy a particular testing goal. Programs may have varying testabilities when presented with varying means of generating inputs.

From this point on, we will only discuss the latter definition of software testability, which is based on the probability of tests uncovering faults. Furthermore, we will concentrate on black-box random testing as the type of testing used to establish the testability of a program.

In order for software to be assessed as having a "greater" testability by this definition, it must be likely that a failure occurs during testing whenever a fault exists. To understand this likelihood, it is necessary to understand the sequence of events that leads up to a software failure. (By software failure, we mean an incorrect output that was caused by a flaw in the program, not an incorrect output caused by a problem with the environment in which the program is executing.) Software failure only occurs when the following three necessary and sufficient conditions occur in the following sequence:

1. An input must cause a fault to be executed.

2. Once the fault is executed, the succeeding data state must contain a data state error.

3. Once the data state error is created, the data state error must propagate to an output state.

This model is termed the fault/failure model, and its origins in the literature can be traced to [9, 12]. The fault/failure model relates program inputs, faults, data state errors, and failures. Since faults trigger data state errors that trigger failures, any formal testability analysis model that uses the second definition of software testability should take these three conditions into account. ([2] is an example of a mutation-based testing methodology that considers the first two conditions.) It is the second and third conditions that the second definition of testability takes into account and the first definition does not. This is the essential difference.

A semantic-based definition of testability predicts the probability that tests will uncover faults if any faults exist. The software is said to have high testability for a set of tests if the tests are likely to uncover any faults that exist; the software has low testability for those tests if the tests are unlikely to uncover any faults that exist. Since it is a probability, testability is bounded in the closed interval [0,1].

In order to make a prediction about the probability that existing faults will be revealed during testing, formal testability analysis should be able to predict whether a fault will be executed, whether it will infect the succeeding data state creating a data state error, and whether the data state error will propagate its incorrectness into an output variable. When an existing data state error does not propagate into any output variable, we say that the data state error was cancelled. When all of the data state errors that are created during an execution are cancelled, the existence of the fault that triggered the data state errors remains hidden, resulting in a lower software testability. These conditions provide a formal means for predicting the testability of software that is tightly coupled to the fault/failure model of computation.

PIE [17, 14, 15] is a formal model for assessing software testability that is based on the fault/failure model. PIE is based on three subprocesses, each of which is responsible for estimating one condition of the fault/failure model: Execution Analysis (EA) estimates the probability that a location is executed according to a particular input distribution; Infection Analysis (IA) estimates the probability that a syntactic mutant affects a data state; and Propagation Analysis (PA) estimates the probability that a data state that has been changed affects the program output after execution is resumed on the changed data state. (A location in PIE analysis is based on what Korel [6] terms a single instruction: an assignment, an input statement, an output statement, or the <condition> part of an if or while statement.)

PIE makes predictions concerning future program behavior by estimating the effect that (1) an input distribution, (2) syntactic mutants, and (3) changed data values in data states have on current program behavior. More specifically, the technique first observes the behavior of the program when (1) the program is executed with a particular input distribution, (2) a location of the program is injected with syntactic mutants, and (3) a data state (created dynamically by a program location for some input) has one of its data values altered and execution is resumed. After observing the behavior of the program under these scenarios, the technique then predicts future program behavior if faults were to exist. These three scenarios simulate the three necessary and sufficient conditions for software failure to occur: (1) a fault must be executed, (2) a data state error must be created, and (3) the data state error must propagate to the output. Therefore the technique is based firmly on the conditions necessary for software failure.

The process for predicting the probability that a location is executed is as follows: the program is instrumented to record when a particular location is executed, via a print command that is added into the source code, and the instrumented program is then compiled. The instrumented program is then run some number of times with inputs selected at random according to the input distribution of the program. The proportion of inputs that cause the print command to be invoked, out of the total number of inputs on which the instrumented program is executed, is an estimate of this probability. This probability estimate, along with others for the software, can then be used to predict the software's testability.
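To make the execution-analysis step concrete, the following sketch instruments one location of a toy program and estimates its execution probability under an assumed input distribution. It is a minimal illustration, not the PISCES implementation; all names (target_program, n_runs) and the distribution are our own assumptions.

    /* Execution Analysis sketch: estimate the probability that one
       location is executed under a given input distribution.
       Illustrative only; not the PISCES implementation. */
    #include <stdio.h>
    #include <stdlib.h>

    static int location_hit;            /* set by the instrumentation */

    /* Hypothetical program under test; the "location" of interest is
       the body of the conditional. */
    static void target_program(int x)
    {
        if (x % 100 == 0) {
            location_hit = 1;           /* instrumentation: record execution */
            /* ... original code at this location ... */
        }
    }

    int main(void)
    {
        const int n_runs = 10000;       /* number of random test inputs */
        int hits = 0;

        srand(42);
        for (int i = 0; i < n_runs; i++) {
            location_hit = 0;
            target_program(rand() % 1000);  /* sample the input distribution */
            if (location_hit)
                hits++;
        }
        /* The proportion of inputs that reached the location estimates
           the execution probability under this distribution. */
        printf("execution probability estimate: %f\n", (double)hits / n_runs);
        return 0;
    }

For this toy program the true execution probability is 0.01, so the printed estimate should fall near that value, with a confidence interval that narrows as n_runs grows.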
The process for predicting the probability that a fault in a location will affect the data state of the program is as follows. This process is repeated several times for each location: a syntactic mutation is made to the location in question. The program with this mutated location is then run some number of times with inputs selected at random according to the program's input distribution. For all the times the mutated location is executed, we record the proportion of times that the program with the mutated location produces a different data state than the original location; this proportion is our estimate of the probability that a fault at this location infects. For example, suppose that a program is executed on 10 inputs, that during those executions the original location is executed 1000 times, and that 345 of the data states produced by the mutated program differ from what the original "unmutated" location produces; then our probability estimate is 0.345, with an associated confidence interval. In general, many different syntactic mutants are made for a single location, each yielding a probability estimate in this manner. These probability estimates for this location, along with those for other locations in the software, can then be used to predict the software's testability.
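A minimal sketch of infection analysis follows, assuming a toy location a + b and a single operator mutant a - b. PISCES applies many mutants per location and compares full data states; this illustration compares only the value computed at the location, and all names are our own.

    /* Infection Analysis sketch: estimate the probability that a
       syntactic mutant at a location produces a data state different
       from the original.  One mutant ('-' for '+') shown; illustrative. */
    #include <stdio.h>
    #include <stdlib.h>

    static int original_location(int a, int b) { return a + b; }
    static int mutated_location(int a, int b)  { return a - b; }  /* mutant */

    int main(void)
    {
        const int n_execs = 10000;
        int infections = 0;

        srand(42);
        for (int i = 0; i < n_execs; i++) {
            int a = rand() % 10, b = rand() % 10;
            /* The mutant "infects" when it yields a value different
               from the original location's value. */
            if (original_location(a, b) != mutated_location(a, b))
                infections++;
        }
        printf("infection probability estimate: %f\n",
               (double)infections / n_execs);
        return 0;
    }

Here the mutant differs from the original whenever b is nonzero, so the estimate should converge to roughly 0.9 under this distribution.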
The process for predicting the probability that a data state error will cause program failure, given that a location creates a data state error, is as follows. This process is repeated several times (over a set of program inputs) for each location: The program is executed with an input selected at random from the input distribution. Program execution is halted just after executing the location, a randomly generated data value is injected into some variable, and program execution is resumed. If the location is in a loop, we customarily inject another randomly selected value into the same variable on each successive iteration. Specific details on how this process is performed are found in [14]. This process simulates the creation of a data state error during execution. We term this process "perturbing" a data state, since the value of a variable at some point during execution represents a portion of a data state. The tool then observes any subsequent propagation of the perturbed data state to successor output states after execution is resumed. This process is repeated a fixed number of times, with each perturbed data state affecting the same variable at the same point in execution. For instance, assume that after performing this process on some variable 10 times, the output is affected 3 of those times; the resulting probability estimate would then be 0.3, with some confidence interval [7]. This process is performed using different variables as the recipients of the perturbed data states. Probability estimates found using the perturbed data states can be used to predict which regions of a program are likely, and which regions are unlikely, to propagate data state errors caused by genuine software faults. These probability estimates for this location, along with those for other locations in the software, can then be used to predict the software's testability.
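The sketch below imitates propagation analysis on a toy program: it replaces a variable's value just after the location of interest and checks whether the final output changes. The program, the perturbation mechanism, and the input distribution are illustrative assumptions, not the paper's tool.

    /* Propagation Analysis sketch: perturb one variable just after a
       location, resume execution, and check whether the output changes.
       The mod 10 at the end distills information, so some perturbations
       are cancelled, illustrating how data state errors can vanish. */
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical program: when 'perturb' is set, a random value is
       injected into t right after the location, simulating a data
       state error. */
    static int target_program(int x, int perturb)
    {
        int t = x * 3;              /* location of interest */
        if (perturb)
            t = rand();             /* perturbation: replace t's value */
        return t % 10;              /* later computation may cancel the error */
    }

    int main(void)
    {
        const int n_trials = 10000;
        int propagated = 0;

        srand(42);
        for (int i = 0; i < n_trials; i++) {
            int x = rand() % 1000;  /* input from the distribution */
            /* Compare the perturbed run's output with a clean run. */
            if (target_program(x, 1) != target_program(x, 0))
                propagated++;
        }
        printf("propagation probability estimate: %f\n",
               (double)propagated / n_trials);
        return 0;
    }

About one perturbation in ten lands on the same residue mod 10 and is cancelled, so the estimate should converge near 0.9; a program that distilled more aggressively would score lower.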
PISCES is a tool developed in C++ that implements the PIE technique for software written in C. The building of PISCES has occurred in stages over the past several years. The first commercial version of PISCES is expected to be completed by September '92; this version will incorporate all the nuances of the theoretical model. The funding available to us will determine the scheduling of this project. The PISCES program and design were written by Jeffery Payne of RST Corp.

Another testability model, one that can sometimes be quantified from the code or the specification, is termed the domain/range ratio (DRR). This model differs from PIE in that it is static instead of dynamic. Another difference is that PIE is a function of the probability density function over the domain of the program, whereas the DRR metric is independent of the probability density function. The domain/range ratio of a specification is the ratio of the cardinality of the domain of the specification to the cardinality of the range of the specification. We denote a DRR by α:β, where α is the cardinality of the domain and β is the cardinality of the range. This ratio will not always be visible from a specification: there are specifications whose ranges are not known until programs are written to implement them. (An in-depth definition of the DRR metric can be found in [16].) If a program does not correctly implement a specification, then the program's DRR may not match the specification's DRR. This is demonstrated in [16].

DRRs roughly predict a degree of software's testability. Generally, as the DRR of a specification increases, so does the potential that data state errors occurring within the implementation will fail to affect the software's output. When α is greater than β, research using PISCES has suggested that faults (if any exist) are more likely to remain undetected during testing than when α = β.
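As a concrete illustration of the metric (our example, not one from [16]): a parity test on 32-bit integers distills a domain of 2^32 values onto a range of 2, while the identity function preserves the whole domain.

    /* DRR illustration.  is_even has DRR 2^32 : 2, so any internal data
       state error that does not flip the final parity is cancelled and
       the fault hides from random testing.  identity has DRR
       2^32 : 2^32 = 1 : 1; every corrupted value is visible at the
       output. */
    #include <stdint.h>

    int      is_even(uint32_t x)  { return (x & 1u) == 0; }
    uint32_t identity(uint32_t x) { return x; }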
3 Testability and Design

Although software testability is most obviously relevant during testing, by paying attention to testability early in the development process, the testing phase can potentially be improved significantly. Already at the design phase, testability can be enhanced.

During design, more general specifications are elaborated and decomposed. Decomposition eventually results in functional descriptions of separate code modules. As these module descriptions are defined, the developer can adjust the decomposition to improve the eventual testability when the modules are implemented.

The key to predicting testability already at the design phase is the DRR, the domain/range ratio described above. When the inputs and outputs of a module design are specified, the designer should be able to give a fairly accurate assessment of the DRR of that module. A module design should already include a precise definition of all the legal inputs and the outputs that should result, and these legal definitions form the basis of a DRR estimate. However, not all legal inputs (outputs) are possible inputs (outputs) when the module is integrated into the entire system. If the designer can give a more precise description of the possible inputs (outputs), these can form the basis of a better DRR estimate.

Once a DRR is estimated for each module design, the designer can identify modules whose high DRR indicates that the module will tend to hide faults from random testing. In most applications such modules are inevitable: when data are distilled, a high DRR results. However, the designer can take care to isolate high DRR functionality in as few modules as possible, and to make high DRR modules as simple as possible. Since random testing is an ineffective method for assuring the quality of high DRR modules, implementors and quality assurance personnel will have to use other methods to assess these modules. These other methods (such as path testing strategies [18], proofs of correctness [4], and, when possible, exhaustive testing) are particularly difficult for large, complex modules. By isolating high DRR operations in small, straightforward modules, the designer can facilitate efficient analysis later in the development process.

Some operations outlined with a high DRR in a specification can be designed to have a lower DRR in the implementation. This is accomplished by having a module return more of its internal data state to its users. This advice flies in the face of the common wisdom that a module should hide its internal workings from other modules as much as possible [11]. We agree that such hiding can enhance portability and reduce interface errors; however, there is a competing interest here: increasing testability. In order to increase the testability of a module, it should reveal as much of its internal state as is practical, since information in these states may reveal a fault that would otherwise be missed during testing. Therefore the designer should, especially for modules that would otherwise have a high DRR, try to design an interface that includes enough state information to increase testability to an acceptable level.
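One way such an interface might look is sketched below; the module, its names, and the exposed fields are our own illustration of the principle, not a prescription. A module whose production verdict is a single bit also reports the intermediate state behind that bit, so tests can observe data state errors that the verdict alone would cancel.

    /* Design-for-testability sketch: a high-DRR module (many samples
       distilled to one bit) also exposes the intermediate state it
       would otherwise hide.  Illustrative names and fields. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    struct alarm_state {        /* internal state exposed for testability */
        int64_t sum;            /* running sum before averaging */
        int32_t mean;           /* computed mean                */
        int32_t peak;           /* maximum sample observed      */
    };

    /* Returns the one-bit verdict and reports internal state via 'out'.
       Assumes n > 0. */
    static int threshold_alarm(const int32_t *samples, size_t n,
                               int32_t limit, struct alarm_state *out)
    {
        int64_t sum = 0;
        int32_t peak = samples[0];
        for (size_t i = 0; i < n; i++) {
            sum += samples[i];
            if (samples[i] > peak)
                peak = samples[i];
        }
        out->sum  = sum;
        out->mean = (int32_t)(sum / (int64_t)n);
        out->peak = peak;
        return out->mean > limit || out->peak > limit;
    }

    int main(void)
    {
        int32_t s[] = { 3, 9, 4 };
        struct alarm_state st;
        int alarm = threshold_alarm(s, 3, 10, &st);
        /* A test can now check sum, mean, and peak, not just the verdict. */
        printf("alarm=%d sum=%lld mean=%d peak=%d\n",
               alarm, (long long)st.sum, st.mean, st.peak);
        return 0;
    }

A production wrapper can discard the state argument, so the information-hiding benefits of [11] are traded away only at the test interface.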
How- input distribution, PISCES runs a variety of exper- ever,asanincreasingnumberoftests revealednofail- iments that yield a testability estimate for each rel- ures, the predicted reliabilitygoes up proportional to evant location in a module. Because PISCES testa- 1= [8]. Especially when software requires high reli- bility analysis is completely automated, machine re- ability (such as (cid:13)ight software, medical devices, and sources can be used in place of human time in trying other life-critical applications), random testing soon to(cid:12)ndlocationswithlowtestability. BecausePISCES becomes intractable as the exclusive source of infor- execution times are quadratic in the number of loca- mationabout softwarequality. tions, this analysis can be accomplished with much However, testability analysis may allow developers more thoroughness at the module level than during 2 2 2 to obtain much higher con(cid:12)dence in a program using system test (a +b <=(a+b) ). the same amount of testing. The argument is as fol- lows: intraditionalrandomtesting,probabilitydeter- minesthat large-impacterrors are likelyto be discov- 5 estability, ystem est, and elia- ered early in testing, and smaller and smaller impact bility ssessment errors are the only type to survive undetected as the testingcontinues. Itisthe potential\tiny"faultsthat prohibit us from gaining higher con(cid:12)dence at a more During system test, the attention may shift radi- rapid rate as testing continues. callyfromthe mostabstract viewtoanintenselycon- cretefocus,dependingontheoutcomeofsystemtests. But testability analysis o(cid:11)ers a new source of in- As long as system tests uncover no software faults, formation about the likelihoodof such tiny faults ex- the quality assurance e(cid:11)ort concentrates on assessing isting. If wecan write programswith hightestability, the overallqualityofthe deliveredproduct. However, then we can empirically demonstrate that tiny faults when a system test does not deliver the required be- are unlikelyto exist. This quanti(cid:12)ablecon(cid:12)dence can havior, the development sta(cid:11) must locate and repair add to our con(cid:12)dence that testing has uncovered all the underlying fault. Testabilityanalysis can add in- existingfaults(whichare unlikelytobe high-impact). formation that is useful both for assessing the overall In essence, we put a \squeeze play"on errors: we de- qualityand for locating software bugs. signandimplementcode thatisunlikelytohidesmall Debugging software is easiest when a fault causes faults,and then wetest to gaincon(cid:12)dence that larger software to fail often during testing; each failure fur- faults are unlikelyto have survived testing. nishesnewinformationaboutthefault. Thisinforma- Thistechniqueisstillexperimental;wehavenotyet tion (hopefully) helps locate the fault so that it can determined that industrial programs can be written be repaired. The most di(cid:14)cult faults are those that with su(cid:14)ciently high testability to make the squeeze only rarely cause the software to fail. These faults playe(cid:11)ective. However,we think that if testabilityis provide very few clues as to their nature and loca- aconcern throughoutthedevelopmentprocess, highly tion. When a software system has been analyzed for testablecodecanbeproduced,speci(cid:12)callyforthepur- testability using PISCES, each location has a testa- pose ofpassingstrict requirementsforhighreliability. 
Some locations with a high DRR are obvious from the operation. However, more subtle high DRR code can arise from the interaction of several different locations, perhaps separated by many intervening locations. Furthermore, a location or locations that would not necessarily have a high DRR under all input distributions may have a high DRR under particular input distributions. For these reasons, visual inspections are inadequate to identify all potential high DRR code locations during coding and unit testing. The PISCES software tool, described above, gives automated "advice" on the testability of code locations. Given an input distribution, PISCES runs a variety of experiments that yield a testability estimate for each relevant location in a module. Because PISCES testability analysis is completely automated, machine resources can be used in place of human time in trying to find locations with low testability. Because PISCES execution times are quadratic in the number of locations, this analysis can be accomplished with much more thoroughness at the module level than during system test: analyzing two modules of a and b locations separately costs on the order of a^2 + b^2, which is at most the cost (a + b)^2 of analyzing them together.

5 Testability, System Test, and Reliability Assessment

During system test, the attention may shift radically from the most abstract view to an intensely concrete focus, depending on the outcome of system tests. As long as system tests uncover no software faults, the quality assurance effort concentrates on assessing the overall quality of the delivered product. However, when a system test does not deliver the required behavior, the development staff must locate and repair the underlying fault. Testability analysis can add information that is useful both for assessing the overall quality and for locating software bugs.

Debugging software is easiest when a fault causes software to fail often during testing; each failure furnishes new information about the fault. This information (hopefully) helps locate the fault so that it can be repaired. The most difficult faults are those that only rarely cause the software to fail; these faults provide very few clues as to their nature and location. When a software system has been analyzed for testability using PISCES, each location has a testability estimate; according to that estimate, if a fault exists at that location, it is likely to cause a failure rate close to that testability estimate. When the debugging process begins to converge to a deliverable product, the software may exhibit a very low but non-zero failure rate. When seeking the location of a fault that could cause this "low impact," the developer can use the PISCES testability scores to identify likely candidates among the code locations being tested. In several preliminary experiments (described in [13]), testability scores were highly correlated with faults at selected locations.

The importance of testability during reliability assessment concerns the confidence of testers that they have found all the faults that exist. In the past, quantifying that confidence had to rely exclusively on random testing: the more testing, the higher the confidence that the latest version was fault-free. However, as an increasing number of tests reveals no failures, the predicted reliability improves only in proportion to 1/N, where N is the number of tests [8]. Especially when software requires high reliability (such as flight software, medical devices, and other life-critical applications), random testing soon becomes intractable as the exclusive source of information about software quality.

However, testability analysis may allow developers to obtain much higher confidence in a program using the same amount of testing. The argument is as follows: in traditional random testing, probability determines that large-impact errors are likely to be discovered early in testing, and errors of smaller and smaller impact are the only type to survive undetected as the testing continues. It is the potential "tiny" faults that prohibit us from gaining higher confidence at a more rapid rate as testing continues.

But testability analysis offers a new source of information about the likelihood of such tiny faults existing. If we can write programs with high testability, then we can empirically demonstrate that tiny faults are unlikely to exist. This quantifiable confidence can add to our confidence that testing has uncovered all existing faults (which are unlikely to be high-impact). In essence, we put a "squeeze play" on errors: we design and implement code that is unlikely to hide small faults, and then we test to gain confidence that larger faults are unlikely to have survived testing.

This technique is still experimental; we have not yet determined that industrial programs can be written with sufficiently high testability to make the squeeze play effective. However, we think that if testability is a concern throughout the development process, highly testable code can be produced, specifically for the purpose of passing strict requirements for high reliability. Such code would have to be designed for relatively simple functions and implemented as straightforward code. Interestingly, others have suggested (for somewhat different reasons) that this kind of code may be the wave of the future for critical software [10].

5.1 Applying PIE to Probability of Failure Estimation

Both random black-box testing and PIE gather information about possible probability of failure values for a program. However, the two techniques generate information in distinct ways: random testing treats the program as a single monolithic black-box, but PIE examines the source code location by location; random testing requires an oracle to determine correctness, but PIE requires no oracle because it does not judge correctness; random testing includes analysis of the possibility of no faults, but PIE focuses on the assumption that one fault exists. Thus, the two techniques give independent predictions about the probability of failure.

Although the true probability of failure of a particular program (conditioned on an input distribution) is a single fixed value, this value is unknown to us. We therefore treat the probability of failure as a random variable Θ. We then use black-box random testing to estimate a probability density function (pdf) for Θ conditioned on an input distribution. We also estimate a pdf for Θ using the results of PIE; this estimate is conditioned on the same input distribution as the testing pdf, but the pdf estimated using the results of PIE is also conditioned on the assumption that the program contains exactly one fault, and that this fault is equally likely to be at any location in the program. The assumption of this single, randomly located error is a variation on the competent programmer hypothesis [1].

Figures 1(A) and 1(B) show examples of two possible estimated pdf's. For each horizontal location θ, the height of the curve indicates the estimated probability that the true probability of failure of the program has value θ. The curve in Figure 1(A) is an example of an estimated pdf derived from random black-box testing; we assume that the testing has uncovered no failures. Details about deriving an estimated pdf for Θ given many random tests are given in [8].

The curve in Figure 1(B) is an example of an estimated pdf for Θ that might be derived from PIE's results. PIE can be used to estimate, at each location, the probability of failure that would be induced in the program by a single fault at that location. All these estimates are gathered into a histogram, one entry for each location estimate. The histogram is then smoothed and normalized to produce an estimated pdf. This pdf is conditioned on the assumed input distribution, on the assumption that the program contains exactly one fault, and on the assumption that each location is equally likely to contain that fault.

Figure 1: (A) The mean of the estimated pdf curve, θ̂, is an estimate of the probability of failure. (B) ψ̂ is an estimate of the minimum probability of failure using PIE's results.

We have marked interval estimates for each estimated pdf. If the interval marked by θ̂ includes 90% of the area under the estimated pdf in Figure 1(A), then according to random testing the actual probability of failure is somewhere to the left of θ̂ with a confidence of 90%. Similarly, if the interval in Figure 1(B) includes 10% of the area under the estimated pdf, then according to PIE, if there exists a fault, it will induce a probability of failure that is somewhere to the right of ψ̂ with a confidence of 90%.

The probability of failure of 0 is a special case that complicates the interpretation of the pdf estimated by the results of PIE. If there exists a fault and it induces a near-zero probability of failure, testing is unlikely to find that error. Locations that have PIE estimates very close to zero are troubling in an ultra-reliable application. However, a fault that induces a pof of 0 is not technically a fault at all: no failures will ever be observed with such a fault. If there are no faults in a program, then the true probability of failure is 0 (i.e., θ = 0), and ultra-reliability has been achieved. We do not expect this to be the case in realistic software, but our analysis cannot rule out its possibility.

Figure 1(A) suggests that if there is a fault, it is likely to induce a small probability of failure; Figure 1(B) suggests that such small-impact faults are unlikely. We now attempt to quantify the meaning of the two estimated pdf's taken together.

Hamlet has derived an equation to determine what he calls "probable correctness" [5]. When N tests have been executed and no failures have occurred, then:

    C = Prob(θ ≤ ε) = 1 - (1 - ε)^N        (1)

where C is probable correctness, θ is the true pof, and 0 < ε ≤ 1. (Hamlet calls C a measure of probable correctness, but it would be called a confidence if the equations were cast in a traditional hypothesis test.)
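To make the scale of N concrete (our numbers, not Hamlet's): for ε = 10^-3, achieving C = 0.99 requires N ≈ ln(0.01)/ln(0.999) ≈ 4,600 failure-free tests, since 1 - (1 - 10^-3)^4600 ≈ 0.99. For a life-critical bound of ε = 10^-9, the same confidence requires roughly 4.6 x 10^9 failure-free tests, which illustrates why random testing alone becomes intractable.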
Hamlet's equation is related to the pdf estimated by testing in Figure 1(A) as follows: for any given ε, C = ∫_0^ε f(θ) dθ, where f(θ) is the value of the testing pdf at θ. This equation requires a large number of tests, N, to establish a reasonably high C for an ε close to 0.

It is possible via PIE to predict a minimum probability of failure that would be induced by a fault at a location in the program. In Figure 1(B) we have labeled a particular value ψ̂; using the pdf estimated by PIE's results, we calculate α = ∫_ψ̂^1 g(θ) dθ, where g(θ) gives the value of the PIE pdf at θ. α is the probability according to PIE that the true pof is greater than ψ̂. We will refer to α as our confidence that ψ̂ is the true minimum failure rate for the program. If PIE's results have predicted ψ̂ as the minimum pof, and if we have confidence α that it is the minimum, then we can make the following conjecture:

    if Prob(θ ≤ ψ̂) = 1 - (1 - ψ̂)^N,

    and if ((θ = 0) or (θ > ψ̂)) with confidence α,

    then with confidence α, Prob(θ = 0) = 1 - (1 - ψ̂)^N.        (2)

This prediction of Prob(θ = 0) is higher than is possible from N random black-box tests without the results of PIE.
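To see what equation (2) buys, the short program below evaluates it for one illustrative pair of values; psi_hat and n are assumptions chosen for the example (and α would come from PIE's pdf), not measurements from the tool.

    /* Numeric sketch of the "squeeze play" in equation (2).  psi_hat is
       a PIE-predicted minimum probability of failure (assumed here); n
       is the number of failure-free random tests.  The printed value is
       the predicted Prob(theta = 0), holding with PIE's confidence alpha. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double psi_hat = 1e-3;      /* assumed minimum pof from PIE */
        int    n       = 4600;      /* failure-free random tests    */

        double c = 1.0 - pow(1.0 - psi_hat, n);   /* 1 - (1 - psi_hat)^n */
        printf("Prob(theta = 0) = %.3f (with PIE's confidence alpha)\n", c);
        return 0;
    }

Under these numbers, the same 4,600 failure-free tests that by themselves only bound θ ≤ 10^-3 with confidence 0.99 now support a 0.99 prediction about θ = 0 itself, holding with PIE's confidence α.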
6 Summary and Future Research

The significance of testability is only recently becoming recognized in the software engineering community [3]. In this paper we have illustrated how testability with respect to random black-box testing has importance throughout the software development life-cycle. Automated testability analysis, such as PISCES, exploits relatively inexpensive CPU power to help guide design, coding, and testing. Also, static analysis of the DRR gives insight early in the specification and design stages. In all these applications, testability gives a new perspective on the relationship between software quality and our ability to measure that quality.

Future research will focus on expanding the capabilities of the PISCES tool, empirically exploring different syntactic and semantic mutations for testability analysis, and comparing testability using different testing strategies. We expect that semantic-based statistical analysis of this sort will become increasingly important as computer power becomes increasingly affordable and software quality in life-critical software becomes an increasing concern.

Acknowledgements

This work has been funded by a National Research Council NASA-Langley Resident Research Associateship and NASA Grant NAG-1-884. Since collaborating on this paper at NASA-Langley Research Center, Voas has accepted the position of Vice President of Advanced Research at Reliable Software Technologies Corporation in Arlington, VA.

References

[1] R. A. DeMillo, R. J. Lipton, and F. G. Sayward. Hints on Test Data Selection: Help for the Practicing Programmer. IEEE Computer, 11(4):34-41, April 1978.

[2] R. A. DeMillo and A. J. Offutt. Constraint-Based Automatic Test Data Generation. IEEE Trans. on Software Engineering, 17(9):900-910, September 1991.

[3] R. S. Freedman. Testability of Software Components. IEEE Trans. on Software Engineering, SE-17(6):553-564, June 1991.

[4] D. Gries. The Science of Programming. Springer-Verlag, 1981.

[5] R. G. Hamlet. Probable Correctness Theory. Information Processing Letters, pages 17-25, April 1987.

[6] B. Korel. PELAS: Program Error-Locating Assistant System. IEEE Trans. on Software Engineering, SE-14(9), September 1988.

[7] A. M. Law and W. D. Kelton. Simulation Modeling and Analysis. McGraw-Hill Book Company, 1982.

[8] K. Miller, L. Morell, R. Noonan, S. Park, D. Nicol, B. Murrill, and J. Voas. Estimating the Probability of Failure When Testing Reveals No Failures. IEEE Trans. on Software Engineering, 18(1):33-44, January 1992.

[9] L. Morell. A Theory of Error-based Testing. Technical Report TR-1395, University of Maryland, Department of Computer Science, April 1984.

[10] J. D. Musa. Reduced Operation Software. Software Engineering Notes, July 1991.

[11] D. L. Parnas. Designing Software for Ease of Extension and Contraction. IEEE Trans. on Software Engineering, SE-5:128-138, March 1979.

[12] D. Richardson and M. Thompson. The RELAY Model of Error Detection and its Application. Proceedings of the ACM SIGSOFT/IEEE Second Workshop on Software Testing, Analysis, and Verification, Banff, Canada, July 1988.

[13] J. Voas and K. Miller. Applying a Dynamic Testability Technique to Debugging Certain Classes of Software Faults. Software Quality Journal, to appear.

[14] J. Voas. A Dynamic Failure Model for Performing Propagation and Infection Analysis on Computer Programs. PhD thesis, College of William and Mary in Virginia, March 1990.

[15] J. Voas. A Dynamic Failure Model for Estimating the Impact that a Program Location Has on the Program. In Lecture Notes in Computer Science: Proc. of the 3rd European Software Engineering Conf., volume 550, pages 308-331, Milan, Italy, October 1991. Springer-Verlag.

[16] J. Voas. Factors That Affect Program Testabilities. In Proc. of the 9th Pacific Northwest Software Quality Conf., pages 235-247, Portland, OR, October 1991. Pacific Northwest Software Quality Conference, Inc., Beaverton, OR.

[17] J. Voas. PIE: A Dynamic Failure-Based Technique. IEEE Trans. on Software Engineering, 18(8), August 1992.

[18] E. J. Weyuker. An Empirical Study of the Complexity of Data Flow Testing. Proc. of the Second Workshop on Software Testing, Validation, and Analysis, pages 188-195, Banff, Alberta, July 1988.