Do Code Clones Matter?

Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner
Institut für Informatik, Technische Universität München
Boltzmannstr. 3, 85748 Garching b. München, Germany
{juergens,deissenb,hummelb,wagnerst}@in.tum.de

arXiv:1701.05472v1 [cs.SE] 19 Jan 2017

Abstract

Code cloning is not only assumed to inflate maintenance costs but is also considered defect-prone, as inconsistent changes to code duplicates can lead to unexpected behavior. Consequently, the identification of duplicated code, clone detection, has been a very active area of research in recent years. Up to now, however, no substantial investigation of the consequences of code cloning on program correctness has been carried out. To remedy this shortcoming, this paper presents the results of a large-scale case study that was undertaken to find out if inconsistent changes to cloned code can indicate faults. For the analyzed commercial and open source systems we not only found that inconsistent changes to clones are very frequent but also identified a significant number of faults induced by such changes. The clone detection tool used in the case study implements a novel algorithm for the detection of inconsistent clones. It is available as open source to enable other researchers to use it as a basis for further investigations.

1. Clones & correctness

Research in software maintenance has shown that many programs contain a significant amount of duplicated (cloned) code. Such cloned code is considered harmful for two reasons: (1) multiple, possibly unnecessary, duplicates of code increase maintenance costs and (2) inconsistent changes to cloned code can create faults and, hence, lead to incorrect program behavior [20,29]. While clone detection has been a very active area of research in recent years, up to now there is no thorough understanding of the degree of harmfulness of code cloning. In fact, some researchers have even started to doubt that cloning is harmful at all [17].

To shed light on the situation, we investigated the effects of code cloning on program correctness. It is important to understand that clones do not directly cause faults, but inconsistent changes to clones can lead to unexpected program behavior. A particularly dangerous type of change to cloned code is the inconsistent bug fix. If a fault was found in cloned code but not fixed in all clone instances, the system is likely to still exhibit the incorrect behavior. To illustrate this, Fig. 1 shows an example where a missing null check was retrofitted in only one clone instance.

This paper presents the results of a large-scale case study that was undertaken to find out (1) if clones are changed inconsistently, (2) if these inconsistencies are introduced intentionally and (3) if unintentional inconsistencies can represent faults. In this case study we analyzed three commercial systems written in C#, one written in Cobol and one open-source system written in Java. To conduct the study we developed a novel detection algorithm that enables us to detect inconsistent clones. We manually inspected about 900 clone groups to handle the inevitable false positives and discussed each of the over 700 inconsistent clone groups with the developers of the respective systems to determine if the inconsistencies are intentional and if they represent faults. Altogether, around 1800 individual clone group assessments were manually performed in the course of the case study. The study led to the identification of 107 faults that have been confirmed by the systems' developers.

Research Problem. Although most previous work agrees that code cloning poses a problem for software maintenance, "there is little information available concerning the impacts of code clones on software quality" [29]. As the consequences of code cloning on program correctness, in particular, are not fully understood today, it remains unclear how harmful code clones really are. We consider the absence of a thorough understanding of code cloning precarious for software engineering research, education and practice.

Contribution. The contribution of this paper is twofold. First, we extend the existing empirical knowledge by a case study that demonstrates that clones get changed inconsistently and that such changes can represent faults. Second, we present a novel suffix-tree based algorithm for the detection of inconsistent clones. In contrast to other algorithms for the detection of inconsistent clones, our tool suite is made available to other researchers as open source.

Figure 1. Missing null check on right side can cause exception (Sysiphus).

2. Terms and definitions

The literature provides a wide variety of different definitions of clones and clone-related terms [20,29]. To avoid ambiguity, we describe the terms as used in this paper.

Code is interpreted as a sequence of units, which for example could be characters, normalized statements, or lines. The reason to allow normalization of units at this stage is that pieces of code are often considered equal despite differences in comments or naming, which can be leveled by the normalization. An exact clone is then a (consecutive) substring of the code that appears at least twice in the (normalized) code. Thus our definition of a clone is purely syntactical, but catches exactly the idea of copy&paste, while allowing simple changes, such as renaming, due to normalization. An exact clone group is a set of at least two exact clones that appear at different positions.

To capture the notion of non-identical clones, we roughly follow the definitions of a gapped or type 3 clone given in [20,29]. A substring s of the code is called an inconsistent clone if there is another substring t of the code such that their edit distance is below a given threshold and t has no significant overlap with s. The edit distance is a metric that counts the number of edit operations (insertion, removal, or change of a single unit) needed to transform one sequence into the other. Obviously, this definition is slightly vague, as it depends on the threshold chosen and the meaning of a "significant overlap". However, it captures our intuitive understanding of an inconsistent clone as used in this paper. Examples are shown in Figs. 1 and 7. By clone we denote both exact and inconsistent clones.

A clone group can be viewed as a connected graph, where each node is a substring, and edges are drawn between substrings that are clones of each other. If at least one pair of inconsistent clones is in the group, it is called an inconsistent clone group. We could also have required all clones in a clone group to be clones of each other, but the slightly larger clone groups created by our definition often reveal interesting relationships in the code.

For a thorough discussion of the consequences of inconsistent clones, we define that a failure is an incorrect output of a software visible to the user and that a fault is the cause of a potential failure inside the code. Defects are the superset of faults and failures.

3. Related work

A substantial amount of research has been dedicated to code cloning in recent years. The detailed surveys by Koschke [20] and Roy and Cordy [29] provide a comprehensive overview of existing work. Since this paper targets consequences of cloning and detection of inconsistent clones, we detail existing work in these areas.

3.1 Consequences of cloning

Indications of the harmfulness of cloning for maintainability or correctness are given by several researchers. Lague et al. [24] report inconsistent evolution of a substantial amount of clones in an industrial telecommunication system. Monden et al. [28] report a higher revision number for files with clones than for files without clones in a 20-year-old legacy system, possibly indicating lower maintainability. In [18], Kim et al. report that a substantial amount of changes to code clones occur in a coupled fashion, indicating additional maintenance effort due to multiple change locations.

Li et al. [26] present an approach to detect bugs based on inconsistent renaming of identifiers between clones. Jiang, Su and Chiu [13] analyze different contexts of clones, such as missing if statements. Both papers report the successful discovery of bugs in released software. In [1] and [2], individual cases of bugs or inconsistent bug fixes discovered by analysis of clone evolution are reported for open source software.

In contrast, doubt that the consequences of cloning are unambiguously harmful is raised by several recent research results. Krinke [23] reports that only half the clones in several open source systems evolved consistently and that only a small fraction of inconsistent clones becomes consistent again through later changes, potentially indicating a larger degree of independence of clones than hitherto believed. Geiger et al. [10] report that a relation between change couplings and code clones could, contrary to expectations, not be statistically verified. Lozano and Wermelinger [27] report that no systematic relationship between code cloning and changeability could be established.

The effect of cloning on maintainability and correctness is thus not clear. Furthermore, the publications listed above suffer from one or more shortcomings that limit the transferability of the reported findings.

• Instead of manual inspection of the actual inconsistent clones to evaluate consequences for maintenance and correctness, indirect measures¹ are used [1,10,23,24,27,28]. Such approaches are inherently inaccurate and can easily lead to misleading results. For example, unintentional differences and faults, while unknown to developers, exhibit the same evolution pattern as intentional independent evolution and are thus prone to misclassification.
• The analyzed systems are too small to be representative [18], or the studies omit analysis of industrial software [1,2,10,18,23,27].
• The analyses specifically focus on faults introduced during creation [13,26] or evolution [2] of clones, inhibiting quantification of inconsistencies in general.

Additional empirical research outside these limitations is required to better understand the consequences of cloning [20,29], as presented in this paper: developers rated the actual inconsistent clones, the study objects are both open source and industrial systems, and inconsistencies have been analyzed independently of their mode of creation.

¹ Examples are change coupling or the ratio between consistent and inconsistent evolution of clones.

3.2 Detection of inconsistent clones

We classify existing approaches according to the program representation on which they operate.

Text. Normalized code fragments are compared textually in a pairwise fashion [30]. A similarity threshold governs whether text fragments are considered as clones.

Token. Ueda et al. [32] propose post-processing of the results of a token-based detection of exact clones. Essentially, neighboring exact clones are composed into inconsistent clones. In [26], Li et al. present the tool CP-Miner, which searches for similar basic blocks using frequent subsequence mining and then combines basic block clones into larger clones.

Abstract Syntax Tree. Baxter et al. [3] hash subtrees into buckets and perform pairwise comparison of subtrees in the same bucket. Jiang et al. [12] propose the generation of characteristic vectors for subtrees. Instead of pairwise comparison, they employ locality sensitive hashing for vector clustering, allowing for better scalability than [3]. In [8], tree patterns that provide structural abstraction of subtrees are generated to identify cloned code.

Program Dependence Graph. Krinke [22] proposes a search algorithm for similar subgraph identification. Komondoor and Horwitz [19] propose slicing to identify isomorphic PDG subgraphs. Gabel, Jiang and Su [9] use a modified slicing approach to reduce the graph isomorphism problem to tree similarity.

The existing approaches provided valuable inspiration for the algorithm presented in this paper. However, none of them was applicable to our case study, for one or more of the following reasons.

• Tree [3,8,12] and graph [9,19,22] based approaches require the availability of suitable context-free grammars for AST or PDG construction. While feasible for modern languages such as Java, this poses a severe problem for legacy languages such as Cobol or PL/I, where suitable grammars are not available. Parsing such languages still represents a significant challenge [6,25].
• Due to the information loss incurred by the reduction of variable-size code fragments to finite-size numbers or vectors, the edit distance between inconsistent clones cannot be precisely controlled in feature vector [12] and hashing based [3] approaches.
• Idiosyncrasies of some approaches threaten recall. In [32], inconsistent clones cannot be detected if their constituent exact clones are not long enough. In [9], inconsistencies might not be detected if they add data or control dependencies, as noted by the authors.
• Scalability of some approaches to industrial-size software has been shown to be infeasible [19,22] or is at least still unclear [8,30].
• For most approaches, implementations are not publicly available.

In contrast, the approach presented in this paper supports both modern and legacy languages including Cobol and PL/I, allows for precise control of similarity in terms of edit distance on program statements, is sufficiently scalable to analyze industrial-size projects in reasonable time and is available for use by others as open source software.

An approach similar to [32] for bug detection has been outlined by the authors of this paper in [16]. In contrast to this work, it does not use a suffix tree based algorithm and no empirical study was performed.

4. Detecting inconsistent clones

This section explains the approach used for detecting inconsistent clones in large amounts of code. Our approach works on the token level, which usually is sufficient for finding copy-pasted code, while at the same time being efficient. The algorithm works by constructing a suffix tree of the code and then, for each possible suffix, performing an approximate search in this tree based on the edit distance.

Our clone detector is organized as a pipeline, which is sketched in Figure 2. The files under analysis are loaded and then fragmented by the scanner, yielding a stream of tokens, which is filtered to exclude comments and generated code (recognized by user-provided patterns). From the token stream, which consists of single keywords, identifiers, operators, and so on, the normalizer reassembles statements. This stage performs normalization, such that differences in identifier names or constant values are not relevant when comparing statements. The sequence formed by those statements is then fed into our clone detection algorithm, which finds and reports clone groups in this stream. Finally, clone groups are post-processed and uninteresting ones are filtered out. We outline the detection steps in more detail in the following subsections.

Figure 2. The clone detection pipeline used.

4.1 Preprocessing and normalization

As stated before, the code is read and split into tokens using a scanner. An important task during preprocessing is normalization, which creates statements from the scanner's tokens. Working on statements allows better tailoring of normalization and avoids clones starting or ending within statements. The normalization used eliminates differences in naming of identifiers and values of constants or literals, but does not, for example, change operation order.

Further tasks of the preprocessing phase are the removal of comments and generated code, which is either already excluded at the file level or removed from the token stream based on patterns that recognize sections of generated code.

4.2 Detection algorithm

The task of the detection algorithm is to find clones in the stream of units provided by the normalizer. Stated differently, we want to find common substrings in the sequence formed by all units of the stream, where common substrings are not required to be exactly identical (after normalization), but may have an edit distance bounded by some threshold. This problem is related to the approximate string matching problem [14,33], which is also investigated extensively in bioinformatics [31]. The main difference is that we are not interested in finding an approximation of only a single given word in the string, but rather are looking for all substrings approximately occurring more than once in the entire sequence.

A sketch of our detection algorithm is shown in Figs. 3 and 4. The algorithm is an edit-distance-based traversal of a suffix tree of our input sequence. A suffix tree over a sequence s is a tree with edges labeled by words such that exactly all suffixes of s are found by traversing the tree from the root node to a leaf and concatenating the words on the edges encountered. Such a suffix tree can be constructed in linear time by the well-known online algorithm by Ukkonen [34]. Using this suffix tree, we start a search for clones at every possible index.

Figure 3. Outline of approximate clone detection algorithm

    proc detect(s, e)
    Input: String s = (s0, ..., sn), max edit distance e
    Construct suffix tree T from s
    for each i ∈ {1, ..., n} do
        search(s, i, i, root(T), e)

Figure 4. Search routine of the approximate clone detection algorithm

    proc search(s, start, j, v, e)
    Input: String s = (s0, ..., sn), start index of current search,
           current search index j, node v of suffix tree over s,
           max edit distance e
    Let (w1, ..., wm) be the word along the edge leading to v
    Calculate the maximal length l ≤ m, such that there is a k ≥ j
        where the edit distance e′ between (w1, ..., wl) and (sj, ..., sk) is at most e
    if l = m then
        for each child node u of v do
            search(s, start, k + m, u, e − e′)
    else if k − start ≥ minimal clone length then
        report substring from start to k of s as clone

Searching for clones is performed by the procedure search, which recursively traverses the suffix tree. The first two parameters to this function are the sequence s we are working on and the position start where the search was started, which is required when reporting a clone. The parameter j (which is the same as start in the first call of search) marks the current end of the substring under inspection. To prolong this substring, the substring starting at j is compared to the word w that comes next in the suffix tree, which labels the edge leading to the current node v (for the root node we just use the empty string). For this comparison an edit distance of at most e operations (fifth parameter) is allowed. For the first call of search, e is the edit distance maximally allowed for a clone. If the remaining edit operations are not enough to match the entire edge word w (else case), we report the clone as far as we found it; otherwise the traversal of the tree continues recursively, increasing the length (j − start) of the current substring and reducing the number e of available edit operations by the amount of operations already spent in this step.

To actually make this algorithm work and its results usable, some details have to be fleshed out. For the computation of the longest edit distance match we use the simple dynamic programming algorithm found in algorithm textbooks. While this is easy to implement, it requires quadratic time and space². To make this step work efficiently, we look at most at the first 1000 statements of the word w. As long as the word on the suffix tree edge is shorter, this is not a problem. In case there is a clone of more than 1000 statements, we would find it in chunks of 1000. We considered this to be tolerable for practical purposes. As each suffix we are running the search on will of course be part of the tree, we also have to make sure that no self matches are reported.

When running the algorithm as it is, the results are often not as expected because the search tries to match as many statements as possible. However, allowing for edit operations right at the beginning or at the end of a clone is not helpful, as then every exact clone can be prolonged into an inconsistent clone. Thus in the search we enforce the first few statements (how many is parameterized) to match exactly. (This also speeds up the search, as we can choose the correct child node at the root of the suffix tree in one step without looking at all children.) The last statements are also not allowed to differ, which is checked for and corrected just before reporting a clone.

Including all of these optimizations, the algorithm can miss a clone either due to the thresholds (either too short or too many inconsistencies), or if it is covered by other clones. The latter case is important, as each substring of a clone of course is a clone again and we usually do not want these to be reported.

² Actually the algorithm can be implemented using only linear space, but preserving the full calculation matrix allows us some simplifications.

4.3 Post-processing and filtering

During and after detection, the clone groups that are reported are subject to filtering. Filtering is usually performed as early as possible, so no memory is wasted with storing clone groups that are not considered relevant. Using these filters, we discard clone groups whose clones overlap with each other and groups whose clones are contained in other clone groups. Additionally, we enforce not only an absolute limit on the number of inconsistencies, but also a relative one, i.e., we filter clone groups where the number of inconsistencies in the clones relative to the clone's length exceeds a certain amount. Moreover, we merge clone groups which share a common clone. While this leads to clone groups with non-related clones (as our definition of an inconsistent clone is not transitive), for practical purposes it is preferable to know of these indirect relationships, too.

4.4 Tool support

To be able to experiment with the detection of inconsistent clones, our algorithms and filters have been implemented as part of CloneDetective³ [15], which is based on ConQAT [4]. The result is a highly configurable and extensible platform for clone detection on the syntactic level. As our cloning pipeline could reuse a major portion of the CloneDetective code, we consider such an open platform essential for future experiments, as it allows researchers to focus on individual parts of the pipeline. CloneDetective also offers a front-end to visualize and assess the clones found, and thus supports the rapid review of a large number of clone groups.

³ Available as Open Source: http://www.clonedetective.org

4.5 Scalability and performance

Due to the many implementation details, the worst-case complexity is hard to analyze. Additionally, for practical purposes, the more complicated average complexity would be more adequate. Thus, and to assess the performance of the entire pipeline, we executed the detector on the source code of Eclipse⁴, limiting detection to a certain amount of code. Our results on an Intel Core 2 Duo 2.4 GHz running Java in a single thread with 3.5 GB of RAM are shown in Figure 5. The settings are the same as for the main study (min clone length of 10, max edit distance of 5). The detector can handle the 5.6 MLOC of Eclipse in about 3 hours, which is fast enough to be executed within a nightly build.

Figure 5. Runtime of inconsistent clone detection on Eclipse source (time in seconds, 0 to 10,000, plotted against system size in MLOC, 0 to 6).

⁴ Core of Eclipse Europa release 3.3

5. Study description

In order to gain a solid insight into the effects of inconsistent clones, we use a study design with 5 objects and 3 research questions that guide the investigation.

5.1 Study objects

We chose 2 companies and 1 open source project as sources of software systems. This resulted in 5 analyzed projects in total. We chose systems written in different languages, by different teams in different companies and with different functionalities to increase the transferability of the study results. These objects included 3 systems written in C#, a Java system as well as a long-lived Cobol system. All these systems are already in production. For non-disclosure reasons we gave the commercial systems names from A to D. An overview is shown in Table 1.

Munich Re Group. The Munich Re Group is one of the largest re-insurance companies in the world and employs more than 37,000 people in over 50 locations. For their insurance business, they develop a variety of individual supporting software systems. In our study, we analyzed the systems A, B and C, all written in C#. They were each developed by different organizations and provide substantially different functionality, ranging from damage prediction, over pharmaceutical risk management to credit and company structure administration. The systems support between 10 and 150 expert users each.

LV 1871. The Lebensversicherung von 1871 a.G. (LV 1871) is a Munich-based life-insurance company. The LV 1871 develops and maintains several custom software systems for mainframes and PCs. In this study, we analyze a mainframe-based contract management system mostly written in Cobol (System D) employed by about 150 users.

Sysiphus. The open source system Sysiphus⁵ is developed at the Technische Universität München (TUM), but none of the authors of this paper have been involved in its development. It constitutes a collaboration environment for distributed software development projects. The inclusion of an open source system is motivated by the fact that, as the clone detection tool is also freely available, the results can be externally replicated⁶. This is not possible with the detailed confidential results of the commercial systems.

⁵ http://sysiphus.in.tum.de/
⁶ http://wwwbroy.in.tum.de/~ccsm/icse09/

Table 1. Summary of the analyzed systems

    System     Organization   Language   Age (years)   Size (kLOC)
    A          Munich Re      C#         6             317
    B          Munich Re      C#         4             454
    C          Munich Re      C#         2             495
    D          LV 1871        Cobol      17            197
    Sysiphus   TUM            Java       8             281

5.2 Research questions

The underlying problem that we analyze is that of clones and, especially, their inconsistencies. In order to investigate this problem, we answer the following 3 more detailed research questions.

RQ 1: Are clones changed inconsistently?
The first question we need to answer is whether inconsistent clones appear at all in real-world systems. This means not only whether we can find them at all but also whether they constitute a significant part of the total clones of a system. It does not make sense to analyze inconsistent clones if they are a rare phenomenon.

RQ 2: Are inconsistent clones created unintentionally?
Having established that there are inconsistent clones in real systems, we need to analyze whether these inconsistent clones have been created intentionally or not. It can obviously be sensible to change a clone so that it becomes inconsistent with its counterparts because it has to conform to different requirements. However, the important difference is whether the developer is aware of the other clones, i.e., whether the inconsistency is intentional.

RQ 3: Can inconsistent clones be indicators for faults in real systems?
After establishing these prerequisites, we can determine whether the inconsistent clones are actually indicators for faults in real systems. If there are inconsistent clones that have not been created because of different requirements, this implies that at least one of these clones does not conform to the requirements. Hence, it constitutes a fault.

5.3 Study design

We answer the research questions with the following study design. In the study we analyze sets of clone groups as shown in Fig. 6. The outermost set C contains all clone groups in a system, IC denotes the set of inconsistent clone groups, and UIC the unintentionally inconsistent clone groups. The subset F of UIC consists of those unintentionally inconsistent clone groups that indicate a fault in the program. Please note that we do not distinguish between created and evolved inconsistent clones, as for the question of faultiness it does not matter when the inconsistencies have been introduced.

Figure 6. Clone group sets.

We use these different clone group sets to design the study that answers our research questions. The independent variables in the study are development team, programming language, functional domain, age and size. The dependent variables for the research questions are explained below.

RQ 1 investigates the existence of inconsistent clones in realistic systems. Hence, we need to analyze the size of set IC with respect to the size of set C. We apply our inconsistent clone analysis approach to all the systems, perform manual assessment of the detected clones to eliminate false positives and calculate the inconsistent clone ratio |IC|/|C|.

For RQ 2, whether clones are created unintentionally, we then compare the size of the sets UIC and IC. The sets are established by showing each identified inconsistent clone to developers of the system and asking them to rate it as intentional or unintentional. This gives us the unintentionally inconsistent clone ratio |UIC|/|IC|.

The most important question we aim to answer is whether inconsistent clones indicate faults (RQ 3). Hence, we are interested in the size of set F in relation to the size of IC. The set F is again determined by asking developers of the respective system. Their expert opinion classifies the clones into faulty and non-faulty. We only analyze unintentionally inconsistent clones for faults. Our faulty inconsistent clone ratio |F|/|IC| is thus a lower bound, as potential faults in intentionally inconsistent clones are not considered.

Using this, we are already able to roughly find the answer to RQ 3. As this is our main result from the study, we transform it into a hypothesis. We need to make sure that the fault density in the inconsistencies is higher than in randomly picked lines of source code. This leads to the hypothesis H: The fault density in the inconsistencies is higher than the average fault density.

As we do not know the actual fault densities of the analyzed systems, we need to resort to average values. The span of available numbers is large because of the high variation in software systems. Endres and Rombach [7] give 0.1–50 faults per kLOC as a typical range. For the fault density in the inconsistencies, we use the number of faults divided by the logical lines of code of the inconsistencies. We refrain from testing the hypothesis statistically because of the low number of data points as well as the large range of typical defect densities.

5.4 Procedure

The treatment we used on the objects was the approach to detect inconsistent clones as described in Section 4. For all systems, the detection was executed by the researcher to identify consistent and inconsistent clone candidates. On a 1.7 GHz notebook, the detection took between one and two minutes for each system. The detection was configured not to cross method boundaries, since experiments showed that inconsistent clones that cross method boundaries in many cases did not capture semantically meaningful concepts. This is also noted for exact clones in [21] and is even more pronounced for inconsistent clones. Since in Cobol the sections in the procedure division are the counterpart of Java or C# methods, clone detection for Cobol was limited to these.

For the C# and Java systems, the algorithm was parameterized to use 10 statements as minimal clone length, a maximum edit distance of 5, a maximal inconsistency ratio (i.e., the ratio of edit distance and clone length) of 0.2 and the constraint that the first 2 statements of two clones need to be equal. Due to the verbosity of Cobol [6], minimal clone length and maximal edit distance were doubled to 20 and 10, respectively. Generated code that is not subject to manual editing was excluded from clone detection, since inconsistent manual updates obviously cannot occur there. Normalization of identifiers and constants was tailored as appropriate for the analyzed language, to allow for renaming of identifiers while at the same time avoiding too large false positive rates. These settings were determined to represent the best compromise between precision and recall during cursory experiments on the analyzed systems, for which random samples of the detected clones were evaluated manually.

The detected clone candidates were then manually rated by the researcher in order to remove false positives, i.e., code fragments that, although identified as clone candidates by the detection algorithm, have no semantic relationship. Inconsistent and exact clone group candidates were treated differently: all inconsistent clone group candidates were rated, producing the set of inconsistent clone groups. Since the exact clones were not required for further steps of the case study, instead of rating all of them, a random sample of 25% was rated, and false positive rates were then extrapolated to determine the number of exact clones.

The inconsistent clone groups were then presented to the developers of the respective systems in the tool CloneDetective mentioned in Section 4.4, which is able to display the commonalities and differences of the clone group in a clearly arranged way, as depicted in Figs. 1 and 7. The developers rated whether the clone groups were created intentionally or unintentionally. If a clone group was created unintentionally, the developers also classified it as faulty or non-faulty. For the Java and C# systems, all inconsistent clone groups were rated by the developers. For the Cobol system, rating was limited to a random sample of 68 out of the 151 inconsistent clone groups, since the age of the system and the fact that the original developers were not available for rating increased rating effort. Thus, for the Cobol case, the results for RQ 2 and RQ 3 were computed based on this sample. In cases where intentionality or faultiness could not be determined, e.g., because none of the original developers could be accessed for rating, the inconsistencies were treated as intentional and non-faulty.

6. Results

The quantitative results of our study are summarized in Table 2. Except for the Cobol system D, the precision values are smaller for inconsistent clone groups than for exact clone groups, as was expected, since inconsistent clone groups allow for more deviation. The high precision for system D results from the rather conservative clone detection parameters chosen due to the verbosity of Cobol. For system A, stereotype database access code of semantically unrelated objects gave rise to lower precision values.

Table 2. Summary of the study results

    Project                                          A      B      C      D      Sysiphus  Sum    Mean
    Precision exact clone groups                     0.88   1.00   0.96   1.00   0.98      -      0.96
    Precision inconsistent clone groups              0.61   0.86   0.80   1.00   0.87      -      0.83
    Clone groups |C|                                 286    160    326    352    303       1427   -
    Inconsistent clone groups |IC|                   159    89     179    151    146       724    -
    Unintentionally inconsistent clone groups |UIC|  51     29     66     15     42        203    -
    Faulty clone groups |F|                          19     18     42     5      23        107    -
    RQ 1: |IC|/|C|                                   0.56   0.56   0.55   0.43   0.48      -      0.52
    RQ 2: |UIC|/|IC|                                 0.32   0.33   0.37   0.10   0.29      -      0.28
    RQ 3: |F|/|IC|                                   0.12   0.20   0.23   0.03   0.16      -      0.15
    Faulty in UIC: |F|/|UIC|                         0.37   0.62   0.64   0.33   0.55      -      0.50
    Inconsistent logical lines                       442    197    797    1476   459       3371   -
    Fault density (faults per kLOC)                  43     91.4   52.7   3.4    50.1      -      48.1

About half of the clones (52%) contain inconsistencies. Therefore, RQ 1 can be positively answered: Clones are changed inconsistently. None of these would be reported by existing tools that search only for exact matches. Of these inconsistencies, over a quarter (28%) were introduced unintentionally. Hence, RQ 2 can also be answered positively: Inconsistent clones are created unintentionally in many cases. Only system D is far lower here, with only 10% of unintentionally inconsistent clones. With about three quarters of the changes being intentional, cloning and changing code seems to be a frequent pattern during development and maintenance.

For RQ 3, whether inconsistent clones are indicators for faults, we note that at least 3–23% of the inconsistencies actually presented a fault. Again, by far the lowest number comes from the Cobol system. Ignoring it, the total ratio of faulty inconsistent clones goes up to 18%. This constitutes a significant share that needs consideration. To judge hypothesis H, we also calculated the fault densities. They lie in the range of 3.4–91.4 faults per kLOC. Again, system D is an outlier. Compared to reported fault densities in the range of 0.1 to 50 faults per kLOC, and considering the fact that all systems are not only delivered but have even been in productive use for several years, we consider our results to support hypothesis H. On average, the inconsistencies contain more faults than average code. Hence, RQ 3 can also be answered positively: Inconsistent clones can be indicators for faults in real systems.

While the numbers are similar for the C# and Java projects, rates of unintentional inconsistencies and thus faults are comparatively low for project D, which is a legacy system written in Cobol. To a certain degree, we attribute this to our conservative assessment strategy of treating inconsistencies whose intentionality and faultiness could not be unambiguously determined as intentional and non-faulty. Furthermore, interviewing the current maintainers of the systems revealed that cloning is such a common pattern in Cobol systems that searching for duplicates of a piece of code is actually an integral part of their maintenance process.

Figure 7. Different UI behavior since right side does not use operations (Sysiphus).

[...] user forms and dialogs. Category (3) examples we identified include unnecessary object creation, minor memory leaks, performance issues like missing break statements in loops and redundant re-computations of cacheable values, differences in exception handling, different exception and debug messages or different log levels for similar cases. Of the 107 faulty inconsistent clones found, 17 were categorized as category (1) faults, 44 as category (2) faults and 46 as category (3) faults. Since all analyzed systems are in production [...]
Compared to the developers of the other projects, tion, the relatively larger amounts of category (2) and (3) the Cobol developers where thus more aware of clones in faultscoincidewithourexpectations. thesystem. Toaccountforthisdifferencein“cloneaware- ness” we added the row |F|/|UIC| to Table 2, which re- 7.Threatstovalidity vealsthatwhiletheratesofunintentionalchangesarelower forprojectD,theratioofunintentionalchangesleadingtoa Wediscusshowwemitigatedthreatstoconstruct,inter- faultisinthesamerangeforallprojects.Fromourresultsit nalandexternalvalidityofourstudy. seemsthatabouteverysecondtothirdunintentionalchange toacloneleadstoafault. 7.1.Constructvalidity Although not central to our research questions, the de- tectionoffaultsalmostautomaticallyraisesthequestionfor theirseverity. Asthefaulteffectcostsareunknownforthe We did not analyze the development repositories of the analyzed systems, we cannot provide a full-fledged sever- systems in order to determine if the inconsistencies really ityclassification. However,weprovideapartialanswerby havebeenintroducedbyincompletechangestothesystem categorizing the found faults as (1) faults that lead to po- andnotbyrandomsimilaritiesofunrelatedcode. Thishas tential system crash or data loss, (2) unexpected behavior tworeasons:(1)Wewanttoanalyzeallinconsistentclones, visibletotheenduserand(3)unexpectedbehaviornotvis- alsotheonesthathavebeenintroduceddirectlybycopyand ibletotheenduser. Oneexampleforacategory(1)faultis modificationinasinglecommit.Thosemightnotbevisible showninFig1. Here,onecloneoftheaffectedclonegroup in the repository. (2) The industrial systems do not have performsanull-checktopreventanull-pointerdereference completedevelopmenthistories. Weconfrontedthisthreat whereastheotherdoesnot.Otherexamplesweencountered bymanuallyanalyzingeachpotentialinconsistentclone. 
forcategory(1)faultsareindex-out-of-boundsexceptions, The comparison with average fault probability is not incorrecttransactionhandlingandmissingrollbacks. Fig.7 perfect to determine whether the inconsistencies are really showsanexampleofacategory(2)fault. Inoneclonethe more fault-prone than a random piece of code. A compar- performedoperationisnotencapsulatedinanoperationob- isonwiththeactualfaultdensitiesofthesystemsoractual ject and, hence, is handled differently by the undo mecha- checksforfaultsinrandomcodelineswouldbettersuitthis nism. Furtherexampleswefoundforcategory(2)faultsare purpose. However, the actual fault densities are not avail- incorrectendusermessages, inconsistentdefaultvaluesas abletousbecauseofincompletedefectdatabases.Tocheck well as different editing and validation behavior in similar for faults in random code lines is practically not possible. Wewouldneedthedeveloperstimeandwillingnessforin- lead to faults in a system. The inconsistencies between specting random code. As the potential benefit for the de- clones are often not justified by different requirements but velopersislow,themotivationwouldbelowandhencethe canbeexplainedbydevelopermistakes. resultswouldbeunreliable. WeconsiderofspecialvaluetheanalysisoftheSysiphus project. BecausebothSysiphusandourdetectiontoolsare 7.2.Internalvalidity open source, the whole analysis can completely be repli- catedindependently. Weprovideawebsitewiththeneces- As we ask the developers for their expert opinion on saryinformation7. whetheraninconsistencyisintentionalorunintentionaland Havingestablishedtheempiricalresults,thequestionre- faulty or non-faulty, a threat is that the developers do not mains of how to use this information in order to reduce judgethiscorrectly. Onecaseisthatthedeveloperassesses faultsinsoftwaresystems. Theansweristwofold: (1)pre- somethingasnon-faultywhichactuallyisfaulty. 
Thiscase vention by less cloning and (2) tools that prevent uninten- only reduces the chances to positively answer the research tionally inconsistent changes of clones. The fewer clones questions.Thesecondcaseisthatthedevelopersratesome- thereareinthesystem,thelesslikelyitistointroducefaults thingasfaultywhichisnofault.Wemitigatedthisthreatby by inconsistencies between them. In order to increase de- only rating an inconsistency as faulty if the developer was veloper awareness of clones, we have integrated our clone completelysure. Otherwiseitwaspostponedandthedevel- detectiontoolintotheVisualStudiodevelopmentenviron- operconsultedcolleaguesthatknowthecorrespondingpart ment8. AttheMunichReGroup,asareactionontheclone ofthecodebetter. Inconclusivecandidateswererankedas results,clonedetectionisnowincludedinthenightlybuilds intentionalandnon-faulty. Hence,againonlythechanceto ofalldiscussedprojects. Furthermore, forexistingclones, answertheresearchquestionpositivelyisreduced. there should be tool support that ensures that all changes Theconfigurationoftheclonedetectiontoolhasastrong thataremadetoaclonearemadeinthefullknowledgeof influenceonthedetectionresults. Wecalibratedtheparam- itsduplicates.ToolssuchasCloneTracker[5]orCReN[11] eters based on a pre-study and our experience with clone provide promising approaches. However, both approaches detectioningeneral. Theconfigurationalsovariesoverthe arenotapplicabletoexistingsoftwarethatalreadycontains differentprogramminglanguagesencountered,duetotheir inconsistent clones. Due to their high fault potential, we differences in features and language constructs. However, considertheabilitytodetectinconsistentclonesanimpor- thisshouldnotstronglyaffectthedetectionofinconsistent tantfeatureofindustrial-strengthclonedetectors. clonesbecausewespentgreatcaretoconfigurethetoolin awaythattheresultingclonesaresensible. 
9.Conclusion We also pre-processed the inconsistent clones that we presentedtothedevelopersinordertoeliminatefalseposi- In this paper we provide strong evidence that inconsis- tives. Thiscouldmeanthatweexcludedclonesthatareac- tentclonesconstituteamajorsourceoffaults,whichmeans tuallyfaulty. However,thisagainonlyreducesthechances that cloning can be a substantial problem during develop- thatwecananswerourresearchquestionpositively. ment and maintenance unless special care is taken to find and track existing clones and their evolution. Our results 7.3.Externalvalidity suggest that nearly every second unintentionally inconsis- tent change to a clone leads to a fault. Furthermore, we Theprojectswereobviouslynotsampledrandomlyfrom provide a scalable algorithm for finding such inconsistent allpossiblesoftwaresystemsbutwereliedonourconnec- clones as well as suitable tool support for future experi- tionswiththedevelopersofthesystems. Hence,thesetof ments. systems is not completely representative. The majority of Future work on this topic will evolve in multiple direc- thesystemsiswritteninC#andanalyzing5systemsinto- tions. One obvious development is the refinement of the talisnotahighnumber. However,all5systemshavebeen algorithmsandtoolsused. Thisincludesrefinedheuristics developed by different development organizations and the tospeeduptheclonesearchandperformautomaticassess- C#-systems are technically different (2 web, 1 rich client) menttodiscardobviouslyirrelevantclones. Inaddition,the and provide substantially different functionalities. We fur- usability of the tools could be advanced further to make thermitigatedthisthreatbyalsoanalyzingalegacyCobol their use more efficient for practical applications. More- systemaswellasanopensourceJavasystem. over,itwillbeinterestingtocomparedifferentdetectionpa- rametervalues,algorithmsandtoolsaccordingtotheirper- 8.Discussion formanceandaccuracywhenfindinginconsistentclones. 
Evenconsideringthethreatstovaliditydiscussedabove, 7http://wwwbroy.in.tum.de/˜ccsm/icse09/ the results of the study show convincingly that clones can 8http://www.codeplex.com/CloneDetectiveVS
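The detection settings described in the study setup above (a minimal clone length of 10, a maximal edit distance of 5, a maximal inconsistency ratio of 0.2 and 2 equal leading statements, with length and distance doubled for Cobol) can be illustrated as an acceptance check for a single pair of candidate fragments. The following Python sketch is not the detection algorithm used in the study, which must also find the candidates in the first place; it only checks a given pair against the thresholds. It assumes that statements have already been normalized (identifiers and constants replaced), measures edit distance as a plain statement-level Levenshtein distance, and takes the length of the shorter fragment as the clone length. `CloneParams`, `statement_edit_distance` and `is_inconsistent_clone` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CloneParams:
    """Hypothetical container for the thresholds used for the C# and Java systems."""
    min_length: int = 10          # minimal clone length (unit assumed: statements)
    max_edit_distance: int = 5    # maximal statement-level edit distance
    max_ratio: float = 0.2        # maximal edit distance / clone length
    equal_prefix: int = 2         # leading statements that must be identical


# Cobol: minimal clone length and maximal edit distance doubled.
COBOL_PARAMS = CloneParams(min_length=20, max_edit_distance=10)


def statement_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance in which each normalized statement is one symbol."""
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, sb in enumerate(b, start=1):
            cost = 0 if sa == sb else 1
            curr[j] = min(prev[j] + 1,         # delete sa
                          curr[j - 1] + 1,     # insert sb
                          prev[j - 1] + cost)  # keep or substitute
        prev = curr
    return prev[-1]


def is_inconsistent_clone(a: Sequence[str], b: Sequence[str],
                          params: CloneParams = CloneParams()) -> bool:
    """True if both fragments pass all thresholds and differ in at least
    one statement (assumption: identical fragments are exact clones)."""
    length = min(len(a), len(b))  # assumption: clone length = shorter fragment
    if length < params.min_length:
        return False
    if list(a[:params.equal_prefix]) != list(b[:params.equal_prefix]):
        return False
    distance = statement_edit_distance(a, b)
    return (0 < distance <= params.max_edit_distance
            and distance / length <= params.max_ratio)


if __name__ == "__main__":
    original = [f"stmt_{i}" for i in range(12)]
    # The same fragment with one statement (e.g. a null-check) added in one copy only.
    patched = original[:5] + ["if x is None: return"] + original[5:]
    print(is_inconsistent_clone(original, patched))                # True
    print(is_inconsistent_clone(original, patched, COBOL_PARAMS))  # False: 12 < 20
```

Requiring at least one differing statement keeps exact clones out of the inconsistent set; whether the study's tool draws that line in exactly this way is an assumption of the sketch.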
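The ratios in Table 2 can be recomputed from the raw counts. The Python sketch below, using hypothetical names, does this per project. Fault density is computed as |F| per 1,000 inconsistent logical lines, which reproduces the table's density row; pooling |F| and |IC| over all systems except D reproduces the 18% quoted for RQ 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectCounts:
    """Raw counts for one system, as listed in Table 2."""
    name: str
    clone_groups: int        # |C|
    inconsistent: int        # |IC|
    unintentional: int       # |UIC|
    faulty: int              # |F|
    inconsistent_lines: int  # inconsistent logical lines

    def ratios(self) -> dict:
        """Per-project ratios; fault density is faults per 1,000 inconsistent lines."""
        return {
            "RQ1 |IC|/|C|": self.inconsistent / self.clone_groups,
            "RQ2 |UIC|/|IC|": self.unintentional / self.inconsistent,
            "RQ3 |F|/|IC|": self.faulty / self.inconsistent,
            "|F|/|UIC|": self.faulty / self.unintentional,
            "faults per kLOC": 1000 * self.faulty / self.inconsistent_lines,
        }


TABLE_2 = [
    ProjectCounts("A", 286, 159, 51, 19, 442),
    ProjectCounts("B", 160, 89, 29, 18, 197),
    ProjectCounts("C", 326, 179, 66, 42, 797),
    ProjectCounts("D", 352, 151, 15, 5, 1476),
    ProjectCounts("Sysiphus", 303, 146, 42, 23, 459),
]


def pooled_faulty_ratio(projects) -> float:
    """Total |F| divided by total |IC| over the given projects."""
    return sum(p.faulty for p in projects) / sum(p.inconsistent for p in projects)


def fmt(key: str, value: float) -> str:
    """Round like Table 2: one decimal for densities, two for ratios."""
    return f"{value:.1f}" if "kLOC" in key else f"{value:.2f}"


if __name__ == "__main__":
    for project in TABLE_2:
        cells = ", ".join(f"{k} = {fmt(k, v)}" for k, v in project.ratios().items())
        print(f"{project.name}: {cells}")
    without_d = [p for p in TABLE_2 if p.name != "D"]
    print(f"pooled |F|/|IC| without D: {pooled_faulty_ratio(without_d):.2f}")
```

Rounded to two decimals (one for the densities), the per-project values agree with Table 2.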
