Why and How to Control Cloning in Software Artifacts PDF

215 Pages·2011·3.41 MB·English
by Elmar Juergens

Preview Why and How to Control Cloning in Software Artifacts

Why and How to Control Cloning in Software Artifacts
Elmar Juergens
Institut für Informatik der Technischen Universität München

Complete reprint of the dissertation approved by the Fakultät für Informatik of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.).

Chair: Univ.-Prof. Bernd Brügge, Ph.D.
Examiners of the dissertation:
1. Univ.-Prof. Dr. Dr. h.c. Manfred Broy
2. Univ.-Prof. Dr. Rainer Koschke, Universität Bremen

The dissertation was submitted to the Technische Universität München on 07.10.2010 and accepted by the Fakultät für Informatik on 19.02.2011.

Abstract

The majority of the total life cycle costs of long-lived software arises after its first release, during software maintenance. Cloning, the duplication of parts of software artifacts, hinders maintenance: it increases size, and thus effort for activities such as inspections and impact analysis. Changes need to be performed to all clones, instead of to a single location only, thus increasing effort. If individual clones are forgotten during a modification, the resulting inconsistencies can threaten program correctness. Cloning is thus a quality defect.

The software engineering community recognized the negative consequences of cloning over a decade ago. Nevertheless, it abounds in practice, across artifacts, organizations and domains. Cloning thrives, since its control is not part of software engineering practice. We are convinced that this has two principal reasons: first, the significance of cloning is not well understood. We do not know the extent of cloning across different artifact types and the quantitative impact it has on program correctness and maintenance efforts. Consequently, we do not know the importance of clone control. Second, no comprehensive method exists that guides practitioners through tailoring and organizational change management required to establish successful clone control.
Lacking both a quantitative understanding of its harmfulness and comprehensive methods for its control, cloning is likely to be neglected in practice.

This thesis contributes to both areas. First, we present empirical results on the significance of cloning. Analysis of differences between code clones in productive software revealed over 100 faults. More specifically, every second modification to code that was done in unawareness of its clones caused a fault, demonstrating the impact of code cloning on program correctness. Furthermore, analysis of industrial requirements specifications and graph-based models revealed substantial amounts of cloning in these artifacts, as well. The size increase caused by cloning affects inspection efforts: for one specification, by an estimated 14 person days; for a second one by over 50%. To avoid such impact on program correctness and maintenance efforts, cloning must be controlled.

Second, we present a comprehensive method for clone control. It comprises detector tailoring to improve accuracy of detected clones, and assessment to quantify their impact. It guides organizational change management to successfully integrate clone control into established maintenance processes, and root cause analysis to prevent the creation of new clones. To operationalize the method, we present a clone detection workbench for code, requirements specifications and models that supports all these steps. We demonstrate the effectiveness of the method, including its tools, through an industrial case study, where it successfully reduced cloning in the participating system.

Finally, we identify the limitations of clone detection and control. Through a controlled experiment, we show that clone detection approaches are unsuited to detect behaviorally similar code that has been developed independently and is thus not the result of copy & paste. Its detection remains an important topic for future work.

Acknowledgements

I have spent the last four years as a researcher at the Lehrstuhl for Software & Systems Engineering at Technische Universität München from Prof. Dr. Dr. h. c. Manfred Broy.
I want to express my gratitude to Manfred Broy for the freedom and responsibility I was granted and for his guidance and advice. I have, and still do, enjoy working in the challenging and competitive research environment he creates.

I want to thank Prof. Dr. rer. nat. Rainer Koschke for accepting to co-supervise this thesis. I am grateful for inspiring discussions on software cloning, but also for the hospitality and interest, both by him and his group, that I experienced during my visit in Bremen. My view of the social aspects of research, which formed in the large, thematically heterogenous group of Manfred Broy, was enriched by the glimpse into the smaller, more focussed group of Rainer Koschke.

I am very grateful to my colleagues. Their support, both on the scientific and on the personal level, was vital for the success of this thesis. And not least, for my personal development during the last four years. I am grateful to Silke Müller for schedule magic. To Florian Deissenboeck for being an example worth following and for both his encouragement and outright criticism. To Benjamin Hummel for his merit and creativity in producing ideas, and for his productivity and effectiveness in their realization. To Martin Feilkas for his ability to overview and simplify complicated situations and for reliability and trust come what may. To Stefan Wagner for his guidance and example in scientific writing and empirical research. To Daniel Ratiu for the sensitivity, carefulness and depth he shows during scientific discussions (and outside of them). To Lars Heinemann for being the best colleague I ever shared an office with and for his tolerance exhibited doing so. To Markus Herrmannsdörfer for his encouragement and pragmatic, uncomplicated way that makes collaboration productive and fun. To Markus Pizka for raising my interest in research and for encouraging me to start my PhD thesis. Working with all of you was, and still is, a privilege.

Research, understanding and idea generation benefit from collaboration.
I am grateful for joint paper projects to Sebastian Benz, Michael Conradt, Florian Deissenboeck, Christoph Domann, Martin Feilkas, Jean-François Girard, Nils Göde, Lars Heinemann, Benjamin Hummel, Klaus Lochmann, Benedikt May y Parareda, Michael Pfaehler, Markus Pizka, Daniel Ratiu, Bernhard Schaetz, Jonathan Streit, Stefan Teuchert and Stefan Wagner. In addition, this thesis benefited from the feedback of many. I am thankful for proof-reading drafts to Florian Deissenboeck, Martin Feilkas, Nils Göde, Lars Heinemann, Benjamin Hummel, Klaus Lochmann, Birgit Penzenstadler, Daniel Ratiu and Stefan Wagner. And to Rebecca Tiarks for help with the Bellon Benchmark.

The empirical parts of this work could not have been realized without the continuous support of our industrial partners. I want to thank everybody I worked with at ABB, MAN, LV1871 and Munich Re Group. I particularly thank Munich Re Group, especially Rainer Janßen and Rudolf Vaas, for the long-term collaboration with our group that substantially supported this dissertation.

Most of all, I want to thank my family for their unconditional support (both material and immaterial) not only during my dissertation, but during all of my education. I am deeply grateful to my parents, my brother and, above all, my wife Sofie.

»A man’s gotta do what a man’s gotta do«
Fred MacMurray in The Rains of Ranchipur

»A man’s gotta do what a man’s gotta do«
Gary Cooper in High Noon

»A man’s gotta do what a man’s gotta do«
George Jetson in The Jetsons

»A man’s gotta do what a man’s gotta do«
John Cleese in Monty Python’s Guide to Life

Contents

1 Introduction
1.1 Problem Statement
1.2 Contribution
1.3 Contents

2 Fundamentals
2.1 Notions of Redundancy
2.2 Software Cloning
2.3 Notions of Program Similarity
2.4 Terms and Definitions
2.5 Clone Metrics
2.6 Data-flow Models
2.7 Case Study Partners
2.8 Summary

3 State of the Art
3.1 Impact on Program Correctness
3.2 Extent of Cloning
3.3 Clone Detection Approaches
3.4 Clone Assessment and Management
3.5 Limitations of Clone Detection

4 Impact on Program Correctness
4.1 Research Questions
4.2 Study Design
4.3 Study Objects
4.4 Implementation and Execution
4.5 Results
4.6 Discussion
4.7 Threats to Validity
4.8 Summary

5 Cloning Beyond Code
5.1 Research Questions
5.2 Study Design
5.3 Study Objects
5.4 Implementation and Execution

Description:
We analyzed third party open source software, including, e.g., WebKit, Subversion, [51] F. Deissenboeck, U. Hermann, E. Juergens, and T. Seifert.