Enhancing Program Dependency Graph Based Clone Detection Using Approximate Subgraph Matching

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF THE DEGREE OF
MASTER OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

SUBMITTED BY
C.M. Kamalpriya
Roll No. 14203020

UNDER THE SUPERVISION OF
Dr. Paramvir Singh
(Assistant Professor)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
DR. B. R. AMBEDKAR NATIONAL INSTITUTE OF TECHNOLOGY
JALANDHAR – 144011, PUNJAB (INDIA)
JULY, 2016

DR. B. R. AMBEDKAR NATIONAL INSTITUTE OF TECHNOLOGY, JALANDHAR

CANDIDATE’S DECLARATION

I hereby certify that the work, which is being presented in the dissertation, entitled “Enhancing Program Dependency Graph Based Clone Detection Using Approximate Subgraph Matching” by “C.M. Kamalpriya” in partial fulfillment of requirements for the award of degree of M.Tech. (Computer Science and Engineering) submitted to the Department of Computer Science and Engineering of Dr. B R Ambedkar National Institute of Technology, Jalandhar, is an authentic record of my own work carried out during a period from August, 2015 to July, 2016 under the supervision of Dr. Paramvir Singh, Assistant Professor. The matter presented in this dissertation has not been submitted by me in any other University/Institute for the award of any degree.

C.M. Kamalpriya
Roll No. 14203020

This is to certify that the above statement made by the candidate is correct and true to the best of my knowledge.

Dr. Paramvir Singh (Supervisor)
Assistant Professor
Department of Computer Science & Engineering
Dr. B. R. Ambedkar NIT, Jalandhar

The M.Tech (Dissertation) Viva-Voce examination of C.M. Kamalpriya, Roll No. 14203020, has been held on ____________ and accepted.

(Signature)          (Signature)    (Signature)
External Examiner    Supervisor     Head of Department

ACKNOWLEDGEMENTS

I would like to hereby express my sincere gratitude to my supervisor Dr.
Paramvir Singh, Assistant Professor, Department of Computer Science and Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, for his guidance, support and continuous engagement throughout the learning process of this master's thesis. He motivated and encouraged me throughout this work. Without his encouragement and guidance this project would not have materialized. I consider myself extremely fortunate to have had the chance to work under his supervision. In spite of his busy schedule, he was always approachable and took the time to guide me and give appropriate advice.

I also wish to wholeheartedly thank all the faculty members of the Department of Computer Science and Engineering for the invaluable knowledge they have imparted to me. I also extend my thanks to the technical and administrative staff of the department for maintaining an excellent working environment.

I would like to thank my parents and all my family members for their continuous support and blessings throughout the entire process. I would also like to thank all my batchmates for the useful discussions, constant support and encouragement during the whole period of the work.

Last but not least, I would like to thank Almighty God for His blessings and for giving me enough strength to accomplish this phase of life.

C.M. Kamalpriya

ABSTRACT

Software maintenance is one of the paramount processes in software engineering. It accounts for over sixty percent of the total effort and cost expended in the overall software engineering process. The main goal of software maintenance is to preserve the value of the developed software over time. During software development, programmers often reuse existing code to build new code. A replicated code fragment that is an exact copy or a modified version of an existing code fragment is called a code clone. Replicating existing code helps in quick software development and in enhancement for changed user requirements.
So, the occurrence of code clones is inevitable in the development of large software systems. However, cloning leads to higher cost and effort for software maintenance, an increased probability of bug propagation, sloppier system design and an increase in system size. Clone detection and removal are therefore fundamental to an efficient and effective software development and maintenance process.

There are different types of code clones based on the amount and kind of replication performed. Various tools and techniques have been developed to detect clones in software systems. Each clone detection tool or technique specializes in the detection of one or more types of clones. Program Dependency Graph (PDG) based clone detection techniques have a key advantage over other techniques: they are capable of detecting non-contiguous code clones.

This work proposes a further enhancement to PDG-based detection in order to identify all possible clone relations from the obtained clone results by applying Approximate Subgraph Matching (ASM). The results of the proposed technique were obtained on three subject software systems. The obtained results are composed of many new subsumed clone relations and exact and approximate clone relations derived from the clone pair results of the PDG-based technique. These results indicate that, using the proposed approach, a large number of new clone relations can be identified from the clone pairs obtained by PDG-based detection. The results have been manually validated for each subject system. This work also presents a novel approach using ASM to identify different node-to-node mappings between the code fragments of each detected clone relation and proposes a new ASM-based distance measure to quantify their similarity.

TABLE OF CONTENTS

CANDIDATE’S DECLARATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF PUBLICATIONS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS

CHAPTER 1 INTRODUCTION
  1.1 Basic Types of Clones
  1.2 Causes of Software Cloning
  1.3 Advantages of Software Clones
  1.4 Disadvantages of Software Clones
  1.5 Clone Detection Process
    1.5.1 Intermediate Source Code Representation
  1.6 An Overview of Clone Detection Tools and Techniques
  1.7 Overview of Program Dependency Graph (PDG) Based Clone Detection
  1.8 Example Study of Approximate Subgraph Matching (ASM) Algorithm
  1.9 Research Motivation
  1.10 Research Objectives
  1.11 Thesis Organization

CHAPTER 2 RELATED WORK
  2.1 Related Surveys
  2.2 Clone Detection Tools and Techniques
    2.2.1 Text Based Clone Detection
    2.2.2 Token Based Clone Detection
    2.2.3 Tree Based Clone Detection
    2.2.4 Graph Based Clone Detection
    2.2.5 Model Based Clone Detection
    2.2.6 Metrics Based Clone Detection
    2.2.7 Hybrid Clone Detection
  2.3 Program Dependency Graph (PDG) Based Clone Detection
  2.4 Subgraph Matching Techniques
  2.5 Chapter Summary

CHAPTER 3 PROPOSED APPROACH
  3.1 Algorithm Steps Outlined
  3.2 Proposed Algorithm
  3.3 Approximate Subgraph Matching (ASM) Algorithm and Modifications Performed in ASM Source Code
  3.4 Chapter Summary

CHAPTER 4 EXPERIMENTAL DESIGN AND METHODOLOGY
  4.1 Platform and Computing Environment
    4.1.1 Eclipse Platform
    4.1.2 Java Programming Language
    4.1.3 Computing Environment
  4.2 Scorpio Tool
    4.2.1 Working of Scorpio
    4.2.2 Flowcharts of Scorpio’s Working
    4.2.3 Scorpio Output
    4.2.4 PDG Visualization of Scorpio Tool
  4.3 Subject Systems
  4.4 Generic Methodology
  4.5 Output Format
  4.6 Output Validation Method
  4.7 Chapter Summary

CHAPTER 5 RESULTS AND DISCUSSION
  5.1 Clone Relations Obtained in Subject Systems
  5.2 Comparison of Results with Scorpio
  5.3 Validation of Obtained Results
    5.3.1 Results on Test System of Scorpio Tool
    5.3.2 Results on EIRC
    5.3.3 Results on Eclipse-ant System
  5.4 Results Summary
  5.5 Chapter Summary

CHAPTER 6 CONCLUSIONS AND FUTURE WORK
  6.1 Conclusions
  6.2 Future Work

References

LIST OF PUBLICATIONS

1. C.M. Kamalpriya and Paramvir Singh, “Enhancing Program Dependency Graph Based Clone Detection using Approximate Subgraph Matching”, 23rd Asia-Pacific Software Engineering Conference (APSEC 2016) (Communicated).

LIST OF FIGURES

Figure 1.1 Clone Detection Process
Figure 1.2 Approximate Subgraph Matching (G1, G2)
Figure 4.1 Snapshot of Eclipse Platform
Figure 4.2 Scorpio – Invoke Slicing on Similar Node Pair
Figure 4.3 Scorpio – Perform Slicing on Input Node Pair
Figure 4.4 Output Snapshot of Scorpio Tool
Figure 4.5 Cloned Code Fragments Detected by Scorpio from its Test System
Figure 4.6 Snapshot of Output Clone Pair File Generated by Scorpio for its Test System
Figure 4.7 Example of Scorpio Generated PDG
Figure 4.8 Flow Diagram of Proposed Clone Detection Method
Figure 4.9 Representation of Output Format
Figure 5.1 Graphical Representation of Results on Test System of Scorpio Tool
Figure 5.2 Graphical Representation of Results on EIRC System
Figure 5.3 Graphical Representation of Results on Eclipse-ant System
Figure 5.4 Graphical Representation of Comparison of Results With Scorpio
Figure 5.5 Output Snapshot 1 – ASM Relation from Test System of Scorpio (distance < 0.5)
Figure 5.6 Cloned Code Fragments for Output Snapshot 1
Figure 5.7 Output Snapshot 2 – Clone Relation from Test System of Scorpio (distance = 0.0)
Figure 5.8 Cloned Code Fragments for Output Snapshot 2
Figure 5.9 Output Snapshot 3 – ASM Relation from Test System of Scorpio (distance > 0.5)
Figure 5.10 Cloned Code Fragments for Output Snapshot 3
Figure 5.11 Output Snapshot 4 – Clone Relation from EIRC System (distance = 0.0)
Figure 5.12 Cloned Code Fragments for Output Snapshot 4
Figure 5.13 Output Snapshot 5 – Clone Relation from Eclipse-ant System (distance = 0.0)
Figure 5.14 Cloned Code Fragments for Output Snapshot 5