Analyzing GitHub as a Collaborative Software Development Platform: A Systematic Review

by

Arturo Reyes López
B.Sc., Universidad Veracruzana, 2004

A Master’s Project Submitted in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE

in the Department of Computer Science

© Arturo Reyes López, 2017
University of Victoria

All rights reserved. This project may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.

Supervisory Committee

Dr. Daniel M German, Supervisor (Department of Computer Science)
Dr. Bruce Kapron, Departmental Member (Department of Computer Science)

ABSTRACT

GitHub is a popular social coding site where developers not only host their code and use git functions, but also use social features to communicate, collaborate, and stay aware of changes and others’ activities. This new paradigm of coding together, along with the availability of data, has given rise to much research studying collaboration from different angles. However, the vast accumulated knowledge about GitHub tends to be scattered and fragmented. The goal of this study is to collect the available research on GitHub that focuses on identifying the impact of GitHub on software development. The design of the study includes two parts. First, a systematic search of 7 electronic digital libraries was conducted using a defined search protocol, which included a keyword string and inclusion/exclusion criteria. Second, data were extracted from each publication and manually coded to define categories of knowledge based on research questions and findings.
The study results show a growing trend in research, with an increase in mixed methodology. The preferred data sources for empirical studies about GitHub are the GitHub API and GHTorrent, used in 72.57% of publications. The study reveals that a group of 30 researchers published 45.86% of the total research. Research in North America represents 26% of publications. The research on GitHub is focused on the evaluation of pull requests and use of issues (30.77%), popular projects’ characteristics (20.88%), collaboration and transparency (15.38%), developers’ roles (9.89%), influence of popular developers (8.79%), quick-start package with guidelines and datasets (8.79%), tools to improve contributions and collaboration (4.40%) and other (1.1%).

Contents

Supervisory Committee
Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Background and Related Work
    2.1 Version Control Systems
    2.2 SourceForge
    2.3 GitHub
        2.3.1 Pull-based Model and Code Review
        2.3.2 Social Features
    2.4 Systematic Reviews
        2.4.1 A Note on Related Work
    2.5 Summary
3 Methodology
    3.1 Systematic Reviews
        3.1.1 Systematic Literature Review
        3.1.2 Systematic Mapping Review
        3.1.3 Systematic Scoping Review
        3.1.4 Snowball Method
    3.2 Research Questions
    3.3 Search Protocol
        3.3.1 Keywords
        3.3.2 Databases
    3.4 Study Selection
        3.4.1 Inclusion Criteria
        3.4.2 Exclusion Criteria
        3.4.3 Grey literature publications
        3.4.4 Data Extraction
        3.4.5 Coding Themes
    3.5 Summary
4 Results
    4.1 Statistics of research conducted on GitHub
        4.1.1 Electronic libraries search results
        4.1.2 Statistics information
    4.2 Topics covered by publications on GitHub research
        4.2.1 Classification Description
    4.3 Summary
5 Discussion and Limitations
    5.1 Discussion
    5.2 Limitations
6 Conclusions
A Annotated Bibliography
Bibliography

List of Tables

Table 3.1 Digital libraries
Table 3.2 Excluded papers
Table 4.1 Number of selected papers, by electronic library
Table 4.2 Number of selected papers, by year
Table 4.3 Authors with more published research order by number of publications
Table 4.4 Number of authors by n number of papers
Table 4.5 Top-10 most cited papers in GitHub order by number of citations and year
Table 4.6 Top-7 most active countries order by number of publications
Table 4.7 Popular conferences for publication of GitHub research
Table 4.8 GitHub classifications

List of Figures

Figure 4.1 Research statistics by year
Figure 4.2 Data sources to mine GitHub
Figure 4.3 Dataset availability for replication
Figure 4.4 Citation statistics
Figure 5.1 GitHub Privacy Statement

ACKNOWLEDGEMENTS

I would like to thank:

Dr. Daniel German for giving me this invaluable opportunity to study abroad, and for supporting and guiding me through my studies. I am deeply honored to be under your supervision.

Eirini for your friendship, understanding and support, for those constructive conversations that helped me to focus when I used to lose sight of the objective, and for your priceless corrections and guidance throughout this research.

Wendy for always being ready to help and for your attention to my little baby.

Aditi Gupta for your invaluable assistance in defining the methodology and providing corrections to this report.

I would like to thank my family:

my wife, Cristina for being my eternal companion and walking with me through this long journey we decided to take to give a better place to our children, even when she was not here. Thanks for your love, understanding and care. This achievement belongs not only to me, but also to you.

my mother, Angeles for everything, for raising me on your own and teaching me honesty, humility and hard work to achieve whatever I have in mind.

my father, Arturo for your love and financial support and, regardless of the distance, for teaching me how to persevere no matter how hard the journey ahead is.

my sister, Greetcher for your love and care during hardship in the last year. Thank you for being here with us, aunty.

Finally I would like to thank:

Dr. Florence Leclair and Dr. Daniel Warder for providing me the best medical treatment and helping me to recover my vision, which allowed me to finish my studies and have a better life. I am deeply indebted to both of you.

DEDICATION

To my little princess, Maria Cristina and my wife, Cristina.

Chapter 1

Introduction

With over 53 million repositories (https://github.com/features), GitHub (https://github.com/) is currently the most popular social coding site. Both open source software communities and commercial companies have been increasingly using GitHub, with either public or private repositories, to host their code and manage their development projects. GitHub builds on the features of the git version control system, and offers a friendly web user interface with embedded workflows and social features which leverage collaboration in software development.

GitHub originally became popular with well-known open source projects (e.g. http://rubyonrails.org/), which identified GitHub as the means to increase contribution and collaboration [80]. However, project communities migrating from other platforms such as SourceForge need to adapt to a new form of collaboration in software development through a new workflow and social features.

GitHub provides social features in a style that is similar to other social media sites [115]. Each user has a public profile that includes personal information as well as project and activity information [10]. Users can also subscribe to event feeds by watching projects or following popular users [107]; this provides awareness of development activities (e.g. pull requests, issues, comments) [31]. In this social coding environment, developers are able to create social networks [10] and make social and technical inferences [32] that can affect the way developers collaborate.

Due to GitHub’s popularity, its social features, and the availability of data through either the GitHub API (https://developer.github.com/v3/) or GHTorrent [40], researchers have shown interest in mining infor-