App Store Analysis for Software Engineering PDF

237 Pages·2017·2.12 MB·English
by William Martin

Preview App Store Analysis for Software Engineering

UNIVERSITY COLLEGE LONDON
COMPUTER SCIENCE

A THESIS PRESENTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE

App Store Analysis for Software Engineering

JANUARY 25, 2017

Author: WILLIAM MARTIN
Supervisors: MARK HARMAN, YUE JIA, FEDERICA SARRO
Examiners: RACHEL HARRISON, WALID MAALEJ

I, William Martin, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Abstract

App Store Analysis concerns the mining of data from apps, made possible through app stores. This thesis extracts publicly available data from app stores, in order to detect and analyse relationships between technical attributes, such as software features, and non-technical attributes, such as rating and popularity information.

The thesis identifies the App Sampling Problem, its effects and a methodology to ameliorate the problem. The App Sampling Problem is a fundamental sampling issue concerned with mining app stores, caused by the rather limited ‘most-popular-only’ ranked app discovery present in mobile app stores.

This thesis provides novel techniques for the analysis of technical and non-technical data from app stores. Topic modelling is used as a feature extraction technique, which is shown to produce the same results as n-gram feature extraction, that also enables linking technical features from app descriptions with those in user reviews. Causal impact analysis is applied to app store performance data, leading to the identification of properties of statistically significant releases, and developer-controlled properties which could increase a release’s chance for causal significance.
This thesis introduces the Causal Impact Release Analysis tool, CIRA, for performing causal impact analysis on app store data, which makes the aforementioned research possible; combined with the earlier feature extraction technique, this enables the identification of the claimed software features that may have led to significant positive and negative changes after a release.

Impact Statement

The work in this thesis seeks primarily to answer the question of “what makes successful apps successful?” through performing empirical analysis of app store data. The contributions work toward this goal, first establishing a feature extraction technique using topic modelling, and identifying the sampling issues present in mining app store data (publication at MSR 2015). This work looks at time series data, identifying the app releases that had the greatest subsequent effect on app success, and analysing common factors (publications at ICSE 2016 and FSE 2016). One of the outcomes of this work is the tool, CIRA, an implementation of causal impact analysis, a time series analysis technique. The secondary goal of this thesis is to help define the growing field of “app store analysis for software engineering”, which it does through a comprehensive review of literature in the field (accepted for publication in the Transactions of Software Engineering journal).

Publications

The following papers were authored as part of this PhD.

First-Author Peer-Reviewed Papers

– W. Martin, M. Harman, Y. Jia, F. Sarro, and Y. Zhang, “The app sampling problem for app store mining,” in Proceedings of the 12th IEEE Working Conference on Mining Software Repositories, MSR ’15, 2015, pp. 123–133.
  Citations at the time of writing: 35
– W. Martin, “Causal impact for app store analysis,” in Companion Proceedings of the 38th International Conference on Software Engineering, ICSE Companion ’16. ACM, 2016.
  Citations at the time of writing: 3
– W. Martin, F. Sarro, and M. Harman, “Causal impact analysis for app releases in Google Play,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, FSE ’16, 2016, to appear.
  Citations at the time of writing: 5
– W. Martin, F. Sarro, Y. Jia, Y. Zhang, and M. Harman, “A survey of app store analysis for software engineering,” IEEE Transactions on Software Engineering, 2016.
  Citations at the time of writing: 20

Technical Reports

– W. Martin, F. Sarro, and M. Harman, “Causal impact analysis applied to app releases in Google Play and Windows Phone Store,” University College London, Tech. Rep., 2015, rN/15/07.
  Citations at the time of writing: 6
– W. Martin, F. Sarro, Y. Jia, Y. Zhang, and M. Harman, “A survey of app store analysis for software engineering,” University College London, Tech. Rep., 2016, rN/16/02.
  Citations at the time of writing: 20

Co-Authored Peer-Reviewed Papers

In addition to the first-author papers listed above, I have also contributed to the following publications:

– F. Sarro, A. Al-Subaihin, M. Harman, Y. Jia, W. Martin, and Y. Zhang, “Feature lifecycles as they spread, migrate, remain and die in app stores,” in Proceedings of the Requirements Engineering Conference, 23rd IEEE International (RE’15). IEEE, 2015.
  Citations at the time of writing: 15
– A. Al-Subaihin, A. Finkelstein, M. Harman, Y. Jia, W. Martin, F. Sarro, and Y. Zhang, “App store mining and analysis,” in Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile 2015. ACM, 2015, pp. 1–2.
  Citations at the time of writing: 3
– M. Harman, A. Al-Subaihin, Y. Jia, W. Martin, F. Sarro, and Y. Zhang, “Mobile app and app store analysis, testing and optimisation,” in Proceedings of the International Conference on Mobile Software Engineering and Systems, MOBILESoft ’16. ACM, 2016, pp. 243–244.
  Citations at the time of writing: 1

Technical Reports

– A. Finkelstein, M. Harman, Y. Jia, W. Martin, F. Sarro, and Y. Zhang, “App store analysis: Mining app stores for relationships between customer, business and technical characteristics,” Tech. Rep., 2014, rN/14/10.
  Citations at the time of writing: 11

Acknowledgements

I’d like to send a big thanks to my supervisors Mark, Shin, Yue and Federica, for their guidance and support. A big thank you to my dedicated proof reading team, who are now experts in App Store Analysis, Lynn, Steve and Aja. Your attention to detail has never wavered. And to Anne, your drive, motivation and passion for science inspire me. To support this PhD, I’m very grateful to have been in receipt of funding from the DAASE project grant. To everyone mentioned, without each of you none of this would have been possible. My heartfelt thanks.

Contents

Abstract
Impact Statement
Publications
Acknowledgements

1 Introduction

2 Literature Review
  2.1 Introduction
    2.1.1 Contributions
    2.1.2 Definitions
    2.1.3 Overview
  2.2 Literature Search
    2.2.1 Scope
    2.2.2 Search Methodology
    2.2.3 Snowballing
    2.2.4 Search Results
    2.2.5 Lessons Learned
  2.3 Non-Technical Research
  2.4 Scale of Studies
  2.5 Key Ideas Timeline
  2.6 API Analysis
    2.6.1 API Usage
    2.6.2 Class Reuse and Inheritance
    2.6.3 Faults
    2.6.4 Permissions and Security
  2.7 Feature Analysis
    2.7.1 Automated Classification
    2.7.2 Clustering
    2.7.3 Lifecycles
    2.7.4 Recommendation
    2.7.5 Store Success
    2.7.6 Verification
  2.8 Release Engineering
    2.8.1 Content
    2.8.2 Success
    2.8.3 Strategy
  2.9 Review Analysis
    2.9.1 Classification
    2.9.2 Content
    2.9.3 Requirements Engineering
    2.9.4 Sentiment
    2.9.5 Summarisation
    2.9.6 Surveys and Methodological Aspects of App Store Analysis
  2.10 Security
    2.10.1 Faults
    2.10.2 Malware
    2.10.3 Permissions
    2.10.4 Plagiarism
    2.10.5 Privacy
    2.10.6 Vulnerability
  2.11 Store Ecosystem
    2.11.1 Inter-store
    2.11.2 Intra-store
    2.11.3 Recommendation
    2.11.4 Simulations
  2.12 Size and Effort Prediction
  2.13 Checklist for Future App Store Analysis Authors
  2.14 Threats to Validity
  2.15 Conclusions

3 Methodology
  3.1 Statistical Analysis
  3.2 Data
  3.3 Mining Process
    3.3.1 User Reviews
    3.3.2 Persistent List Collection
    3.3.3 Time Series Datasets
  3.4 Metrics
  3.5 Text Preprocessing
  3.6 Topic Modelling
  3.7 TF.IDF

4 App Feature Extraction using Topic Models
  4.1 Introduction
  4.2 Findings
  4.3 Definitions
  4.4 Data
  4.5 Research Questions
  4.6 Application of LDA
  4.7 Results
  4.8 The Problem of Zero Rated Apps
  4.9 Threats to Validity
  4.10 Conclusions

5 The App Sampling Problem for App Store Mining
  5.1 Introduction
  5.2 Contributions
  5.3 Definitions
  5.4 The App Sampling Problem
  5.5 Data
  5.6 Research Questions
  5.7 Topic Modelling
  5.8 Results

Description:
[...] estimate game apps, using an intermediate representation of required assets and functionality in the Unity3D game engine. In 2015, D'Avanzo et al. [68] applied the COSMIC approach to 8 Google Play apps, and applied linear regression to the functional point size in order to estimate the code size.
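
The regression step mentioned in the excerpt above can be illustrated with a minimal sketch: an ordinary least-squares line fitted from functional size to code size. This is not the procedure or data of D'Avanzo et al.; the function names `fit_line` and `predict`, the units, and every number below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Estimate code size for a given functional size."""
    return intercept + slope * x


# Hypothetical observations (NOT from the cited study):
# functional size per app, and a matching code size in arbitrary units.
functional_size = [10, 20, 30, 40]
code_size = [25, 45, 65, 85]

slope, intercept = fit_line(functional_size, code_size)
estimate = predict(slope, intercept, 25)
print(f"slope={slope:.2f} intercept={intercept:.2f} estimate={estimate:.2f}")
```

With only a handful of apps, as in the 8-app study described, a fitted line of this kind is best read as a rough indication of the relationship rather than a reliable predictor.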