Abstract
User reviews of mobile apps contain a wealth of information about the issues that users experience. For example, a review might contain a feature request, a bug report, and/or a privacy complaint. Developers, users and app store owners (e.g. Apple, BlackBerry, Google, Microsoft) can all benefit from a better understanding of these issues: developers can better understand users’ concerns, app store owners can spot anomalous apps, and users can compare similar apps to decide which ones to download or purchase. However, user reviews are not labelled, i.e. we do not know which types of issues a review raises. Hence, one must sift through potentially thousands of reviews, many written with slang and abbreviations, to understand the various types of issues. Moreover, the unstructured and informal nature of reviews complicates their automated labelling. In this paper, we study the multi-labelled nature of reviews from 20 mobile apps in the Google Play Store and Apple App Store. We find that up to 30% of the reviews raise multiple types of issues in a single review (e.g. a review might contain both a feature request and a bug report). We then propose an approach that automatically assigns multiple labels to reviews based on the issues raised, with a precision of 66% and a recall of 65%. Finally, we apply our approach to three proof-of-concept analytics use cases: (i) we compare competing apps to assist developers and users, (ii) we provide an overview of 601,221 reviews from 12,000 apps in the Google Play Store to assist app store owners and developers, and (iii) we detect anomalous apps in the Google Play Store to assist app store owners and users.
Additional information
Communicated by: Sunghun Kim
Cite this article
McIlroy, S., Ali, N., Khalid, H. et al. Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews. Empir Software Eng 21, 1067–1106 (2016). https://doi.org/10.1007/s10664-015-9375-7
DOI: https://doi.org/10.1007/s10664-015-9375-7