Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews

Empirical Software Engineering

Abstract

Mobile app reviews by users contain a wealth of information on the issues that users are experiencing. For example, a review might contain a feature request, a bug report, and/or a privacy complaint. Developers, users and app store owners (e.g. Apple, BlackBerry, Google, Microsoft) can benefit from a better understanding of these issues – developers can better understand users’ concerns, app store owners can spot anomalous apps, and users can compare similar apps to decide which ones to download or purchase. However, user reviews are not labelled, i.e. we do not know which types of issues are raised in a review. Hence, one must sift through potentially thousands of reviews with slang and abbreviations to understand the various types of issues. Moreover, the unstructured and informal nature of reviews complicates the automated labelling of such reviews. In this paper, we study the multi-labelled nature of reviews from 20 mobile apps in the Google Play Store and Apple App Store. We find that up to 30 % of the reviews raise more than one type of issue (e.g. a single review might contain a feature request and a bug report). We then propose an approach that can automatically assign multiple labels to reviews based on the raised issues with a precision of 66 % and recall of 65 %. Finally, we apply our approach to three proof-of-concept analytics use case scenarios: (i) we compare competing apps to assist developers and users, (ii) we provide an overview of 601,221 reviews from 12,000 apps in the Google Play Store to assist app store owners and developers, and (iii) we detect anomalous apps in the Google Play Store to assist app store owners and users.
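
To make the reported figures concrete, the following minimal sketch shows one common way to score a multi-label classifier: micro-averaged precision and recall over (review, label) pairs, plus the share of reviews that carry more than one label. The abstract does not state which averaging scheme the authors used, so the micro-averaging here is an assumption, and the function names and the three toy reviews are hypothetical illustrations rather than data or code from the paper.

```python
from typing import List, Set, Tuple


def micro_precision_recall(
    true_labels: List[Set[str]], predicted_labels: List[Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over all (review, label) pairs.

    Assumption: micro-averaging is used here for illustration only; the
    abstract does not say how its 66 % / 65 % figures were averaged.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Both lists must describe the same reviews.")

    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        tp += len(truth & predicted)   # labels correctly assigned
        fp += len(predicted - truth)   # labels assigned but not present
        fn += len(truth - predicted)   # labels present but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def multi_label_share(labels: List[Set[str]]) -> float:
    """Fraction of reviews that raise more than one type of issue."""
    if not labels:
        return 0.0
    return sum(1 for review in labels if len(review) > 1) / len(labels)


# Hypothetical toy data: three manually labelled reviews and the labels
# a classifier predicted for them.
truth = [
    {"feature request", "bug report"},
    {"privacy complaint"},
    {"bug report"},
]
predicted = [
    {"feature request"},
    {"privacy complaint", "bug report"},
    {"bug report"},
]

precision, recall = micro_precision_recall(truth, predicted)
share = multi_label_share(truth)
print(f"precision={precision:.2f} recall={recall:.2f} multi-label share={share:.2f}")
```

On this toy data, three label assignments are correct, one is spurious and one is missed, so both precision and recall come out at 0.75, and one review in three is multi-labelled.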

Notes

  1. http://nlp.stanford.edu/software/tmt/tmt-0.4/

Author information

Corresponding author

Correspondence to Stuart McIlroy.

Additional information

Communicated by: Sunghun Kim

About this article

Cite this article

McIlroy, S., Ali, N., Khalid, H. et al. Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews. Empir Software Eng 21, 1067–1106 (2016). https://doi.org/10.1007/s10664-015-9375-7

  • DOI: https://doi.org/10.1007/s10664-015-9375-7
