Abstract
App stores like Google Play and the Apple App Store host over 3 million apps covering nearly every kind of software and service. Billions of users regularly download, use, and review these apps. Recent studies have shown that reviews written by the users represent a rich source of information for app vendors and developers, as they include information about bugs, ideas for new features, or documentation of released features. The majority of the reviews, however, are rather non-informative, simply praising the app and repeating the star ratings in words. This paper introduces several probabilistic techniques to classify app reviews into four types: bug reports, feature requests, user experiences, and text ratings. For this, we use review metadata such as the star rating and the tense, as well as text classification, natural language processing, and sentiment analysis techniques. We conducted a series of experiments to compare the accuracy of these techniques against a simple string-matching baseline. We found that metadata alone results in poor classification accuracy. When combined with simple text classification and natural language preprocessing of the text—particularly with bigrams and lemmatization—the classification precision for all review types reached 88–92% and the recall 90–99%. Multiple binary classifiers outperformed single multiclass classifiers. Our results inspired the design of a review analytics tool, which should help app vendors and developers deal with the large number of reviews, filter critical reviews, and assign them to the appropriate stakeholders. We describe the tool's main features and summarize nine interviews with practitioners on how review analytics tools, including ours, could be used in practice.
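To illustrate the kind of pipeline the abstract describes, the following is a minimal sketch of one binary review classifier (e.g., "bug report" vs. "not a bug report") that combines lemmatization with unigram and bigram text features. It assumes scikit-learn and NLTK; the concrete classifiers, features, and metadata handling in the paper may differ, and the labeled sample below is purely hypothetical.

```python
# Minimal sketch: binary app-review classifier with lemmatization and bigrams.
# Assumes NLTK data packages 'punkt' and 'wordnet' have been downloaded.
# Metadata features (star rating, tense, sentiment) are omitted for brevity.
import nltk
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

lemmatizer = WordNetLemmatizer()

def lemmatize(text):
    # Lowercase, tokenize, and lemmatize each token before vectorization.
    return " ".join(lemmatizer.lemmatize(tok) for tok in nltk.word_tokenize(text.lower()))

# Hypothetical labeled sample: review text and whether it reports a bug.
reviews = [
    "The app crashes every time I open the camera",
    "Great app, five stars, love it",
    "Please add a dark mode option",
    "It freezes after the last update",
]
labels = [1, 0, 0, 1]  # 1 = bug report, 0 = other

# Bag-of-words over lemmatized text (unigrams + bigrams), fed to Naive Bayes.
model = make_pipeline(
    CountVectorizer(preprocessor=lemmatize, ngram_range=(1, 2)),
    MultinomialNB(),
)
model.fit(reviews, labels)

print(model.predict(["the app keeps crashing on startup"]))  # expected: [1]
```

One such binary classifier per review type, rather than a single multiclass classifier, mirrors the setup the abstract reports as performing best.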
Acknowledgments
We thank D. Pagano for his support with the data collection, M. Häring for contributing to the development of the coding tool, and the RE'15 reviewers, M. Nagappan, and T. Johann for their comments on the paper. We are also very grateful to the participants in the evaluation interviews. This work was partly supported by Microsoft Research (SEIF Award 2014).
Cite this article
Maalej, W., Kurtanović, Z., Nabil, H. et al. On the automatic classification of app reviews. Requirements Eng 21, 311–331 (2016). https://doi.org/10.1007/s00766-016-0251-9