Abstract
App stores like Google Play and the Apple App Store host over 3 million apps covering nearly every kind of software and service. Billions of users regularly download, use, and review these apps. Recent studies have shown that reviews written by the users represent a rich source of information for app vendors and developers, as they include information about bugs, ideas for new features, or documentation of released features. The majority of the reviews, however, are rather non-informative, simply praising the app and repeating the star ratings in words. This paper introduces several probabilistic techniques to classify app reviews into four types: bug reports, feature requests, user experiences, and text ratings. For this, we use review metadata such as the star rating and the tense, as well as text classification, natural language processing, and sentiment analysis techniques. We conducted a series of experiments to compare the accuracy of these techniques against a simple string-matching baseline. We found that metadata alone results in poor classification accuracy. When combined with simple text classification and natural language preprocessing of the text—particularly with bigrams and lemmatization—the classification precision for all review types reached 88–92% and the recall 90–99%. Multiple binary classifiers outperformed single multiclass classifiers. Our results inspired the design of a review analytics tool, which should help app vendors and developers deal with the large number of reviews, filter critical reviews, and assign them to the appropriate stakeholders. We describe the tool's main features and summarize nine interviews with practitioners on how review analytics tools, including ours, could be used in practice.
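To illustrate the kind of pipeline the abstract describes, the following is a minimal sketch of one binary review classifier (e.g., "bug report" vs. "not a bug report") that combines lemmatization with unigram and bigram text features. It assumes scikit-learn and NLTK; the concrete classifiers, features, and metadata handling in the paper may differ, and the labeled sample below is purely hypothetical.

```python
# Minimal sketch: binary app-review classifier with lemmatization and bigrams.
# Assumes NLTK data packages 'punkt' and 'wordnet' have been downloaded.
# Metadata features (star rating, tense, sentiment) are omitted for brevity.
import nltk
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

lemmatizer = WordNetLemmatizer()

def lemmatize(text):
    # Lowercase, tokenize, and lemmatize each token before vectorization.
    return " ".join(lemmatizer.lemmatize(tok) for tok in nltk.word_tokenize(text.lower()))

# Hypothetical labeled sample: review text and whether it reports a bug.
reviews = [
    "The app crashes every time I open the camera",
    "Great app, five stars, love it",
    "Please add a dark mode option",
    "It freezes after the last update",
]
labels = [1, 0, 0, 1]  # 1 = bug report, 0 = other

# Bag-of-words over lemmatized text (unigrams + bigrams), fed to Naive Bayes.
model = make_pipeline(
    CountVectorizer(preprocessor=lemmatize, ngram_range=(1, 2)),
    MultinomialNB(),
)
model.fit(reviews, labels)

print(model.predict(["the app keeps crashing on startup"]))  # expected: [1]
```

One such binary classifier per review type, rather than a single multiclass classifier, mirrors the setup the abstract reports as performing best.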
Acknowledgments
We thank D. Pagano for his support with the data collection, M. Häring for contributing to the development of the coding tool, and the RE'15 reviewers, M. Nagappan, and T. Johann for their comments on the paper. We are also very grateful to the participants in the evaluation interviews. This work was partly supported by Microsoft Research (SEIF Award 2014).
Cite this article
Maalej, W., Kurtanović, Z., Nabil, H. et al. On the automatic classification of app reviews. Requirements Eng 21, 311–331 (2016). https://doi.org/10.1007/s00766-016-0251-9