DOI: 10.1145/3152494.3152500
Research article

Fault in your stars: an analysis of Android app reviews

Published: 11 January 2018

Abstract

Mobile app distribution platforms such as the Google Play Store allow users to share feedback about downloaded apps in the form of a review comment and a corresponding star rating. Typically, the star rating ranges from one to five stars, with one star denoting strong dissatisfaction with the app and five stars denoting strong satisfaction.
Unfortunately, for a variety of reasons, the star rating a user provides is often inconsistent with the opinion expressed in the review. For example, consider the following review of the Facebook app on Android: "Awesome App". One would reasonably expect this review to carry a five-star rating, but the actual rating is one star!
Such inconsistent ratings can deflate (or inflate) an app's overall average rating, which can affect downloads, because users typically look at the average star rating when deciding whether to download an app. App developers also receive biased feedback that does not reflect how users actually perceive the application. This is especially significant for small apps with a few thousand downloads, where even a small number of mismatched reviews can bring down the average rating drastically.
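The effect on the average can be illustrated with a small arithmetic sketch. The ratings below are hypothetical and are not taken from the study's dataset; the function names are chosen only for illustration.

```python
def average_rating(ratings):
    """Return the mean of a list of integer star ratings (1 to 5)."""
    return sum(ratings) / len(ratings)


def corrected_average(ratings, implied):
    """Return the average after replacing mismatched ratings.

    `implied` maps a review index to the rating that the review text
    suggests (an assumption supplied by a human or a classifier).
    """
    fixed = list(ratings)
    for index, rating in implied.items():
        fixed[index] = rating
    return average_rating(fixed)


# Hypothetical app with 10 reviews. Reviews 0 and 1 read like
# "Awesome App" but were submitted with one star.
ratings = [1, 1, 5, 5, 4, 5, 4, 5, 5, 5]
implied = {0: 5, 1: 5}

observed = average_rating(ratings)               # average users see
corrected = corrected_average(ratings, implied)  # average after fixing
print(f"observed={observed:.1f} corrected={corrected:.1f}")
```

With only two of ten reviews mismatched, the displayed average in this toy example is noticeably lower than the average the review texts would support.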
In this paper, we study this review-rating mismatch problem. We manually examined 8600 reviews from 10 popular Android apps and found that 20% of the ratings in our dataset were inconsistent with the review. We then developed three systems, two based on traditional machine learning and one on deep learning, to automatically identify reviews whose rating did not match the opinion expressed in the review. Our deep learning system performed best, with an accuracy of 92% in identifying the correct star rating for a given review.
In another evaluation, we asked 23 end users to write reviews for any 5 apps that they had used recently. This yielded 115 reviews covering 66 different mobile apps, on which our deep learning system had an accuracy of 87%.
Further, our study suggests that this problem is quite prevalent among apps. Across the ten apps used in our study, the mismatch percentage ranged from 16% to 26%.
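
A per-app mismatch percentage of the kind reported above could be computed along the following lines. This is a minimal sketch, not the authors' implementation: it assumes each review already has a rating inferred from its text (by annotators or a classifier), and it treats any difference between the two ratings as a mismatch, which may not be the criterion used in the paper. The app names and data are hypothetical.

```python
from collections import defaultdict


def mismatch_percentage(reviews):
    """Return {app: percentage of reviews whose ratings disagree}.

    `reviews` is an iterable of (app, user_rating, text_rating) tuples,
    where text_rating is the rating implied by the review text.
    """
    totals = defaultdict(int)
    mismatched = defaultdict(int)
    for app, user_rating, text_rating in reviews:
        totals[app] += 1
        if user_rating != text_rating:  # assumed mismatch criterion
            mismatched[app] += 1
    return {app: 100.0 * mismatched[app] / totals[app] for app in totals}


# Hypothetical sample: (app, rating given by user, rating implied by text)
sample = [
    ("app_a", 1, 5),
    ("app_a", 5, 5),
    ("app_a", 4, 4),
    ("app_a", 2, 2),
    ("app_b", 5, 5),
    ("app_b", 3, 1),
]

percentages = mismatch_percentage(sample)
for app, pct in sorted(percentages.items()):
    print(f"{app}: {pct:.1f}% mismatched")
```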

Published In

CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2018
379 pages
ISBN:9781450363419
DOI:10.1145/3152494

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Android
  2. convolutional neural networks
  3. deep learning
  4. machine learning
  5. mobile apps

Qualifiers

  • Research-article

Conference

CoDS-COMAD '18

Acceptance Rates

CODS-COMAD '18 Paper Acceptance Rate: 50 of 150 submissions, 33%
Overall Acceptance Rate 197 of 680 submissions, 29%

Cited By

  • (2024) A Text Mining Study of Online Reviews to Understand Intercity Bus Service Quality. Transport Policy. DOI: 10.1016/j.tranpol.2024.12.002. Online publication date: Dec-2024
  • (2023) Quality Assurance of A GPT-Based Sentiment Analysis System: Adversarial Review Data Generation and Detection. 2023 30th Asia-Pacific Software Engineering Conference (APSEC), 450-457. DOI: 10.1109/APSEC60848.2023.00056. Online publication date: 4-Dec-2023
  • (2023) Performance evaluation of machine learning models on large dataset of android applications reviews. Multimedia Tools and Applications 82:24, 37197-37219. DOI: 10.1007/s11042-023-14713-6. Online publication date: 18-Mar-2023
  • (2023) Evaluating pre-trained models for user feedback analysis in software engineering: a study on classification of app-reviews. Empirical Software Engineering 28:4. DOI: 10.1007/s10664-023-10314-x. Online publication date: 23-May-2023
  • (2022) Quality assurance study with mismatched data in sentiment analysis. 2022 29th Asia-Pacific Software Engineering Conference (APSEC), 442-446. DOI: 10.1109/APSEC57359.2022.00059. Online publication date: Dec-2022
  • (2022) Opinion mining for app reviews: an analysis of textual representation and predictive models. Automated Software Engineering 29:1. DOI: 10.1007/s10515-021-00301-1. Online publication date: 1-May-2022
  • (2022) Mkulima Platform: An Inclusive Business Platform Ecosystem that Integrates African Small-Scale Farmers into Agricultural Value Chain. e-Infrastructure and e-Services for Developing Countries, 397-419. DOI: 10.1007/978-3-031-06374-9_26. Online publication date: 26-May-2022
  • (2021) VOYAGER – Smart Travel Guidance Cross Platform Mobile Application. 2021 3rd International Conference on Advancements in Computing (ICAC), 163-168. DOI: 10.1109/ICAC54203.2021.9671136. Online publication date: 9-Dec-2021
  • (2020) A Framework to Analyze Comments for Educational Apps on Google Play Store. HCI International 2020 - Posters, 264-268. DOI: 10.1007/978-3-030-50729-9_37. Online publication date: 10-Jul-2020
  • (2020) Updating the goal model with user reviews for the evolution of an app. Journal of Software: Evolution and Process 32:8. DOI: 10.1002/smr.2257. Online publication date: 3-Aug-2020
