DOI: 10.1145/3507473.3507480
research-article

Compare Machine Learning Models in Text Classification Using Steam User Reviews

Published: 13 February 2022

Abstract

Text classification and sentiment analysis of game reviews are regarded as important not only in academic fields but also in game studies. In this paper, we take more than 400 thousand game reviews from the Steam platform, preprocess the data using different libraries (sklearn, nltk, and spaCy), and use the results as inputs to build three sentiment classification models based on different algorithms (Naive Bayes, SVM, and Random Forest). In contrast to previous studies, which focus only on comparing sentiment analysis models, our paper also highlights the use of different APIs to preprocess the data and the corresponding model performance. The results show that, whichever API we choose, the Random Forest models always perform best, whereas Naive Bayes is the fastest to train. Researchers can build on this work by applying grid search to automatically find the optimal preprocessing API before conducting sentiment analysis.
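
The idea of searching over preprocessing choices can be illustrated with a minimal, standard-library-only sketch. It is not the paper's pipeline: the two tokenizers (whitespace_tokens, regex_tokens) are hypothetical stand-ins for the sklearn, nltk, and spaCy preprocessing steps, the NaiveBayes class is a small from-scratch multinomial Naive Bayes rather than a library model, and the handful of reviews is invented for illustration. The sketch records accuracy and training time for each preprocessing option, which are the two quantities the abstract compares.

```python
import math
import re
import time
from collections import Counter, defaultdict

# Toy labelled reviews (invented for illustration; not the paper's data).
# Label 1 = positive, 0 = negative.
TRAIN = [
    ("Great game, loved the story", 1),
    ("Amazing graphics and fun gameplay", 1),
    ("Loved it, great fun", 1),
    ("Terrible port, crashes constantly", 0),
    ("Boring and buggy, waste of money", 0),
    ("Awful controls, terrible game", 0),
]
TEST = [("great fun story", 1), ("buggy and terrible", 0)]

STOPWORDS = {"the", "and", "of", "it", "a"}


def whitespace_tokens(text):
    """Baseline preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def regex_tokens(text):
    """Alternative preprocessing: alphabetic tokens only, stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(preprocess):
    """Train on TRAIN; return (accuracy on TEST, training time in seconds)."""
    docs = [preprocess(text) for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    start = time.perf_counter()
    model = NaiveBayes().fit(docs, labels)
    train_seconds = time.perf_counter() - start
    correct = sum(model.predict(preprocess(text)) == label for text, label in TEST)
    return correct / len(TEST), train_seconds


def grid_search(preprocessors):
    """Evaluate every preprocessing option; return all results and the most accurate name."""
    results = {name: evaluate(fn) for name, fn in preprocessors.items()}
    best = max(results, key=lambda name: results[name][0])
    return results, best


results, best = grid_search({"whitespace": whitespace_tokens, "regex": regex_tokens})
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, train_time={seconds:.6f}s")
print("selected preprocessing:", best)
```

With real data, the same loop could be extended to a second grid dimension for the model (for example Naive Bayes, SVM, and Random Forest), so that each preprocessing and model pair is scored on held-out reviews. On a toy set this small the options may tie, in which case the sketch simply returns the first one listed.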


Cited By

  • (2024) An empirically based object-oriented testing using Machine learning. EAI Endorsed Transactions on Internet of Things, 10. DOI: 10.4108/eetiot.5344. Online publication date: 8-Mar-2024
  • (2024) Enhancing churn forecasting with sentiment analysis of steam reviews. Social Network Analysis and Mining, 14:1. DOI: 10.1007/s13278-024-01337-3. Online publication date: 31-Aug-2024

Published In

ACM Other conferences
ICSED '21: Proceedings of the 2021 3rd International Conference on Software Engineering and Development
November 2021
75 pages
ISBN: 9781450385213
DOI: 10.1145/3507473
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. nltk
  2. sentiment analysis
  3. sentiment classification models
  4. sklearn
  5. spaCy
  6. text mining

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICSED 2021
