DOI: 10.1145/3308558.3313647
research-article

Think Outside the Dataset: Finding Fraudulent Reviews using Cross-Dataset Analysis

Published: 13 May 2019

Abstract

While online review services provide a two-way conversation between brands and consumers, malicious actors, including misbehaving businesses, have an equal opportunity to distort the reviews for their own gain. We propose OneReview, a method for locating fraudulent reviews by correlating data from multiple crowd-sourced review sites. Our approach uses Change Point Analysis to locate points at which a business's reputation shifts. Inconsistent trends in reviews of the same business across multiple websites are used to identify suspicious reviews. We then extract an extensive set of textual and contextual features from these suspicious reviews and employ supervised machine learning to detect fraudulent reviews.
We evaluated OneReview on about 805K and 462K reviews from Yelp and TripAdvisor, respectively, to identify fraud on Yelp. Supervised machine learning yields excellent results, with 97% accuracy. We applied the resulting model to the suspicious reviews and detected about 62K fraudulent reviews (about 8% of all the Yelp reviews). We further analyzed the detected fraudulent reviews and their authors, and located several spam campaigns in the wild, including campaigns against specific businesses, as well as campaigns consisting of several hundred socially networked untrustworthy accounts.
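
The cross-dataset idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it swaps in a simple single-split mean-shift detector for the change point step, and the function names, thresholds (`min_seg`, `min_shift`, `tolerance`), and monthly rating series are all hypothetical values chosen for illustration. It flags a business when one site shows a rating shift that the other site does not mirror.

```python
from statistics import mean


def best_mean_shift(series, min_seg=3):
    """Return (index, gain) of the single split that most reduces squared error."""
    n = len(series)
    overall = mean(series)
    total = sum((x - overall) ** 2 for x in series)
    best_idx, best_gain = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        left, right = series[:k], series[k:]
        m_left, m_right = mean(left), mean(right)
        cost = sum((x - m_left) ** 2 for x in left)
        cost += sum((x - m_right) ** 2 for x in right)
        gain = total - cost
        if gain > best_gain:
            best_idx, best_gain = k, gain
    return best_idx, best_gain


def change_point(series, min_seg=3, min_shift=0.5):
    """Return (index, shift) if the mean rating moves by at least min_shift, else None."""
    idx, _ = best_mean_shift(series, min_seg)
    if idx is None:
        return None
    shift = mean(series[idx:]) - mean(series[:idx])
    return (idx, shift) if abs(shift) >= min_shift else None


def is_suspicious(site_a, site_b, tolerance=2):
    """Flag site_a when its reputation shift is not mirrored on site_b."""
    cp_a = change_point(site_a)
    if cp_a is None:
        return False  # no shift on site_a, nothing to explain
    cp_b = change_point(site_b)
    if cp_b is None:
        return True  # shift on one site only
    same_time = abs(cp_a[0] - cp_b[0]) <= tolerance
    same_direction = (cp_a[1] > 0) == (cp_b[1] > 0)
    return not (same_time and same_direction)


# Hypothetical monthly average ratings for one business on two sites.
yelp = [3.0, 3.1, 2.9, 3.0, 3.1, 3.0, 4.6, 4.7, 4.5, 4.6, 4.7, 4.6]
tripadvisor = [3.0, 3.1, 3.0, 2.9, 3.0, 3.1, 3.0, 3.1, 2.9, 3.0, 3.1, 3.0]

flagged = is_suspicious(yelp, tripadvisor)
```

In this toy data the first series jumps after the sixth month while the second stays flat, so the business is flagged and the reviews around that jump would be the candidates passed on to a classifier. A fuller treatment would use a multiple change point method such as those cited in the paper's references.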

Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Change-Point Analysis
  2. Cross-Dataset Analysis
  3. Fraudulent Reviews and Campaigns
  4. Review Websites

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19: The Web Conference
May 13 - 17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) A multimodal travel route recommendation system leveraging visual Transformers and self-attention mechanisms. Frontiers in Neurorobotics 18. DOI: 10.3389/fnbot.2024.1439195. Online publication date: 26-Nov-2024
  • (2024) Detecting Spam Movie Review Under Coordinated Attack With Multi-View Explicit and Implicit Relations Semantics Fusion. IEEE Transactions on Information Forensics and Security 19 (7588-7603). DOI: 10.1109/TIFS.2024.3441947. Online publication date: 2024
  • (2024) A Weighted Stacking Ensemble Model With Sampling for Fake Reviews Detection. IEEE Transactions on Computational Social Systems 11:2 (2578-2594). DOI: 10.1109/TCSS.2023.3268548. Online publication date: Apr-2024
  • (2023) An explainable ensemble of multi-view deep learning model for fake review detection. Journal of King Saud University - Computer and Information Sciences 35:8 (101644). DOI: 10.1016/j.jksuci.2023.101644. Online publication date: Sep-2023
