research-article

Think Outside the Dataset: Finding Fraudulent Reviews using Cross-Dataset Analysis

Authors:

Shirin Nilizadeh,

Hojjat Aghakhani,

Eric Gustafson,

Christopher Kruegel,

Giovanni Vigna

WWW '19: The World Wide Web Conference

Pages 3108 - 3115

https://doi.org/10.1145/3308558.3313647

Published: 13 May 2019

Abstract

While online review services provide a two-way conversation between brands and consumers, malicious actors, including misbehaving businesses, have an equal opportunity to distort the reviews for their own gain. We propose OneReview, a method for locating fraudulent reviews by correlating data from multiple crowd-sourced review sites. Our approach uses Change Point Analysis to locate points at which a business's reputation shifts. Inconsistent trends in reviews of the same business across multiple websites are used to identify suspicious reviews. We then extract an extensive set of textual and contextual features from these suspicious reviews and employ supervised machine learning to detect fraudulent reviews.
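The cross-site idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single mean-shift change point per rating series, found by a least-squares split, and the names `best_change_point`, `shift_direction`, `is_suspicious`, the `min_seg` and `min_gain` thresholds, and the sample ratings are all hypothetical.

```python
from statistics import mean


def best_change_point(ratings, min_seg=3):
    """Return (index, gain) for the single split that most reduces squared error.

    index is where the second segment starts; it is None when the series is
    too short to split or no split improves on the unsplit series.
    """
    n = len(ratings)
    if n < 2 * min_seg:
        return None, 0.0

    def sse(xs):
        # Sum of squared deviations from the segment mean.
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs)

    total = sse(ratings)
    best_idx, best_gain = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        gain = total - sse(ratings[:k]) - sse(ratings[k:])
        if gain > best_gain:
            best_idx, best_gain = k, gain
    return best_idx, best_gain


def shift_direction(ratings, min_gain=2.0):
    """+1 for an upward shift in mean rating, -1 for downward, 0 for none.

    min_gain is an arbitrary illustrative threshold, not a value from the paper.
    """
    idx, gain = best_change_point(ratings)
    if idx is None or gain < min_gain:
        return 0
    return 1 if mean(ratings[idx:]) > mean(ratings[:idx]) else -1


def is_suspicious(site_a, site_b):
    """Flag a shift on site A that site B does not show in the same direction."""
    a, b = shift_direction(site_a), shift_direction(site_b)
    return a != 0 and a != b


# Made-up star ratings for one business, in chronological order.
yelp = [2, 3, 2, 3, 2, 5, 5, 5, 5, 5]
tripadvisor = [3, 2, 3, 2, 3, 3, 2, 3, 2, 3]
suspicious = is_suspicious(yelp, tripadvisor)
```

In this toy data the first series jumps to five stars halfway through while the second stays flat, so the business is flagged and the reviews around the jump would be the candidates passed on to feature extraction and classification.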

We evaluated OneReview on about 805K and 462K reviews from Yelp and TripAdvisor, respectively, to identify fraud on Yelp. Supervised machine learning yields excellent results, with 97% accuracy. We applied the resulting model to the suspicious reviews and detected about 62K fraudulent reviews (about 8% of all the Yelp reviews). We further analyzed the detected fraudulent reviews and their authors, and located several spam campaigns in the wild, including campaigns against specific businesses, as well as campaigns consisting of several hundred socially-networked untrustworthy accounts.
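As a quick check of the "about 8%" figure, using only the rounded counts stated above (so the result is approximate):

```latex
\frac{62\,\mathrm{K}}{805\,\mathrm{K}} \approx 0.077 \approx 8\%
```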

Published In

WWW '19: The World Wide Web Conference

May 2019

3620 pages

ISBN:9781450366748

DOI:10.1145/3308558

Editors:

Ling Liu (Georgia Tech, USA),

Ryen White (Microsoft Research, USA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '19

WWW '19: The Web Conference

May 13 - 17, 2019

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

Juan Z, Zhang J, Gao M (2024). A multimodal travel route recommendation system leveraging visual Transformers and self-attention mechanisms. Frontiers in Neurorobotics 18. Online publication date: 26-Nov-2024.
https://doi.org/10.3389/fnbot.2024.1439195

Cai Y, Wang H, Cao H, Wang W, Zhang L, Chen X (2024). Detecting Spam Movie Review Under Coordinated Attack With Multi-View Explicit and Implicit Relations Semantics Fusion. IEEE Transactions on Information Forensics and Security 19 (7588-7603). Online publication date: 2024.
https://doi.org/10.1109/TIFS.2024.3441947

Singhal R, Kashef R (2024). A Weighted Stacking Ensemble Model With Sampling for Fake Reviews Detection. IEEE Transactions on Computational Social Systems 11:2 (2578-2594). Online publication date: Apr-2024.
https://doi.org/10.1109/TCSS.2023.3268548

Mohawesh R, Xu S, Springer M, Jararweh Y, Al-Hawawreh M, Maqsood S (2023). An explainable ensemble of multi-view deep learning model for fake review detection. Journal of King Saud University - Computer and Information Sciences 35:8 (101644). Online publication date: Sep-2023.
https://doi.org/10.1016/j.jksuci.2023.101644
