Authorship Attribution with Temporal Data in Reddit

Authors:

Guilherme Ramos Casimiro,

Luciano Antonio Digiampietri

SBSI '22: Proceedings of the XVIII Brazilian Symposium on Information Systems

Article No.: 4, Pages 1 - 8

https://doi.org/10.1145/3535511.3535515

Published: 30 June 2022

Abstract

Context: The practicality of smartphones has led, in recent years, to greater interaction through online social networks. Problem: Social networks can influence users both positively and negatively; one of the negative impacts is the spread of fake news. In this context, identifying the true source of a piece of information, or whether the information is true, becomes an extremely relevant activity. Solution: This paper presents an approach to authorship attribution that combines text mining and temporal analysis techniques. IS Theory: This work falls under Social Network Theory, in particular user interaction through a forum network model, in which each post creates a comment thread and users may or may not reply inside the thread. Method: This work is a controlled experiment that extends a previous case study, which classified between two and ten authors. The results were validated through a quantitative approach. Summary of Results: Among 10 authors, classification accuracy exceeded 97%, with the character feature exceeding 99%; among 100 authors, all features achieved more than 70% accuracy. Contributions and Impact in the IS area: The main contribution of this work is to validate authorship attribution in a big data context, using significant features and a robust classifier model.
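
To make the idea of combining textual and temporal signals concrete, the following is a minimal sketch and not the authors' pipeline. The abstract does not specify the feature extraction or the classifier, so everything here is an assumption chosen for illustration: character trigram counts stand in for the "chars" feature, the UTC hour of posting stands in for the temporal signal, and a cosine-similarity match against per-author profiles stands in for the classifier. The function names (`char_ngrams`, `features`, `train`, `predict`) and the toy posts are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def features(text, timestamp, n=3):
    """Combine character n-gram counts with an hour-of-day indicator (UTC).

    Assumption: the hour of posting is used as the temporal feature.
    """
    feats = Counter({("char", g): c for g, c in char_ngrams(text, n).items()})
    hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
    feats[("hour", hour)] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(posts):
    """Build one summed feature profile per author.

    posts: iterable of (author, text, unix_timestamp) tuples.
    """
    profiles = {}
    for author, text, ts in posts:
        profiles.setdefault(author, Counter()).update(features(text, ts))
    return profiles


def predict(profiles, text, timestamp):
    """Return the author whose profile is most similar to the new post."""
    query = features(text, timestamp)
    return max(profiles, key=lambda author: cosine(profiles[author], query))


if __name__ == "__main__":
    # Hypothetical toy data; timestamps are seconds since the Unix epoch.
    toy_posts = [
        ("alice", "honestly i think the borrow checker is fine", 9 * 3600),
        ("alice", "honestly the compiler errors are pretty helpful", 9 * 3600 + 120),
        ("bob", "LOL no way!!! that game was insane!!!", 22 * 3600),
        ("bob", "no way!!! best match ever LOL", 22 * 3600 + 300),
    ]
    model = train(toy_posts)
    print(predict(model, "honestly i think the compiler is fine", 9 * 3600 + 60))
```

In this sketch the temporal indicator carries the same weight as a single trigram, so the text dominates the decision. A real system would need to scale or weight the two feature groups, and would use a trained classifier rather than a nearest-profile match.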

Index Terms

Authorship Attribution with Temporal Data in Reddit
1. Computing methodologies
2. Information systems
  1. Information systems applications
    1. Data mining

Index terms have been assigned to the content through auto-classification.

Published In

SBSI '22: Proceedings of the XVIII Brazilian Symposium on Information Systems

May 2022

394 pages

ISBN: 9781450396981

DOI: 10.1145/3535511

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SBSI '22

SBSI '22: XVIII Brazilian Symposium on Information Systems

May 16 - 19, 2022

Curitiba, Brazil

Acceptance Rates

Overall Acceptance Rate 181 of 557 submissions, 32%
