DOI: 10.1145/3535511.3535515
research-article

Authorship Attribution with Temporal Data in Reddit

Published: 30 June 2022

Abstract

Context: In recent years, the convenience of smartphones has led to greater interaction through online social networks.

Problem: Social networks can influence users both positively and negatively; one of the negative impacts is the spread of fake news. In this context, identifying the correct source of a piece of information, or whether that information is true, becomes an extremely relevant task.

Solution: This paper presents an approach to authorship attribution that combines text mining and temporal analysis techniques.

IS Theory: This work is grounded in Social Network Theory, in particular in user interaction through a forum network model, in which each post starts a comment thread and users may or may not reply within that thread.

Method: This work is a controlled experiment that extends a previous case study, which classified between two and ten authors. The results were validated through a quantitative approach.

Summary of Results: With 10 authors, classification accuracy exceeded 97%, and the character (chars) feature exceeded 99%. With 100 authors, all features exceeded 70% accuracy.

Contributions and Impact in the IS area: The main contribution of this work is to validate authorship attribution in a big data context, using significant features and a robust classifier model.
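The abstract names a character-based (chars) feature, text mining and temporal analysis, but does not spell out the full feature set or the classifier. The sketch below shows one way such a combination could look, under stated assumptions: character trigram counts stand in for the stylistic features, a 24-bin UTC posting-hour histogram stands in for the temporal features, and a simple nearest-centroid classifier scores each author by a weighted sum of cosine similarities. The function names and the weight `alpha` are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter
from datetime import datetime, timezone
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping, lowercased character n-grams in one comment."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def hour_histogram(timestamps: list[int]) -> list[float]:
    """Normalized 24-bin histogram of posting hours (UTC) from Unix timestamps."""
    bins = [0.0] * 24
    for ts in timestamps:
        bins[datetime.fromtimestamp(ts, tz=timezone.utc).hour] += 1
    total = sum(bins) or 1.0
    return [b / total for b in bins]


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_profiles(train):
    """train: list of (author, text, unix_timestamp) tuples.

    Returns one (char-trigram counts, hour histogram) centroid per author.
    """
    texts, times = {}, {}
    for author, text, ts in train:
        texts.setdefault(author, Counter()).update(char_ngrams(text))
        times.setdefault(author, []).append(ts)
    return {a: (texts[a], dict(enumerate(hour_histogram(times[a])))) for a in texts}


def predict(profiles, text, ts, alpha=0.8):
    """Return the author with the highest weighted text + temporal similarity.

    alpha is a hypothetical weight: 1.0 uses text only, 0.0 uses time only.
    """
    q_text = char_ngrams(text)
    q_time = dict(enumerate(hour_histogram([ts])))

    def score(author):
        p_text, p_time = profiles[author]
        return alpha * cosine(q_text, p_text) + (1 - alpha) * cosine(q_time, p_time)

    return max(profiles, key=score)


if __name__ == "__main__":
    # Toy data (hypothetical): alice posts around 12:00 UTC, bob around 22:00 UTC.
    train = [
        ("alice", "lol sooo true!!!", 1_600_000_000),
        ("alice", "lol sooo funny!!!", 1_600_000_060),
        ("bob", "Indeed, the argument is compelling.", 1_600_036_000),
        ("bob", "Indeed, the evidence is compelling.", 1_600_036_060),
    ]
    profiles = build_profiles(train)
    print(predict(profiles, "lol sooo good!!!", 1_600_003_600))
```

The weight `alpha` is a hand-set assumption that keeps stylistic evidence dominant while letting posting habits separate authors whose writing is similar; in practice it would be tuned on held-out data or replaced by a trained classifier over the concatenated features.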



      Published In

      ACM Other conferences
      SBSI '22: Proceedings of the XVIII Brazilian Symposium on Information Systems
      May 2022
      394 pages
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Authorship analysis
      2. Online social media
      3. Temporal data
      4. Text mining

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      SBSI '22

      Acceptance Rates

      Overall Acceptance Rate 181 of 557 submissions, 32%

      Article Metrics

      • Total Citations: 0
      • Total Downloads: 86
      • Downloads (last 12 months): 15
      • Downloads (last 6 weeks): 3
      Reflects downloads up to 25 Dec 2024.
