research-article

Applying Authorship Analysis to Extremist-Group Web Forum Messages

Published: 01 September 2005

Abstract

One major challenge facing the intelligence and security community is monitoring online media for terrorist group communications. This study addresses the online anonymity problem by applying authorship analysis to English and Arabic extremist group Web forum messages. The study evaluates the performance impact of different feature categories and techniques across both languages. To enhance writing style identification, researchers incorporated a comprehensive list of online authorship features. Additionally, they created an Arabic language model by adopting specific features and techniques, including an elongation filter and a root-clustering algorithm, to handle challenging linguistic characteristics. A series of experiments indicated a high level of efficacy in the models. Finally, the authors compare the English and Arabic language models and messages to aid the research community's understanding of the dynamics of these groups' authorship tendencies. This article is part of a special issue on Homeland Security.
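
As an illustration of the elongation filter mentioned in the abstract, here is a minimal Python sketch. It assumes that "elongation" refers to the Arabic tatweel (kashida) character, U+0640, which stretches a word visually without changing its meaning. The function name `filter_elongation` and the decision to return a count alongside the cleaned text are hypothetical choices for illustration, not details taken from the article.

```python
from typing import Tuple

# ARABIC TATWEEL (kashida), a purely stylistic stretching character.
# Assumption: this is the character an elongation filter would target.
TATWEEL = "\u0640"


def filter_elongation(text: str) -> Tuple[str, int]:
    """Return the text with tatweel removed, plus how many were removed.

    The count is kept because elongation use could itself serve as a
    stylistic feature, while the cleaned text is safer for word-length
    and vocabulary statistics.
    """
    removed = text.count(TATWEEL)
    return text.replace(TATWEEL, ""), removed


if __name__ == "__main__":
    # A four-letter word stretched with three tatweel characters.
    stretched = "\u062c\u0645" + TATWEEL * 3 + "\u064a\u0644"
    cleaned, removed = filter_elongation(stretched)
    print(len(stretched), len(cleaned), removed)
```

Without such a filter, the stretched and unstretched spellings of the same word would be counted as different vocabulary items and would skew word-length features.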

References

[1]
R. Zheng, et al., "A Framework of Authorship Identification for Online Messages: Writing Style Features and Classification Techniques," to be published in J. Am. Soc. Information Science and Technology (JASIST), 2005.
[2]
J. Rudman, "The State of Authorship Attribution Studies: Some Problems and Solutions," Computers and the Humanities, vol. 31, 1998, pp. 351–365.
[3]
G.U. Yule, The Statistical Study of Literary Vocabulary, Cambridge Univ. Press, 1944.
[4]
O. De Vel, et al., "Mining E-mail Content for Author Identification Forensics," SIGMOD Record, vol. 30, no. 4, 2001, pp. 55–64.
[5]
J.W. Palmer and D.A. Griffith, "An Emerging Model of Web Site Design for Marketing," Comm. ACM, vol. 41, no. 3, 1998, pp. 44–51.
[6]
J.F. Burrows, "Word Patterns and Story Shapes: The Statistical Analysis of Narrative Style," Literary and Linguistic Computing, vol. 2, 1987, pp. 61–67.
[7]
F. Peng, et al., "Automated Authorship Attribution with Character Level Language Models," presented at the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003); http://users.cs.dal.ca/~vlado/papers/2003-EACL03-139.pdf.
[8]
E. Stamatatos, N. Fakotakis, and G. Kokkinakis, "Computer-Based Authorship Attribution without Lexical Measures," Computers and the Humanities, vol. 35, no. 2, 2001, pp. 193–214.
[9]
K.B. Beesley, "Arabic Finite-State Morphological Analysis and Generation," Proc. 16th Int'l Conf. Computational Linguistics (COLING 96), 1996, Morgan Kaufmann, pp. 89–94.
[10]
S.S. Al-Fedaghi and F. Al-Anzi, "A New Algorithm to Generate Arabic Root-Pattern Forms," Proc. 11th Nat'l Computer Conf., KFUPM, Saudi Arabia, 1989, pp. 391–400.
[11]
L.S. Larkey and M.E. Connell, "Arabic Information Retrieval at UMass in TREC-10," Proc. 10th Text Retrieval Conf. (TREC 2001), Nat'l Inst. of Standards and Technology, 2001.
[12]
I. Hmeidi, G. Kanaan, and M. Evens, "Design and Implementation of Automatic Indexing for Information Retrieval with Arabic Documents," J. Am. Soc. Information Science, vol. 48, no. 10, 1997, pp. 867–881.
[13]
A.N. De Roeck and W. Al-Fares, "A Morphologically Sensitive Clustering Algorithm for Identifying Arabic Roots," Proc. Assoc. for Computational Linguistics (ACL 00), 2000; www.informatik.uni-trier.de/~ley/db/conf/acl/acl2000.html.

Reviews

Peter C. Patton

The question of whether an author leaves an unconscious but statistically discernible "signature" on his or her writing was first visited by Wake at Oxford in 1911. Wake was an eminent classicist, but he was not a statistician, and his sentence length statistics did not prove useful. In the 1960s, a Church of England minister and New Testament scholar, A.Q. Morton, who was a statistician, developed a statistical authorship test for Greek, and used it successfully on the Pauline Epistles, the Gospel of Luke, and the Acts of the Apostles. He and others later used it on Homer's Iliad, also with notable success. The test was very simple, but useful for Greek text; he simply counted the number of times kai was used in each sentence. Kai is a coordinating conjunction in Greek 95 percent of the time (it is an adverb the other five percent), and performs the combined roles of all the coordinating conjunctions in English (and, or, but, and so on).

Alvar Ellegard developed a much more sophisticated statistical method [1] for his doctoral dissertation at Uppsala, and used it to prove that Sir Philip Francis, a British civil servant, had written the scathing Junius Letters to the London Public Advertiser criticizing King George III and his war against the American colonies. Junius Brutus killed Julius Caesar, but George III would certainly have hanged this Junius, Philip Francis, for sedition had he known that Francis was the author of the letters.

This fascinating paper takes the unconscious authorship signature problem into new theoretical (but also very practical) realms. The paper presents new methods that go beyond Greek and English literary texts to the analysis of extremist multi-language polemics on Internet Web sites. This extension of the technology opens up new vistas. For example, Internet Web sites are a very new literary genre, and the Arabic language, with its 5,000 roots or stems, is very highly inflected. Arabic has 15 verbal conjugations, compared to Hebrew with only eight, and Indo-European languages with even fewer. The liaison issues in Arabic, which is only written cursively, and which has initial, medial, and final forms for many letters, along with infixes and consonant stacking, add to the morphological, grammatical, and syntactical interface of the language. The authors find that this craggy linguistic interface, while complex, does add some statistical hand and toe holds. Their methods show significant discriminating power in the application of authorship identification techniques to both English and Arabic messages. KKK polemics were used as a sort of English language control in the development of the methods.

This well-presented, well-written paper illustrates an important and very current application of computer-based statistical methods for authorship identification. It is so good, and so relevant to our times, that I am surprised it wasn't classified by the US National Security Agency (NSA).

Online Computing Reviews Service
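
The kai-counting test that the review attributes to Morton can be sketched in a few lines of Python. This is only an illustration of the idea as the review describes it (count the occurrences of kai in each sentence), not a reconstruction of Morton's actual procedure. The function names, the punctuation used to split sentences, and the accent-stripping step are all assumptions made for this sketch.

```python
import re
import unicodedata

# Greek small letters kappa, alpha, iota: "kai" without accents.
KAI = "\u03ba\u03b1\u03b9"


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def kai_counts(text: str, word: str = KAI) -> list:
    """Return the number of times `word` occurs in each sentence.

    Assumption: sentences end at a period, semicolon (used as the
    Greek question mark), middle dot, exclamation mark, or question mark.
    """
    plain = strip_accents(text).lower()
    sentences = [s for s in re.split(r"[.;\u00b7!?]+", plain) if s.strip()]
    return [
        sum(1 for token in re.findall(r"\w+", s) if token == word)
        for s in sentences
    ]


if __name__ == "__main__":
    # Transliterated toy input, so the default Greek word is overridden.
    print(kai_counts("kai a kai b. c d; kai e", word="kai"))
```

The list of per-sentence counts is the raw material for the test: two texts would then be compared on how those counts are distributed, a step this sketch leaves out.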

Information & Contributors

Information

Published In

IEEE Intelligent Systems  Volume 20, Issue 5
September 2005
94 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 September 2005

Author Tags

  1. Web content analysis
  2. Web forum postings
  3. Web mining
  4. authorship analysis
  5. multilingual
  6. security
  7. text analysis

Qualifiers

  • Research-article

Cited By

  • (2024) Crossing Linguistic Barriers: Authorship Attribution in Sinhala Texts, ACM Transactions on Asian and Low-Resource Language Information Processing, 23:5 (1-14), DOI 10.1145/3655620. Online publication date: 10-May-2024
  • (2024) Computational techniques to counter terrorism: a systematic survey, Multimedia Tools and Applications, 83:1 (1189-1214), DOI 10.1007/s11042-023-15545-0. Online publication date: 1-Jan-2024
  • (2023) Automatic IQ Estimation from Written text using Stylometry Methods, Proceedings of the 2023 7th International Conference on Information System and Data Mining (56-65), DOI 10.1145/3603765.3603769. Online publication date: 10-May-2023
  • (2023) Survey of Authorship Identification Tasks on Arabic Texts, ACM Transactions on Asian and Low-Resource Language Information Processing, 22:4 (1-24), DOI 10.1145/3564156. Online publication date: 12-Apr-2023
  • (2023) Predicting Violent Extremism with Machine Learning: A Scoping Review, SN Computer Science, 5:1, DOI 10.1007/s42979-023-02355-2. Online publication date: 16-Nov-2023
  • (2022) Design and Implementation of a Machine Learning-Based English Intelligent Test System, Wireless Communications & Mobile Computing, 2022, DOI 10.1155/2022/5875380. Online publication date: 1-Jan-2022
  • (2022) Authorship Attribution in Greek Literature Using Word Adjacencies, Proceedings of the 12th Hellenic Conference on Artificial Intelligence (1-9), DOI 10.1145/3549737.3549750. Online publication date: 7-Sep-2022
  • (2022) Authorship Attribution with Temporal Data in Reddit, Proceedings of the XVIII Brazilian Symposium on Information Systems (1-8), DOI 10.1145/3535511.3535515. Online publication date: 16-May-2022
  • (2021) The Design of Reciprocal Learning Between Human and Artificial Intelligence, Proceedings of the ACM on Human-Computer Interaction, 5:CSCW2 (1-36), DOI 10.1145/3479587. Online publication date: 18-Oct-2021
  • (2021) LDA-Transformer Model in Chinese Poetry Authorship Attribution, Information Retrieval (59-73), DOI 10.1007/978-3-030-88189-4_5. Online publication date: 29-Oct-2021
