Abstract
The aim of this study is to identify the author of a document whose authorship is unknown. Ten different feature vectors are built from authorship attributes, n-grams, and various combinations of the two, all extracted from the documents whose authors are to be identified. The performance of each feature vector is compared using Naïve Bayes, Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Random Forest (RF), and Multilayer Perceptron (MLP) classifiers. The most successful classifiers are MLP and SVM. In document classification, n-grams give higher accuracy rates than authorship attributes; however, using n-grams and authorship attributes together gives better results than using either alone.
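The abstract does not specify the exact authorship attributes, n-gram settings, or classifier parameters. The following is a minimal sketch of the general idea under stated assumptions: character bigrams as the n-gram features, three hypothetical stylometric attributes (average word length, average sentence length, comma rate), and a k-NN classifier with cosine similarity, k-NN being one of the five classifiers named above. It builds an n-gram vector, an attribute vector, and their combination. It is not the authors' implementation, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def _unit(vec):
    """Scale a sparse vector to unit L2 norm so no block dominates by magnitude."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def char_ngrams(text, n=2):
    """Relative frequencies of lowercased character n-grams."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {"ng:" + g: c / total for g, c in grams.items()}


def style_attributes(text):
    """Hypothetical stylometric attributes; the paper's actual set may differ."""
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1
    return {
        "attr:avg_word_len": sum(len(w) for w in words) / n_words,
        "attr:avg_sent_len": len(words) / (len(sentences) or 1),
        "attr:comma_rate": text.count(",") / n_words,
    }


def features(text, kind="both", n=2):
    """Build an 'ngram', 'attr', or combined ('both') feature vector."""
    vec = {}
    if kind in ("ngram", "both"):
        vec.update(_unit(char_ngrams(text, n)))
    if kind in ("attr", "both"):
        vec.update(_unit(style_attributes(text)))
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query, k=1):
    """train: list of (feature_vector, author); majority vote of the k nearest."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(author for _, author in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy corpus: one labelled document per author, plus an unattributed query.
docs = [("the cat sat on the mat.", "A"), ("xyz qwv zzq xyz.", "B")]
query_text = "the cat sat."
for kind in ("ngram", "attr", "both"):
    train = [(features(t, kind), a) for t, a in docs]
    print(kind, knn_predict(train, features(query_text, kind)))
```

On this toy data the loop prints `ngram A`, `attr B` and `both A`. This three-document example only shows the mechanics and is not evidence for the paper's results. Each block is scaled to unit length before concatenation, so the combined cosine is the average of the n-gram and attribute cosines. Without that step, the raw attribute values (word and sentence lengths of several units) would swamp the small n-gram frequencies.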
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Türkoğlu, F., Diri, B., Amasyalı, M.F. (2007). Author Attribution of Turkish Texts by Feature Mining. In: Huang, DS., Heutte, L., Loog, M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues. ICIC 2007. Lecture Notes in Computer Science, vol 4681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74171-8_110
DOI: https://doi.org/10.1007/978-3-540-74171-8_110
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74170-1
Online ISBN: 978-3-540-74171-8
eBook Packages: Computer Science, Computer Science (R0)