
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4681)


Abstract

The aim of this study is to identify the author of a document whose authorship is unknown. Ten different feature vectors are obtained from authorship attributes, n-grams, and various combinations of the two, all extracted from the documents whose authors are to be identified. The performance of each feature vector is compared using Naïve Bayes, support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and multilayer perceptron (MLP) classifiers. The most successful classifiers are MLP and SVM. In the document classification experiments, n-grams give higher accuracy than authorship attributes. Nevertheless, using n-grams and authorship attributes together gives better results than using either alone.
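The abstract compares three kinds of feature vectors (n-grams, authorship attributes, and their combination) across several classifiers. The sketch below illustrates that comparison under stated assumptions and is not the authors' implementation: it uses lowercased character 3-grams, three hypothetical style attributes (average word length, average sentence length, comma rate), simple merging of the two feature sets, and a cosine-similarity k-NN classifier, which is only one of the five classifiers the paper names. The helper names (`char_ngrams`, `style_attributes`, `knn_predict`) and the toy training texts are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical feature choices for illustration; not the paper's exact feature set.


def char_ngrams(text, n=3):
    """Relative frequencies of lowercased character n-grams (n=3 assumed)."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {"ng:" + g: c / total for g, c in grams.items()}


def style_attributes(text):
    """Three illustrative authorship attributes (a hypothetical subset)."""
    # \w is Unicode-aware in Python 3, so Turkish letters such as ş, ğ, ü count as word characters.
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1
    return {
        "st:avg_word_len": sum(len(w) for w in words) / n_words,
        "st:avg_sent_len": n_words / (len(sentences) or 1),
        "st:comma_rate": text.count(",") / n_words,
    }


def features(text, mode="both"):
    """Build one feature vector: n-grams, style attributes, or their combination."""
    if mode == "ngram":
        return char_ngrams(text)
    if mode == "style":
        return style_attributes(text)
    # Combined vector: the key sets ("ng:" vs "st:") are disjoint, so they are simply merged.
    return {**char_ngrams(text), **style_attributes(text)}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=1, mode="both"):
    """Predict the author of `text` by majority vote among its k most similar training documents.

    `train` is a list of (author, document_text) pairs.
    """
    query = features(text, mode)
    scored = sorted(
        ((cosine(query, features(doc, mode)), author) for author, doc in train),
        reverse=True,
    )
    votes = Counter(author for _, author in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up training data for two hypothetical authors "A" and "B".
train = [
    ("A", "The cat sat on the mat. The cat was happy."),
    ("B", "Quantum fields fluctuate; vacuum energy persists, oddly, everywhere."),
]
print(knn_predict(train, "The cat sat happily on the mat.", k=1, mode="ngram"))  # prints: A
```

Passing `mode="style"` or `mode="both"` reruns the same classifier on the other feature vectors, which mirrors the kind of per-feature-vector comparison the abstract describes. Note that the style values here are unscaled while n-gram frequencies are small fractions, so the combined vector is dominated by the style dimensions; a fuller experiment would normalize features before combining them.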




Editor information

De-Shuang Huang, Laurent Heutte, Marco Loog


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Türkoğlu, F., Diri, B., Amasyalı, M.F. (2007). Author Attribution of Turkish Texts by Feature Mining. In: Huang, DS., Heutte, L., Loog, M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues. ICIC 2007. Lecture Notes in Computer Science, vol 4681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74171-8_110


  • DOI: https://doi.org/10.1007/978-3-540-74171-8_110

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74170-1

  • Online ISBN: 978-3-540-74171-8

  • eBook Packages: Computer Science, Computer Science (R0)
