Automatic Adaptation of Author’s Stylometric Features to Document Types

  • Conference paper
Text, Speech and Dialogue (TSD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Abstract

Many Internet users encounter anonymous documents and texts of counterfeit authorship. The number of questionable documents exceeds the capacity of human experts, so a universal automated authorship identification system that supports all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: books, blogs, discussions, comments and tweets. A method for automatically selecting authors’ stylometric features using double-layer machine learning is proposed and evaluated. Experiments are conducted on ten disjoint train and test sets, and a method for efficiently training a large number of machine learning models is introduced (163,700 models were trained).

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rygl, J. (2014). Automatic Adaptation of Author’s Stylometric Features to Document Types. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_7

  • DOI: https://doi.org/10.1007/978-3-319-10816-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10815-5

  • Online ISBN: 978-3-319-10816-2

  • eBook Packages: Computer Science, Computer Science (R0)
