Abstract
Many Internet users face the problem of anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts; therefore, a universal automated authorship identification system that supports all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: books, blogs, discussions, comments and tweets. A method for the automatic selection of authors' stylometric features using double-layer machine learning is proposed and evaluated. Experiments are conducted on ten disjoint training and test sets, and a method for efficiently training a large number of machine learning models is introduced (163,700 models were trained).
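The abstract only names the technique. As a rough illustration of the general double-layer idea, and not the author's implementation, the sketch below computes one first-layer similarity score per feature group between a known author's text and a questioned text. A second-layer logistic regression then learns how much each group should count. The feature groups (`char_bigrams`, `word_lengths`, `punctuation`), the cosine similarity and the logistic regression are all hypothetical choices made for brevity.

```python
import math
from collections import Counter

# Hypothetical stylometric feature groups (illustrative only; the paper's
# actual feature set is not given in the abstract).
def char_bigrams(text):
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def word_lengths(text):
    return Counter(len(word) for word in text.split())

def punctuation(text):
    return Counter(ch for ch in text if ch in ".,;:!?-\"'")

FEATURE_GROUPS = [char_bigrams, word_lengths, punctuation]

def cosine(a, b):
    """Cosine similarity of two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def first_layer(known, questioned):
    """Layer 1: one similarity score per feature group."""
    return [cosine(f(known), f(questioned)) for f in FEATURE_GROUPS]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_second_layer(score_vectors, labels, epochs=2000, lr=0.5):
    """Layer 2: logistic regression over layer-1 scores (label 1 = same author)."""
    w = [0.0] * len(score_vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(score_vectors, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            grad = p - y  # gradient of the log-loss with respect to the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def same_author_probability(model, scores):
    w, b = model
    return sigmoid(b + sum(wi * si for wi, si in zip(w, scores)))

def verify(model, known, questioned, threshold=0.5):
    """Return True if the questioned text is attributed to the known author."""
    return same_author_probability(model, first_layer(known, questioned)) >= threshold
```

Under this assumption, one second-layer model could be trained per document type (books, blogs, discussions, comments, tweets). The learned weights would then indicate which feature groups are informative for that type. This is one plausible reading of "automatic selection" of features, not a description of the paper's procedure.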
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Rygl, J. (2014). Automatic Adaptation of Author's Stylometric Features to Document Types. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_7
DOI: https://doi.org/10.1007/978-3-319-10816-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)