Abstract
Authorship Attribution is the task of identifying the true author of a given text from a set of suspected authors. Stylometric features, which include lexical and syntactic features, play a vital role in recognizing the right author. The n-gram is one of the popular techniques used to extract syntactic features from text. The main objective of this work is to apply both lexical and syntactic features to Kannada text and to compare the performance of the two approaches using different machine learning algorithms. Kannada is spoken in the southern Indian state of Karnataka. Although there has been major work in text processing, Authorship Attribution for the language is still at an early stage. Research has been carried out on handwritten Kannada documents but not on digital text. Character n-grams, word n-grams, and the combination of the two, known as the Amalgamation technique, are used as syntactic features to capture the writing style of an author. The results show that the Support Vector Machine algorithm performs best, with 94% accuracy using n-grams and 60% accuracy using lexical features.
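As an illustration only, the idea of combining character and word n-grams into one feature set can be sketched as below. This is a minimal sketch, not the authors' implementation: the function names, the n-gram sizes (3 for characters, 2 for words), the whitespace tokenization, and the `c:`/`w:` key prefixes are all assumptions made for the example, and the abstract does not state how the two feature sets are actually merged or weighted. Note also that Python strings are indexed by Unicode code point, so a character n-gram here may split a Kannada grapheme cluster (a consonant plus its vowel sign).

```python
from __future__ import annotations

from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping character (code point) n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping word n-grams, splitting on whitespace."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def amalgamated_features(text: str, char_n: int = 3, word_n: int = 2) -> Counter:
    """Count character and word n-grams in a single feature bag.

    Keys are prefixed ("c:" / "w:") so a character n-gram never collides
    with a word n-gram that happens to have the same spelling.
    The default sizes are illustrative assumptions, not values from the paper.
    """
    features = Counter(f"c:{g}" for g in char_ngrams(text, char_n))
    features.update(f"w:{g}" for g in word_ngrams(text, word_n))
    return features


# Hypothetical sample sentence, used only to exercise the functions.
sample = "ನಾನು ಮನೆಗೆ ಹೋಗುತ್ತೇನೆ"
features = amalgamated_features(sample)
```

Feature counts of this kind would typically be turned into fixed-length vectors, one per document, before being passed to a classifier such as a Support Vector Machine.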
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chandrika, C.P., Kallimani, J.S. (2022). Instance Based Authorship Attribution for Kannada Text Using Amalgamation of Character and Word N-grams Technique. In: Majhi, S., Prado, R.P.d., Dasanapura Nanjundaiah, C. (eds) Distributed Computing and Optimization Techniques. Lecture Notes in Electrical Engineering, vol 903. Springer, Singapore. https://doi.org/10.1007/978-981-19-2281-7_51
DOI: https://doi.org/10.1007/978-981-19-2281-7_51
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2280-0
Online ISBN: 978-981-19-2281-7
eBook Packages: Computer Science, Computer Science (R0)