Instance Based Authorship Attribution for Kannada Text Using Amalgamation of Character and Word N-grams Technique

C. P. Chandrika &
Jagadish S. Kallimani

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 903)


Abstract

Authorship Attribution is the task of identifying the true author of a given text from a set of suspected authors. Stylometric features, which include lexical and syntactic features, play a vital role in recognizing the right author. N-grams are one of the popular techniques used to extract syntactic features from text. The main objective of this work is to apply both lexical and syntactic features to Kannada text and compare the performance of the two approaches using different machine learning algorithms. Kannada is spoken in the southern Indian state of Karnataka. Although major work exists in text processing, Authorship Attribution is still at an early stage. Research has been carried out on handwritten Kannada documents but not on digital text. Character n-grams, word n-grams, and the combination of the two, referred to as the Amalgamation technique, are used as syntactic features to capture an author's writing style. The results show that the Support Vector Machine algorithm performs best, with 94% accuracy using n-grams and 60% accuracy using lexical features.
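
As a rough illustration of the idea of combining character and word n-grams into one feature set, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the `c:`/`w:` prefixes, the n-gram sizes, and the sample sentence are all assumptions made for this example, and character n-grams are taken over Unicode code points, which can split Kannada consonant-vowel clusters.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every run of n consecutive code points in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def amalgamated_features(text, char_n=3, word_n=2):
    """Count character and word n-grams in a single feature bag.

    The "c:" and "w:" prefixes (a hypothetical convention) keep the two
    feature families from colliding when a word equals a character n-gram.
    """
    features = Counter()
    features.update("c:" + g for g in char_ngrams(text, char_n))
    features.update("w:" + g for g in word_ngrams(text, word_n))
    return features


# Hypothetical sample text; any document string would work here.
sample = "ಕನ್ನಡ ಭಾಷೆ ಕನ್ನಡ ನಾಡು"
features = amalgamated_features(sample)
for name, count in features.most_common(5):
    print(name, count)
```

A feature bag of this kind, computed per document, could then be vectorized and passed to a classifier such as a Support Vector Machine; that step is omitted here to keep the sketch dependency-free.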



Author information

Authors and Affiliations

M S Ramaiah Institute of Technology, Bangalore, 560054, India
C. P. Chandrika & Jagadish S. Kallimani
Visvesvaraya Technological University, Belagavi, Karnataka, India
Jagadish S. Kallimani


Corresponding author

Correspondence to C. P. Chandrika.

Editor information

Editors and Affiliations

Electrical Engineering, Indian Institute of Technology Patna, Patna, Bihar, India
Sudhan Majhi
Telecommunication Engineering, University of Jaén, Jaén, Spain
Rocío Pérez de Prado
Electronics and Communication, SJB Institute of Technology, Bengaluru, Karnataka, India
Chandrappa Dasanapura Nanjundaiah


Cite this paper

Chandrika, C.P., Kallimani, J.S. (2022). Instance Based Authorship Attribution for Kannada Text Using Amalgamation of Character and Word N-grams Technique. In: Majhi, S., Prado, R.P.d., Dasanapura Nanjundaiah, C. (eds) Distributed Computing and Optimization Techniques. Lecture Notes in Electrical Engineering, vol 903. Springer, Singapore. https://doi.org/10.1007/978-981-19-2281-7_51


DOI: https://doi.org/10.1007/978-981-19-2281-7_51
Published: 12 September 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2280-0
Online ISBN: 978-981-19-2281-7
eBook Packages: Computer Science, Computer Science (R0)
