
Instance Based Authorship Attribution for Kannada Text Using Amalgamation of Character and Word N-grams Technique

  • Conference paper
Distributed Computing and Optimization Techniques

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 903)


Abstract

Authorship Attribution is the task of identifying the true author of a given text from a set of suspected authors. Stylometric features, which include lexical and syntactic features, play a vital role in recognizing the right author. N-grams are one of the popular techniques used to extract syntactic features from text. The main objective of this work is to use both lexical and syntactic features on Kannada text and to compare the performance of the two approaches using different machine learning algorithms. Kannada is spoken in the southern Indian state of Karnataka. Even though major work exists in text processing, Authorship Attribution is still at an early stage. Research has been carried out on handwritten Kannada documents but not on digital text. Character n-grams, word n-grams, and the combination of the two, known as the Amalgamation technique, are used as syntactic features to capture the writing style of an author. The results show that the Support Vector Machine algorithm performs best, with 94% accuracy using N-grams and 60% accuracy using lexical features.
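
As an illustration of the feature extraction described in the abstract, the following minimal Python sketch counts character n-grams and word n-grams and merges them into a single feature map. It is not the authors' implementation: the function names, the n-gram sizes (3 for characters, 2 for words), the whitespace tokenization, and the sample sentence are all assumptions made for this example, since the abstract does not specify them.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`.

    Assumption: n-grams are taken over Unicode code points, so a Kannada
    grapheme cluster (consonant plus vowel sign) may be split across n-grams.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all contiguous word n-grams, splitting on whitespace."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def amalgamated_features(text, char_n=3, word_n=2):
    """Count character and word n-grams in one combined feature map.

    Keys are prefixed with "c:" or "w:" so that the two feature families
    cannot collide. The default sizes are illustrative, not from the paper.
    """
    features = Counter()
    features.update("c:" + g for g in char_ngrams(text, char_n))
    features.update("w:" + g for g in word_ngrams(text, word_n))
    return features


# Hypothetical sample text; any Kannada document could be used instead.
sample = "ನಾನು ಕನ್ನಡ ಓದುತ್ತೇನೆ ನಾನು ಕನ್ನಡ ಬರೆಯುತ್ತೇನೆ"
features = amalgamated_features(sample)
```

Count maps of this kind could then be turned into vectors and passed to a classifier such as a Support Vector Machine, as the abstract describes; that step is not shown here.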


Author information


Corresponding author

Correspondence to C. P. Chandrika.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Chandrika, C.P., Kallimani, J.S. (2022). Instance Based Authorship Attribution for Kannada Text Using Amalgamation of Character and Word N-grams Technique. In: Majhi, S., Prado, R.P.d., Dasanapura Nanjundaiah, C. (eds) Distributed Computing and Optimization Techniques. Lecture Notes in Electrical Engineering, vol 903. Springer, Singapore. https://doi.org/10.1007/978-981-19-2281-7_51


  • DOI: https://doi.org/10.1007/978-981-19-2281-7_51


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-2280-0

  • Online ISBN: 978-981-19-2281-7

  • eBook Packages: Computer Science, Computer Science (R0)
