
Developing a Framework for a Thai Plagiarism Corpus

  • Conference paper
Computational Linguistics (PACLING 2019)

Abstract

One problem in building a Thai plagiarism corpus is that no corpus with real examples of plagiarized text is available. To address this, we present the design and construction of a new Thai plagiarism corpus, TPLAC-2019, intended for evaluating plagiarism detection algorithms for Thai. The corpus is created with two methods: 1) a simulated plagiarism method and 2) an artificial plagiarism method. For the simulated plagiarism method, we provide a Thai plagiarism tagging tool, PlaTool, and a Thai plagiarism guideline to assist human annotators in plagiarizing text passages. For the artificial plagiarism method, plagiarized documents are generated automatically by a machine. Within this method, we also propose a new technique for automatically creating plagiarized text passages that resemble human language. To evaluate the machine-generated Thai plagiarized passages, we prepared test sets generated by a baseline method and by the proposed method, and set up experiments comparing the readability of the plagiarized documents produced by the two methods. The experimental results show that the proposed method improves readability, with an increase of up to 40%.
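The abstract does not say which operations the baseline or the proposed artificial method applies. As a rough illustration only, the general idea of machine-generated plagiarism can be sketched as follows: a source passage is obfuscated with random token operations, inserted into a host document, and the insertion point is recorded as a ground-truth annotation. Everything in this sketch is an assumption made for illustration, including the function names `obfuscate` and `insert_passage`, the probabilities, and the premise that the Thai text has already been segmented into tokens.

```python
import random


def obfuscate(tokens, rng, p_delete=0.1, p_swap=0.2):
    """Hypothetical obfuscation: random deletions, then adjacent swaps."""
    # Drop each token with probability p_delete.
    kept = [t for t in tokens if rng.random() >= p_delete]
    if not kept:
        # Keep at least one token so the passage is never empty.
        kept = list(tokens[:1])
    out = list(kept)
    # Swap neighbouring tokens with probability p_swap.
    for i in range(len(out) - 1):
        if rng.random() < p_swap:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def insert_passage(host_tokens, passage_tokens, rng):
    """Insert the passage at a random position and return the new
    document together with a ground-truth annotation (token offsets)."""
    pos = rng.randint(0, len(host_tokens))
    doc = host_tokens[:pos] + passage_tokens + host_tokens[pos:]
    annotation = {"offset": pos, "length": len(passage_tokens)}
    return doc, annotation


# Placeholder tokens stand in for pre-segmented Thai words.
rng = random.Random(0)
source_passage = ["s1", "s2", "s3", "s4", "s5", "s6"]
host_document = ["h1", "h2", "h3", "h4"]

plagiarized = obfuscate(source_passage, rng)
suspicious_doc, annotation = insert_passage(host_document, plagiarized, rng)
print(suspicious_doc, annotation)
```

Purely random operations like these tend to produce text that reads unnaturally, which is the kind of readability problem the proposed method is said to address.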



Author information

Corresponding authors

Correspondence to Santipong Thaiprayoon, Pornpimon Palingoon, Kanokorn Trakultaweekoon, Supon Klaithin, Choochart Haruechaiyasak, Alisa Kongthon, Sumonmas Thatpitakkul or Sawit Kasuriya.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Thaiprayoon, S. et al. (2020). Developing a Framework for a Thai Plagiarism Corpus. In: Nguyen, LM., Phan, XH., Hasida, K., Tojo, S. (eds) Computational Linguistics. PACLING 2019. Communications in Computer and Information Science, vol 1215. Springer, Singapore. https://doi.org/10.1007/978-981-15-6168-9_42

  • DOI: https://doi.org/10.1007/978-981-15-6168-9_42

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6167-2

  • Online ISBN: 978-981-15-6168-9

  • eBook Packages: Computer Science, Computer Science (R0)
