Computer Science > Computation and Language

arXiv:2308.05341 (cs)

[Submitted on 10 Aug 2023]

Title: Classification of Human- and AI-Generated Texts: Investigating Features for ChatGPT

Authors: Lorenz Mindner, Tim Schlippe, Kristina Schaaff

Abstract: Recently, generative AIs like ChatGPT have become available to the general public. These tools can, for instance, be used by students to generate essays or whole theses. But how does a teacher know whether a text was written by a student or by an AI? In our work, we explore traditional and new features to detect (1) text generated by AI from scratch and (2) text rephrased by AI. Since we found that classification is more difficult when the AI has been instructed to write the text so that a human would not recognize it as AI-generated, we also investigate this more advanced case. For our experiments, we produced a new text corpus covering 10 school topics. Our best systems for classifying basic and advanced human-generated/AI-generated texts achieve F1-scores of over 96%. Our best systems for classifying basic and advanced human-generated/AI-rephrased texts achieve F1-scores of more than 78%. The systems use a combination of perplexity, semantic, list lookup, error-based, readability, AI feedback, and text vector features. Our results show that the new features substantially help to improve the performance of many classifiers. Our best basic text rephrasing detection system even outperforms GPTZero by a relative 183.8% in F1-score.
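The abstract reports F1-scores and a relative improvement over GPTZero. As a reading aid, the sketch below shows how those two quantities are conventionally computed. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper or its corpus.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Gain of `new` over `baseline`, as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Hypothetical counts for two detectors (NOT results from the paper).
detector_f1 = f1_score(tp=80, fp=20, fn=20)
baseline_f1 = f1_score(tp=30, fp=50, fn=70)

print(f"detector F1: {detector_f1:.3f}")
print(f"baseline F1: {baseline_f1:.3f}")
print(f"relative improvement: {relative_improvement(detector_f1, baseline_f1):.1f}%")
```

A relative improvement above 100% means the new score is more than double the baseline score, which is only possible when the baseline F1 is below 0.5.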

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2308.05341 [cs.CL]
	(or arXiv:2308.05341v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2308.05341
Related DOI:	https://doi.org/10.1007/978-981-99-7947-9_12

Submission history

From: Kristina Schaaff
[v1] Thu, 10 Aug 2023 05:09:42 UTC (638 KB)
