Computer Science > Computation and Language

arXiv:2407.05609 (cs)

[Submitted on 8 Jul 2024]

Title: Open-world Multi-label Text Classification with Extremely Weak Supervision

Authors: Xintong Li, Jinya Jiang, Ria Dharmani, Jayanth Srinivasa, Gaowen Liu, Jingbo Shang

Abstract: We study open-world multi-label text classification under extremely weak supervision (XWS), where the user provides only a brief description of the classification objectives, without any labels or ground-truth label space. Similar single-label XWS settings have been explored recently; however, these methods cannot be easily adapted to the multi-label setting. We observe that (1) most documents have a dominant class covering the majority of their content and (2) long-tail labels appear as the dominant class in some documents. Therefore, we first use the user description to prompt a large language model (LLM) for the dominant keyphrases of a subset of raw documents, and then construct an initial label space via clustering. We further apply a zero-shot multi-label classifier to locate the documents with small top predicted scores, so that we can revisit their dominant keyphrases for more long-tail labels. We iterate this process to discover a comprehensive label space and construct a multi-label classifier, yielding a novel method, X-MLClass. X-MLClass exhibits a remarkable increase in ground-truth label space coverage on various datasets, for example, a 40% improvement on the AAPD dataset over topic modeling and keyword extraction methods. Moreover, X-MLClass achieves the best end-to-end multi-label classification accuracy.
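
The iterative label-space discovery described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, only a toy under stated assumptions: `extract_keyphrase` is a hypothetical stand-in for the LLM prompt, clustering is replaced by simple deduplication of normalized keyphrases, and `token_overlap` is a toy substitute for a zero-shot classifier. All function names, parameters, and thresholds here are invented for illustration.

```python
from typing import Callable, List


def token_overlap(doc: str, label: str) -> float:
    """Toy zero-shot score (assumption): fraction of label tokens found in the document."""
    doc_tokens = set(doc.lower().split())
    label_tokens = label.lower().split()
    if not label_tokens:
        return 0.0
    return sum(t in doc_tokens for t in label_tokens) / len(label_tokens)


def discover_labels(
    docs: List[str],
    extract_keyphrase: Callable[[str], str],  # hypothetical stand-in for the LLM prompt
    score: Callable[[str, str], float] = token_overlap,
    seed_size: int = 2,
    threshold: float = 0.5,
    max_rounds: int = 5,
) -> List[str]:
    """Grow a label space by revisiting documents whose top score is small."""
    label_space: List[str] = []

    def add(label: str) -> bool:
        # Deduplicating normalized keyphrases stands in for clustering here.
        norm = label.strip().lower()
        if norm and norm not in label_space:
            label_space.append(norm)
            return True
        return False

    # Step 1: dominant keyphrases of a subset of raw documents form the initial label space.
    for doc in docs[:seed_size]:
        add(extract_keyphrase(doc))

    # Step 2: repeatedly locate poorly covered documents and revisit their keyphrases.
    for _ in range(max_rounds):
        uncertain = [
            d for d in docs
            if max((score(d, lab) for lab in label_space), default=0.0) < threshold
        ]
        grew = False
        for d in uncertain:
            if add(extract_keyphrase(d)):
                grew = True
        if not grew:  # no new labels discovered, so stop iterating
            break
    return label_space


def classify(doc: str, label_space: List[str],
             score: Callable[[str, str], float] = token_overlap,
             threshold: float = 0.5) -> List[str]:
    """Multi-label prediction: every label whose score reaches the threshold."""
    return [lab for lab in label_space if score(doc, lab) >= threshold]


def first_two_words(doc: str) -> str:
    """Toy keyphrase extractor used only for this example."""
    return " ".join(doc.split()[:2])


docs = [
    "neural networks for image recognition",
    "graph theory and combinatorics proofs",
    "quantum computing hardware advances",
    "neural networks meet graph theory",
]
labels = discover_labels(docs, first_two_words)
predicted = classify(docs[3], labels)
print(labels)
print(predicted)
```

In this toy run, the third document is not covered by the two seed labels, so its keyphrase is added in the first iteration; the fourth document then receives two labels, which mirrors the multi-label setting in miniature.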

Comments:	Preprint
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.05609 [cs.CL]
	(or arXiv:2407.05609v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.05609

Submission history

From: Xintong Li
[v1] Mon, 8 Jul 2024 04:52:49 UTC (7,812 KB)
