Computer Science > Computation and Language

arXiv:2410.20494 (cs)

[Submitted on 27 Oct 2024]

Title: MatViX: Multimodal Information Extraction from Visually Rich Articles

Authors: Ghazal Khalighinejad, Sharon Scott, Ollie Liu, Kelly L. Anderson, Rickard Stureborg, Aman Tyagi, Bhuwan Dhingra

Abstract: Multimodal information extraction (MIE) is crucial for scientific literature, where valuable data is often spread across text, figures, and tables. In materials science, extracting structured information from research articles can accelerate the discovery of new materials. However, the multimodal nature and complex interconnections of scientific content present challenges for traditional text-based methods. We introduce MatViX, a benchmark consisting of 324 full-length research articles and 1,688 complex structured JSON files, carefully curated by domain experts. These JSON files are extracted from text, tables, and figures in full-length documents, providing a comprehensive challenge for MIE. We introduce an evaluation method to assess the accuracy of curve similarity and the alignment of hierarchical structures. Additionally, we benchmark, in a zero-shot manner, vision-language models (VLMs) capable of processing long contexts and multimodal inputs, and show that using a specialized model (DePlot) can improve performance in extracting curves. Our results demonstrate significant room for improvement in current models. Our dataset and evaluation code are available at this https URL.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2410.20494 [cs.CL]
	(or arXiv:2410.20494v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.20494

Submission history

From: Ghazal Khalighinejad
[v1] Sun, 27 Oct 2024 16:13:58 UTC (1,070 KB)
