Computer Science > Computer Vision and Pattern Recognition

arXiv:2210.14862 (cs)

[Submitted on 26 Oct 2022 (v1), last revised 27 Oct 2022 (this version, v2)]

Title: Visual Semantic Parsing: From Images to Abstract Meaning Representation

Authors: Mohamed Ashraf Abdelsalam, Zhan Shi, Federico Fancellu, Kalliopi Basioti, Dhaivat J. Bhatt, Vladimir Pavlovic, Afsaneh Fazly

Abstract: The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs or frames. These formalisms remain limited in the nature of entities and relations they can capture. In this paper, we propose to leverage a widely-used meaning representation in the field of natural language processing, the Abstract Meaning Representation (AMR), to address these shortcomings. Compared to scene graphs, which largely emphasize spatial relationships, our visual AMR graphs are more linguistically informed, with a focus on higher-level semantic concepts extrapolated from visual input. Moreover, they allow us to generate meta-AMR graphs to unify information contained in multiple image descriptions under one representation. Through extensive experimentation and analysis, we demonstrate that we can re-purpose an existing text-to-AMR parser to parse images into AMRs. Our findings point to important future research directions for improved scene understanding.

Comments:	published in CoNLL 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2210.14862 [cs.CV]
	(or arXiv:2210.14862v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2210.14862

Submission history

From: Mohamed Abdelsalam
[v1] Wed, 26 Oct 2022 17:06:42 UTC (4,484 KB)
[v2] Thu, 27 Oct 2022 15:54:50 UTC (4,483 KB)
