Computer Science > Computer Vision and Pattern Recognition

arXiv:2208.00361 (cs)

[Submitted on 31 Jul 2022 (v1), last revised 27 Oct 2022 (this version, v3)]

Title: One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning

Authors: Zhipeng Zhang, Zhimin Wei, Zhongzhen Huang, Rui Niu, Peng Wang

Abstract: Referring Expression Comprehension (REC) is one of the most important tasks in visual reasoning; it requires a model to detect the target object referred to by a natural language expression. Among the proposed pipelines, one-stage Referring Expression Comprehension (OSREC) has become the dominant trend because it merges the region proposal and selection stages. Many state-of-the-art OSREC models adopt a multi-hop reasoning strategy, because a single expression frequently mentions a sequence of objects whose semantic relations take multiple reasoning hops to analyze. However, one unsolved issue of these models is that the number of reasoning steps must be pre-defined and fixed before inference, which ignores the varying complexity of expressions. In this paper, we propose a Dynamic Multi-step Reasoning Network, which allows the number of reasoning steps to be dynamically adjusted based on the reasoning state and expression complexity. Specifically, we adopt a Transformer module to memorize and process the reasoning state and a Reinforcement Learning strategy to dynamically infer the number of reasoning steps. The work achieves state-of-the-art performance or significant improvements on several REC datasets, ranging from RefCOCO (+, g), with short expressions, to Ref-Reasoning, a dataset with long and complex compositional expressions.

Comments:	27 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2208.00361 [cs.CV]
	(or arXiv:2208.00361v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.00361

Submission history

From: Zhimin Wei
[v1] Sun, 31 Jul 2022 04:51:27 UTC (1,228 KB)
[v2] Tue, 11 Oct 2022 10:53:56 UTC (1,454 KB)
[v3] Thu, 27 Oct 2022 11:30:23 UTC (2,205 KB)
