CVI2: Vol 17, No 4

Volume 17, Issue 4, June 2023

Publisher:

John Wiley & Sons, Inc.
605 Third Ave. New York, NY
United States

EISSN: 1751-9640

research-article

Open Access

Video2mesh: 3D human pose and shape recovery by a temporal convolutional transformer network

Pages 379–388, https://doi.org/10.1049/cvi2.12172

Abstract

From a 2D video of a person in action, human mesh recovery aims to infer the 3D human pose and shape frame by frame. Despite progress on video‐based human pose and shape estimation, it is still challenging to guarantee high accuracy and smoothness ...

From a video of a person in action, human mesh recovery aims to infer the 3D human pose and shape. We propose Video2mesh, a temporal convolutional transformer (TConvTransformer) network that can recover accurate and smooth human meshes from 2D ...

research-article

Open Access

MCR: Multilayer cross‐fusion with reconstructor for multimodal abstractive summarisation

Pages 389–403, https://doi.org/10.1049/cvi2.12173

Abstract

Multimodal abstractive summarisation (MAS) aims to generate a textual summary from a multimodal data collection, such as video‐text pairs. Despite the success of recent work, the existing methods lack a thorough analysis of consistency across ...

We propose a novel MCR model for the video‐containing multimodal abstractive summarisation task, aiming to thoroughly model the consistent and complementary semantics in multimodal data. We design a cross‐fusion module implemented by the cross‐modal ...

research-article

Open Access

Self‐supervised non‐rigid structure from motion with improved training of Wasserstein GANs

Pages 404–414, https://doi.org/10.1049/cvi2.12175

Abstract

This study proposes a self‐supervised method to reconstruct 3D limbic structures from 2D landmarks extracted from a single view. The loss of self‐consistency can be reduced by performing a random orthogonal projection of the reconstructed 3D ...

We present SS‐Graphformer, a graph convolution and Transformer‐based method for 3D structure reconstruction from 2D landmarks. In addition, geometric self‐consistency is used to achieve self‐supervision; when combined with the 2D structure discriminator, ...
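
The random orthogonal projection mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`random_rotation`, `rotate`, `orthographic_project`, `reproject`) are hypothetical, the rotation is drawn from a random axis and angle rather than any distribution the paper may specify, and the sketch stops at the projection step because the truncated abstract does not say how the projected landmarks enter the loss.

```python
import math
import random


def random_rotation(rng):
    """Return a 3x3 rotation matrix about a random axis (Rodrigues' formula)."""
    # Draw a random direction and normalise it to a unit axis.
    while True:
        axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(a * a for a in axis))
        if norm > 1e-12:
            break
    x, y, z = (a / norm for a in axis)
    theta = rng.uniform(0.0, 2.0 * math.pi)  # random rotation angle
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def rotate(points, rot):
    """Apply the rotation matrix to every 3D point."""
    return [[sum(rot[i][k] * p[k] for k in range(3)) for i in range(3)]
            for p in points]


def orthographic_project(points):
    """Project 3D points to 2D by dropping the depth (z) coordinate."""
    return [[p[0], p[1]] for p in points]


def reproject(points_3d, rng):
    """Rotate a 3D structure randomly and project it to a new 2D view."""
    rot = random_rotation(rng)
    return orthographic_project(rotate(points_3d, rot)), rot


if __name__ == "__main__":
    rng = random.Random(0)
    # Three illustrative 3D landmarks (made-up values).
    landmarks = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
    view_2d, _ = reproject(landmarks, rng)
    print(view_2d)
```

In a self-supervised setup of this kind, the new 2D view would typically be fed back through the reconstruction network so that its output can be compared with the original reconstruction; that step is an assumption here and is left out of the sketch.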

research-article

Open Access

TANet: Transformer‐based asymmetric network for RGB‐D salient object detection

Pages 415–430, https://doi.org/10.1049/cvi2.12177

Abstract

Existing RGB‐D salient object detection methods mainly rely on a symmetric two‐stream Convolutional Neural Network (CNN)‐based network to extract RGB and depth channel features separately. However, there are two problems with the symmetric ...

In this paper, we propose a Transformer‐based asymmetric network (TANet) to address the problem that Convolutional Neural Network (CNN)‐based models are ineffective in extracting global semantic information while the symmetric two‐stream structures ...

research-article

Open Access

Multi‐directional feature refinement network for real‐time semantic segmentation in urban street scenes

Pages 431–444, https://doi.org/10.1049/cvi2.12178

Abstract

Efficient and accurate semantic segmentation is crucial for autonomous driving scene parsing. Capturing detailed information and semantic information efficiently through two‐branch networks has been widely utilised in real‐time semantic ...

This work proposes a network named MRFNet, based on a two‐branch strategy, to address the accuracy and speed of segmentation in urban scenes. Experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of our method by achieving a ...

research-article

Open Access

Facial expression recognition based on regional adaptive correlation

Pages 445–460, https://doi.org/10.1049/cvi2.12179

Abstract

To address the problem that the features extracted by CNN‐based facial expression recognition (FER) do not consider structural information, a region adaptive correlation deep network (RACN) is proposed. The network consists of two branches. In one ...

This paper proposes a regional adaptive correlation network (RACN) to explore a more effective description of the structural information of faces and enrich the expression feature representation. The network consists of two branches. The proposed second‐order ...

research-article

Open Access

Semantics recalibration and detail enhancement network for real‐time semantic segmentation

Pages 461–472, https://doi.org/10.1049/cvi2.12180

Abstract

Real‐time semantic segmentation is a crucial technology in automatic driving scenarios, which must deliver both high precision and real‐time performance. The authors observe that learning complex correlations between object categories is vital in the real‐...

We propose a Semantics Recalibration and Detail Enhancement Network for real‐time semantic segmentation based on BiSeNet V2. On the one hand, a lightweight Semantics Recalibration module is designed to effectively extract global semantic contextual ...

research-article

Open Access

Loop and distillation: Attention weights fusion transformer for fine‐grained representation

Pages 473–482, https://doi.org/10.1049/cvi2.12181

Abstract

Learning subtle discriminative feature representation plays a significant role in Fine‐Grained Visual Categorisation (FGVC). The vision transformer (ViT) achieves promising performance in the traditional image classification field due to its multi‐...

We fuse attention weights grouped by head to reinforce the attention of different regions. Subsequently, we adopt three attention weight fusion blocks and channel grouping methods to accurately select discriminative regions. In addition, we utilise ...

research-article

Open Access

Selective feature fusion network for salient object detection

Pages 483–495, https://doi.org/10.1049/cvi2.12183

Abstract

Fully convolutional neural networks have achieved great success in salient object detection, in which the effective use of multi‐layer features plays a critical role. Based on this advantage, many saliency detectors have emerged in recent years, ...

In this paper, we propose a selective feature fusion network which consists of a selective feature fusion module (SFM) and an attention‐guided hierarchical feature emphasis module (AEM). The selective feature fusion module adaptively selects the important ...

research-article

Open Access

An efficient mixed attention module

Pages 496–507, https://doi.org/10.1049/cvi2.12184

Abstract

Recently, the application of attention mechanisms in convolutional neural networks (CNNs) has become a hot area in computer vision. Most existing methods focus on channel attention or spatial attention. Mixed attention methods usually achieve better ...

Whereas recent attention methods increase performance by increasing complexity, we provide an efficient mixed attention that aggregates channel information and spatial information through a learnable combinatorial formulation. In this way, the modelling ...

IET Computer Vision
