Bibliometrics: reflects downloads up to 19 Dec 2024
research-article
Open Access
Video2mesh: 3D human pose and shape recovery by a temporal convolutional transformer network
Abstract

From a 2D video of a person in action, human mesh recovery aims to infer the 3D human pose and shape frame by frame. Despite progress on video‐based human pose and shape estimation, it is still challenging to guarantee high accuracy and smoothness ...

From a video of a person in action, human mesh recovery aims to infer the 3D human pose and shape. We propose Video2mesh, a temporal convolutional transformer (TConvTransformer) network that is able to recover accurate and smooth human meshes from 2D ...
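The summary does not give Video2mesh's layer details. As a rough illustration of why a temporal convolution helps smoothness, here is a minimal sketch that convolves per-frame pose vectors along the time axis. The function name `temporal_conv1d` and the fixed averaging kernel are assumptions for illustration; a real network would learn its kernels and combine them with transformer layers.

```python
from typing import List, Sequence


def temporal_conv1d(
    frames: Sequence[Sequence[float]], kernel: Sequence[float]
) -> List[List[float]]:
    """Convolve each feature dimension along the time (frame) axis.

    frames: T x D list of per-frame vectors (e.g. pose parameters).
    kernel: odd-length weights; replicate padding keeps the output at T frames.
    As in most deep-learning libraries, this is cross-correlation (kernel not flipped).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    num_frames, dim = len(frames), len(frames[0])
    out = []
    for t in range(num_frames):
        vec = []
        for d in range(dim):
            acc = 0.0
            for k, w in enumerate(kernel):
                # Replicate padding: clamp the source frame index into [0, T - 1].
                src = min(max(t + k - half, 0), num_frames - 1)
                acc += w * frames[src][d]
            vec.append(acc)
        out.append(vec)
    return out


# Hypothetical jittery 1-D "pose" signal over 5 frames.
jittery = [[0.0], [1.0], [0.0], [1.0], [0.0]]
smoothed = temporal_conv1d(jittery, [1 / 3, 1 / 3, 1 / 3])
print([round(v[0], 3) for v in smoothed])  # → [0.333, 0.333, 0.667, 0.333, 0.333]
```

The frame-to-frame swing drops from 1.0 to about 0.33, which is the kind of temporal smoothing a learned temporal convolution can provide on top of per-frame estimates.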

research-article
Open Access
MCR: Multilayer cross‐fusion with reconstructor for multimodal abstractive summarisation
Abstract

Multimodal abstractive summarisation (MAS) aims to generate a textual summary from multimodal data collections, such as video‐text pairs. Despite the success of recent work, the existing methods lack a thorough analysis of consistency across ...

We propose a novel MCR model for the video‐containing multimodal abstractive summarisation task, aiming to thoroughly model the consistent and complementary semantics in multimodal data. We design the cross‐fusion module implemented by the cross‐modal ...

research-article
Open Access
Self‐supervised non‐rigid structure from motion with improved training of Wasserstein GANs
Abstract

This study proposes a self‐supervised method to reconstruct 3D limb structures from 2D landmarks extracted from a single view. The loss of self‐consistency can be reduced by performing a random orthogonal projection of the reconstructed 3D ...

We present SS‐Graphformer, a graph convolution and Transformer‐based method for 3D structure reconstruction from 2D landmarks. In addition, geometric self‐consistency is used to achieve self‐supervision; when combined with the 2D structure discriminator, ...
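The geometric self‐consistency idea mentioned above can be sketched as: rotate the reconstructed 3D structure randomly, project it orthographically back to 2D, lift it again, and penalise disagreement. This is one common formulation, not necessarily SS‐Graphformer's exact loss; the WGAN discriminator is omitted, and `lift`, `random_rotation` and `self_consistency_loss` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Points = List[List[float]]


def random_rotation(rng: random.Random) -> Matrix:
    """Uniformly random 3x3 rotation built from a random unit quaternion (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    w, x = a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2)
    y, z = b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(points: Sequence[Sequence[float]], R: Matrix) -> Points:
    """Apply R to every 3D point (row vectors in, row vectors out)."""
    return [[sum(R[i][j] * p[j] for j in range(3)) for i in range(3)] for p in points]


def orthographic_project(points: Sequence[Sequence[float]]) -> Points:
    """Orthographic projection onto the image plane: keep (x, y), drop depth z."""
    return [[p[0], p[1]] for p in points]


def self_consistency_loss(
    X: Points, lift: Callable[[Points], Points], R: Matrix
) -> float:
    """Mean squared error between the rotated reconstruction and its re-lifted projection.

    `lift` is a hypothetical stand-in for the 2D-to-3D reconstruction network.
    """
    X_rot = rotate(X, R)
    X_relift = lift(orthographic_project(X_rot))
    sq = sum((a - b) ** 2 for p, q in zip(X_rot, X_relift) for a, b in zip(p, q))
    return sq / len(X)


def flat_lift(uv: Points) -> Points:
    """A naive lift that assigns zero depth; it cannot recover z."""
    return [[u, v, 0.0] for u, v in uv]


rng = random.Random(0)
R = random_rotation(rng)
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(self_consistency_loss(X, flat_lift, R))  # ≈ 1/3 for these unit-axis points
```

A lifter that ignores depth is penalised, while one that recovers the rotated structure exactly gets zero loss; this is what lets the reconstruction train without 3D labels.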

research-article
Open Access
TANet: Transformer‐based asymmetric network for RGB‐D salient object detection
Abstract

Existing RGB‐D salient object detection methods mainly rely on a symmetric two‐stream Convolutional Neural Network (CNN)‐based network to extract RGB and depth channel features separately. However, there are two problems with the symmetric ...

In this paper, we propose a Transformer‐based asymmetric network (TANet) to address the problems that Convolutional Neural Network (CNN)‐based models are ineffective at extracting global semantic information, while the symmetric two‐stream structures ...

research-article
Open Access
Multi‐directional feature refinement network for real‐time semantic segmentation in urban street scenes
Abstract

Efficient and accurate semantic segmentation is crucial for autonomous driving scene parsing. Capturing detailed information and semantic information efficiently through two‐branch networks has been widely utilised in real‐time semantic ...

This work proposes a network named MRFNet, based on a two‐branch strategy, to address both the accuracy and the speed of segmentation in urban scenes. Experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of our method by achieving a ...

research-article
Open Access
Facial expression recognition based on regional adaptive correlation
Abstract

To address the problem that the features extracted by CNN‐based facial expression recognition (FER) do not consider structural information, a region adaptive correlation deep network (RACN) is proposed. The network consists of two branches. In one ...

This paper proposes a regional adaptive correlation network (RACN) to explore a more effective description of the structural information of faces and to enrich the expression feature representation. The network consists of two branches. The proposed second‐order ...

research-article
Open Access
Semantics recalibration and detail enhancement network for real‐time semantic segmentation
Abstract

Real‐time semantic segmentation is a crucial technology in autonomous driving scenarios, which must achieve both high precision and real‐time speed. The authors observe that learning complex correlations between object categories is vital in the real‐...

We propose a Semantics Recalibration and Detail Enhancement Network for real‐time semantic segmentation based on BiSeNet V2. On the one hand, a lightweight Semantics Recalibration module is designed to effectively extract global semantic contextual ...

research-article
Open Access
Loop and distillation: Attention weights fusion transformer for fine‐grained representation
Abstract

Learning subtle discriminative feature representations plays a significant role in Fine‐Grained Visual Categorisation (FGVC). The vision transformer (ViT) achieves promising performance in the traditional image classification field due to its multi‐...

We fuse attention weights grouped by head to reinforce attention to different regions. Subsequently, we adopt three attention weight fusion blocks and channel grouping methods to accurately select discriminative regions. In addition, we utilise ...
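The head‐grouped fusion described above can be illustrated with a small sketch: average the CLS‐to‐patch attention of the heads within each group, combine several layers, and keep the top‐k patches per group as candidate discriminative regions. The summary does not state the fusion operators, so the within‐group mean, the element‐wise product across layers, and the names `fuse_heads`, `fuse_layers` and `select_regions` are all assumptions.

```python
from typing import List, Sequence

Maps = List[List[float]]


def fuse_heads(cls_attn: Sequence[Sequence[float]], num_groups: int) -> Maps:
    """Average CLS-to-patch attention over the heads in each group.

    cls_attn: H x P list (heads x patches); H must be divisible by num_groups.
    Returns a num_groups x P list.
    """
    num_heads = len(cls_attn)
    if num_heads % num_groups:
        raise ValueError("number of heads must be divisible by num_groups")
    size = num_heads // num_groups
    num_patches = len(cls_attn[0])
    fused = []
    for g in range(num_groups):
        group = cls_attn[g * size:(g + 1) * size]
        fused.append([sum(h[p] for h in group) / size for p in range(num_patches)])
    return fused


def fuse_layers(layer_maps: Sequence[Maps]) -> Maps:
    """Combine grouped maps from several layers by element-wise product (an assumption)."""
    out = [row[:] for row in layer_maps[0]]
    for maps in layer_maps[1:]:
        for g, row in enumerate(maps):
            for p, v in enumerate(row):
                out[g][p] *= v
    return out


def select_regions(fused: Maps, k: int) -> List[List[int]]:
    """Return the indices of the k highest-scoring patches for each head group."""
    return [sorted(range(len(row)), key=lambda p: row[p], reverse=True)[:k] for row in fused]


# Hypothetical CLS-to-patch attention: 4 heads over 4 patches, from one layer.
layer = [
    [0.1, 0.6, 0.2, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.2, 0.1, 0.2, 0.5],
]
grouped = fuse_heads(layer, num_groups=2)
print(select_regions(fuse_layers([grouped, grouped]), k=1))  # → [[1], [3]]
```

Grouping lets different head groups vote for different regions (patch 1 versus patch 3 here) instead of collapsing all heads into one map.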

research-article
Open Access
Selective feature fusion network for salient object detection
Abstract

Fully convolutional neural networks have achieved great success in salient object detection, in which the effective use of multi‐layer features plays a critical role. Based on this advantage, many saliency detectors have emerged in recent years, ...

In this paper, we propose a selective feature fusion network, which consists of a selective feature fusion module (SFM) and an attention‐guided hierarchical feature emphasis module (AEM). The selective feature fusion module adaptively selects the important ...
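The summary does not spell out how the SFM "adaptively selects" features. One familiar way to do such selection, borrowed here from SKNet‐style selective kernels and not confirmed as this paper's design, is a per‐channel softmax over two inputs driven by a pooled joint descriptor. The per‐channel weights `w_a` and `w_b` are hypothetical stand‐ins for learned layers.

```python
import math
from typing import List, Sequence

Feature = List[List[float]]  # C channels x N spatial positions


def global_pool(f: Sequence[Sequence[float]]) -> List[float]:
    """Global average pooling: one scalar per channel."""
    return [sum(ch) / len(ch) for ch in f]


def selective_fuse(
    a: Feature, b: Feature, w_a: Sequence[float], w_b: Sequence[float]
) -> Feature:
    """Fuse two same-shape feature maps with per-channel softmax selection weights.

    w_a and w_b are hypothetical per-channel scalars standing in for learned layers.
    """
    s = [x + y for x, y in zip(global_pool(a), global_pool(b))]  # joint channel descriptor
    out = []
    for c in range(len(a)):
        la, lb = w_a[c] * s[c], w_b[c] * s[c]  # selection logits for each input
        m = max(la, lb)  # subtract the max for a numerically stable softmax
        ea, eb = math.exp(la - m), math.exp(lb - m)
        pa, pb = ea / (ea + eb), eb / (ea + eb)
        out.append([pa * x + pb * y for x, y in zip(a[c], b[c])])
    return out


shallow = [[1.0, 3.0]]  # e.g. a detail-rich low-level feature (1 channel, 2 positions)
deep = [[3.0, 1.0]]     # e.g. a semantic high-level feature resized to the same shape
print(selective_fuse(shallow, deep, w_a=[0.0], w_b=[0.0]))   # equal weights → [[2.0, 2.0]]
print(selective_fuse(shallow, deep, w_a=[10.0], w_b=[0.0]))  # selects `shallow` almost entirely
```

Because the two selection weights sum to one per channel, the module interpolates between the inputs rather than simply adding them, which is what lets it suppress a less useful level.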

research-article
Open Access
An efficient mixed attention module
Abstract

Recently, the application of attention mechanisms in convolutional neural networks (CNNs) has become a hot topic in computer vision. Most existing methods focus on either channel attention or spatial attention. Mixed attention methods usually achieve better ...

Whereas recent attention methods increase performance by increasing complexity, we provide an efficient mixed attention module that aggregates channel information and spatial information through a learnable combinatorial formulation. In this way, the modelling ...
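To make "aggregating channel and spatial information through a learnable combination" concrete, here is a minimal sketch assuming the simplest such formulation: a single gate per element, computed from a channel descriptor and a spatial descriptor weighted by two learnable scalars. The exact formulation in the paper is not given in the summary; `mixed_attention`, `alpha` and `beta` are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def mixed_attention(
    x: Sequence[Sequence[float]], alpha: float, beta: float
) -> List[List[float]]:
    """Gate a C x N feature map with one attention map mixing channel and spatial cues.

    alpha and beta are hypothetical learnable scalars weighting the two descriptors.
    """
    C, N = len(x), len(x[0])
    ch = [sum(row) / N for row in x]                             # per-channel descriptor
    sp = [sum(x[c][n] for c in range(C)) / C for n in range(N)]  # per-position descriptor
    return [
        [x[c][n] * sigmoid(alpha * ch[c] + beta * sp[n]) for n in range(N)]
        for c in range(C)
    ]


x = [[2.0, 4.0], [6.0, 8.0]]  # 2 channels x 2 spatial positions
print(mixed_attention(x, alpha=0.0, beta=0.0))  # every gate is 0.5 → [[1.0, 2.0], [3.0, 4.0]]
```

Under this assumption the only extra parameters are `alpha` and `beta`, and the extra work is two average poolings plus one gate per element, which illustrates how a mixed attention can stay cheap compared with stacking separate channel and spatial modules.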
