Multi-object tracking using score-driven hierarchical association strategy between predicted tracklets and objects
Machine vision is one of the major technologies to guarantee intelligent robots’ human-centered embodied intelligence. Especially in complex dynamic scenes involving multiple people, Multi-Object Tracking (MOT), which can accurately identify and ...
Highlights
- Score-driven hierarchical association strategy for irregular movement and dense scenes.
- Motion prediction of occluded objects based on bounding box variation by modeling the motion state.
- The proposed method can attain comparative ...
Gaussian error loss function for image smoothing
Edge-preserving image smoothing plays an important role in the fields of image processing and computational photography, and is widely used for a variety of applications. The edge-preserving filters based on global optimization models have ...
Highlights
- We propose a loss function ERLF, which possesses strong edge-preserving capability.
- We propose an optimization model based on the ERLF for edge-aware image smoothing.
- We propose an efficient solution to the proposed optimization ...
FHLight: A novel method of indoor scene illumination estimation using improved loss function
In augmented reality tasks, especially in indoor scenes, achieving illumination consistency between virtual objects and real environments is a critical challenge. Currently, mainstream methods are illumination parameters regression and ...
Highlights
- Proposing a two-stage method FHLight to realistically restore scene illumination.
- Mitigating the limitation of SH functions in representing high-frequency lighting.
- Designing the HFIG to effectively integrate low and high ...
MinoritySalMix and adaptive semantic weight compensation for long-tailed classification
In real-world datasets, the widespread presence of a long-tailed distribution often leads models to become overly biased towards majority class samples while ignoring minority class samples. We propose a strategy called MASW (MinoritySalMix and ...
Highlights
- MinoritySalMix fully utilizes the background content in the majority class samples.
- Adaptively adjust the label value composition of two categories in the new sample.
- The MASW model can significantly improve the performance of ...
Feature differences reduction and specific features preserving network for RGB-T salient object detection
In RGB-T salient object detection, effective utilization of the different characteristics of RGB and thermal modalities is essential to achieve accurate detection. Most previous methods focus only on reducing the differences ...
Highlights
- The network introduces MDRSFPM, which minimizes modality differences while retaining key features.
- Implementation of MGFEM enhances global contextual understanding, essential for improving saliency detection across various scenarios.
Automated anthropometric measurements from 3D point clouds of scanned bodies
- Nahuel E. Garcia-D’Urso,
- Antonio Macia-Lillo,
- Higinio Mora-Mora,
- Jorge Azorin-Lopez,
- Andres Fuster-Guillo
Anthropometry plays a critical role across numerous sectors, particularly within healthcare and fashion, by facilitating the analysis of the human body structure. The significance of anthropometric data cannot be overstated; it is crucial for ...
Highlights
- Potential applications in healthcare and fashion for precise body analysis.
- Optimized parametric models to extract measurements from unstructured point clouds.
- Method tested on more than 400 bodies, showing high accuracy in various ...
Occlusion-related graph convolutional neural network for multi-object tracking
Multi-Object Tracking (MOT) has recently been improved by Graph Convolutional Neural Networks (GCNNs), owing to their good performance in characterizing interactive features. However, GCNNs prefer assigning smaller proportions to node features if a node ...
Highlights
- Design an OR-GCNN to fit dense scenes in MOT.
- An interactive similarity module is created based on OR-GCNN.
- Integrate the interactive similarity module with the existing MOT frameworks.
CWGA-Net: Center-Weighted Graph Attention Network for 3D object detection from point clouds
The precision of 3D object detection from unevenly distributed outdoor point clouds is critical in autonomous driving perception systems. Current point-based detectors employ self-attention and graph convolution to establish contextual ...
Highlights
- Establish topological relationships between local points through local graph encoding.
- Enhances central point features by integrating geometric and semantic distances.
- CWGA-Net has better accuracy and good speed.
3DPSR: An innovative approach for pose and shape refinement in 3D human meshes from a single 2D image
In the era of computer vision, 3D human models are gaining a lot of interest in the gaming industry, cloth parsing, avatar creations, and many more applications. In these fields, having a precise 3D human model with accurate shape and pose is ...
Highlights
- The proposed 3DPSR consists of two novel modules: mesh deformation using pose and shape-fitting.
- 3DPSR significantly outperforms state-of-the-art human mesh reconstruction methods on challenging and standard datasets.
- 3D human mesh ...
SVC: Sight view constraint for robust point cloud registration
Partial to Partial Point Cloud Registration (partial PCR) remains a challenging task, particularly when dealing with a low overlap rate. In comparison to the full-to-full registration task, we find that the objective of partial PCR is still not ...
Highlights
- We propose a method (SVC) that could identify incorrect transformations.
- Experiments demonstrate that the SVC applies to both indoor and outdoor scenes.
- We underscore the decision version of the PCR problem as the fundamental ...
SAVE: Encoding spatial interactions for vision transformers
Transformers have achieved impressive performance in visual tasks. Position encoding, which equips vectors (elements of input tokens, queries, keys, or values) with sequence specificity, effectively alleviates the lack of permutation relation in ...
Highlights
- This study unifies positional information with special affine transformations.
- This study proposes SAVE, a new positional paradigm via matrices transformation.
- This study introduces two 2D spatial modes for vision transformers ...
Attention enhanced machine instinctive vision with human-inspired saliency detection
Salient object detection (SOD) enables machines to recognize and accurately segment visually prominent regions in images. Despite recent advancements, existing approaches often lack progressive fusion of low and high-level features, effective ...
Highlights
- Multi-scale feature extraction enhances SOD accuracy using EfficientNet-B7 backbone.
- SOFA module refines spatial features, boosting saliency map precision and sharpness.
- CACR block captures multi-scale context with dilated ...
Text-augmented Multi-Modality contrastive learning for unsupervised visible-infrared person re-identification
Visible-infrared person re-identification holds significant implications for intelligent security. Unsupervised methods can reduce the gap of different modalities without labels. Most previous unsupervised methods only train their models with ...
Highlights
- We introduce textual information to assist in searching homogeneous features.
- We propose two modules to align cross-modality features.
- We propose a new loss to reduce the heterogeneous features across different modalities.
- Our ...
DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions
Fiducial markers are a computer vision tool used for object pose estimation and detection. These markers are highly useful in fields such as industry, medicine and logistics. However, optimal lighting conditions are not always available, and ...
Highlights
- A fiducial marker detection system under difficult lighting, using neural networks.
- A straightforward method to generate synthetic training data.
- A new real-world dataset featuring ArUco markers under difficult lighting conditions.
RGB-T tracking with frequency hybrid awareness
Recently, impressive progress has been made with transformer-based RGB-T trackers due to the transformer’s effectiveness in capturing low-frequency information (i.e., high-level semantic information). However, some studies have revealed that the ...
Highlights
- We design a novel tracker from a frequency perspective. The tracker has the ability to capture both global and target local information.
- We propose a module for frequency hybrid awareness. This module utilizes the extraction of its own ...
Fine-grained semantic oriented embedding set alignment for text-based person search
Text-based person search aims to retrieve images of a person that are highly semantically relevant to a given textual description. The difficulty of this retrieval task is modality heterogeneity and fine-grained matching. Most existing methods ...
Highlights
- An embedding set alignment module is proposed to extract fine-grained features.
- An adaptive semantic margin loss is introduced for text-image alignment.
- Extensive experiments on public benchmarks show our method ...
Channel and Spatial Enhancement Network for human parsing
The dominant backbones of neural networks for scene parsing consist of multiple stages, where feature maps in different stages often contain varying levels of spatial and semantic information. High-level features convey more semantics and fewer ...
Highlights
- We propose CSENet addressing semantic and spatial gaps between feature maps.
- Our CSENet consists of CEM for semantic gap and SEM for spatial gap in human parsing.
- Our CSENet is shown to improve the performance of existing ...
TransWild: Enhancing 3D interacting hands recovery in the wild with IoU-guided Transformer
The recovery of 3D interacting hands meshes in the wild (ITW) is crucial for 3D full-body mesh reconstruction, especially when limited 3D annotations are available. The recent ITW interacting hands recovery method brings two hands to a shared 2D ...
Highlights
- A weight-shared IoU-guided Transformer is designed to enhance the feature representation of interacting hands in the wild.
- Augmented ground truth bounding boxes are used during training to reduce the impact of detection variability, ...
Point cloud segmentation neural network with same-type point cloud assistance
This paper proposes neural network architectures for point cloud segmentation, which leverage prior knowledge derived from same-type point clouds. The approach involves concurrent processing of two point clouds: a target point cloud necessitating ...
Highlights
- The same-type point cloud is used to assist in the segmentation of the point cloud.
- A feature combination module combines the features of two point clouds.
- Experiments on benchmark datasets show the approach improves performance.
Camouflaged Object Detection via location-awareness and feature fusion
Camouflaged object detection aims to completely segment objects immersed in their surroundings from the background. However, existing deep learning methods often suffer from the following shortcomings: (1) They have difficulty in accurately ...
Highlights
- We propose an LFNet tailored for COD tasks. It enhances COD performance by mining fine-grained features and implementing a top-down fusion approach.
- We develop an SLM to dynamically capture the structural information of targets from the ...
Machine learning applications in breast cancer prediction using mammography
Breast cancer is the second leading cause of cancer-related deaths among women. Early detection of lumps and subsequent risk assessment significantly improves prognosis. In screening mammography, radiologist interpretation of mammograms is prone ...
Highlights
- ML methods for breast cancer prediction using mammography lack critical details.
- Re-implementation of the CNN methods is crucial for advancing the field.
- Reproducibility of current methods is challenging.
- Code-sharing and ...
Non-negative subspace feature representation for few-shot learning in medical imaging
Unlike typical visual scene recognition tasks, where massive datasets are available to train deep neural networks (DNNs), medical image diagnosis using DNNs often faces challenges due to data scarcity. In this paper, we investigate the ...
Highlights
- Revealed SVD drawbacks in medical imaging feature representation for data scarcity.
- Proposed NMF and its variations as a viable alternative to SVD in such scenarios.
- Explored NMF and supervised NMF in the subspace-based few-shot ...
SAFENet: Semantic-Aware Feature Enhancement Network for unsupervised cross-domain road scene segmentation
Unsupervised cross-domain road scene segmentation has attracted substantial interest because of its capability to perform segmentation on new and unlabeled domains, thereby reducing the dependence on expensive manual annotations. This is achieved ...
Highlights
- Use category textual semantics for source-target domain alignment, enhancing feature representation.
- Apply AdaIN with Momentum to align high-dimensional image features, reducing source domain overfitting.
- Implement self-training ...
A multi-label classification method based on transformer for deepfake detection
With the continuous development of hardware and deep learning technologies, existing forgery techniques are capable of more refined facial manipulations, making detection tasks increasingly challenging. Therefore, forgery detection cannot be ...
Highlights
- A multi-label detection approach enhances deepfake detection of various facial components in images.
- A Detail-Enhanced Attention Module improves feature extraction, boosting detection of subtle forgery traces.
- A new Global-Local ...
CF-SOLT: Real-time and accurate traffic accident detection using correlation filter-based tracking
Traffic accident detection using video surveillance is a valuable research topic in intelligent transportation systems. Responding promptly to traffic accidents can avoid traffic jams and prevent secondary accidents. In traffic ...
Highlights
- The CF-SOLT method includes a correlation filter to prevent vehicle ID switching.
- A traffic accident detection method based on tracking occluded vehicles is proposed.
- The combination of pedestrian and vehicle behavior reduces accident ...