- research-article, May 2024
PersonMAE: Person Re-Identification Pre-Training With Masked AutoEncoders
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10029–10040, https://doi.org/10.1109/TMM.2024.3405649
Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID). We argue that a high-quality ReID representation should have three properties, namely, multi-level awareness, occlusion ...
- research-article, May 2024
MuJo-SF: Multimodal Joint Slot Filling for Attribute Value Prediction of E-Commerce Commodities
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10354–10366, https://doi.org/10.1109/TMM.2024.3407667
Supplementing product attribute information is a critical step for E-commerce platforms, which further benefits various downstream tasks, including product recommendation, product search, and product knowledge graph construction. Intuitively, the visual ...
- research-article, May 2024
DanceComposer: Dance-to-Music Generation Using a Progressive Conditional Music Generator
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10237–10250, https://doi.org/10.1109/TMM.2024.3405734
A wonderful piece of music is the essence and soul of dance, which motivates the study of automatic music generation for dance. To create appropriate music from dance, cross-modal correlations between dance and music, such as rhythm and style, should be ...
- research-article, May 2024
Difference-Aware Distillation for Semantic Segmentation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10069–10080, https://doi.org/10.1109/TMM.2024.3405619
In recent years, various distillation methods for semantic segmentation have been proposed. However, these methods typically train the student model to imitate the intermediate features or logits of the teacher model directly, thereby overlooking the high-...
- research-article, May 2024
SGDM: An Adaptive Style-Guided Diffusion Model for Personalized Text to Image Generation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9804–9813, https://doi.org/10.1109/TMM.2024.3399075
The existing personalized text-to-image generation models face issues such as repeated training and insufficient generalization capabilities. We present an adaptive Style-Guided Diffusion Model (SGDM). When provided with a set of stylistically consistent ...
- research-article, May 2024
UniDCP: Unifying Multiple Medical Vision-Language Tasks via Dynamic Cross-Modal Learnable Prompts
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9736–9748, https://doi.org/10.1109/TMM.2024.3397191
Medical vision-language pre-training (Med-VLP) models have recently accelerated the fast-growing medical diagnostics application. However, most Med-VLP models learn task-specific representations independently from scratch, thereby leading to great ...
- research-article, May 2024
DSIS-DPR: Structured Instance Segmentation and Diffusion Prior Refinement for Dental Anatomy Learning
- Xianyun Wang,
- Linhong Wang,
- Zhenchen Yang,
- Jiacong Zhou,
- Yuchen Zheng,
- Feng Chen,
- Richang Hong,
- Jun Yu,
- Fan Yang
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9464–9476, https://doi.org/10.1109/TMM.2024.3394777
Instance segmentation in medical imaging plays a crucial role in clinical diagnostic tasks and has shown promising performance in practical applications. In this article, we discuss a more fine-grained instance segmentation task: dental structured ...
- research-article, May 2024
A Category-Aware Curriculum Learning for Data-Free Knowledge Distillation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9603–9618, https://doi.org/10.1109/TMM.2024.3395844
Constructing effective proxy data is one of the core challenges in data-free knowledge distillation. The existing models ignore the influence of the category entanglement of the generated data on the distillation. To alleviate this issue, imitating the ...
- research-article, April 2024
Music-Driven Choreography Based on Music Feature Clusters and Dynamic Programming
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9330–9341, https://doi.org/10.1109/TMM.2024.3390232
Generating choreography from music poses a significant challenge. Conventional dance generation methods are limited by only being able to match specific dance movements to music with corresponding rhythms, restricting the utilization of existing dance ...
- research-article, April 2024
PGCN: Pyramidal Graph Convolutional Network for EEG Emotion Recognition
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9070–9082, https://doi.org/10.1109/TMM.2024.3385676
Emotion recognition is essential in the diagnosis and rehabilitation of various mental diseases. In the last decade, electroencephalogram (EEG)-based emotion recognition has been intensively investigated due to its prominent accuracy and reliability, ...
- research-article, April 2024
Deepfake Detection Fighting Against Noisy Label Attack
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9047–9059, https://doi.org/10.1109/TMM.2024.3385286
Face manipulation techniques such as Deepfake have been widely used to create realistic faces, which raises growing concerns in the community. Relying on correctly labeled data, current Deepfake detectors are mostly trained on clean datasets, ...
- research-article, March 2024
Cross-Domain Low-Dose CT Image Denoising With Semantic Preservation and Noise Alignment
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 8771–8782, https://doi.org/10.1109/TMM.2024.3382509
Deep learning (DL)-based low-dose CT (LDCT) image denoising methods may face the domain shift problem, where data from different domains (e.g., hospitals) may have similar anatomical regions but exhibit different intrinsic noise characteristics. Therefore, we ...
- research-article, May 2024
Self-Similarity Prior Distillation for Unsupervised Remote Physiological Measurement
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 10290–10305, https://doi.org/10.1109/TMM.2024.3405720
Remote photoplethysmography (rPPG) is a non-invasive technique that aims to capture subtle variations in facial pixels caused by changes in blood volume resulting from cardiac activities. Most existing unsupervised methods for rPPG tasks focus on the ...
- research-article, May 2024
TextAdapter: Self-Supervised Domain Adaptation for Cross-Domain Text Recognition
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 9854–9865, https://doi.org/10.1109/TMM.2024.3400669
Text recognition remains challenging, primarily due to the scarcity of annotated real data or the heavy labor required to annotate large-scale real data. Most existing solutions rely on synthetic training data, where the synthetic-to-real domain gaps limit the ...
- research-article, April 2024
Context-Guided Black-Box Attack for Visual Tracking
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 8824–8835, https://doi.org/10.1109/TMM.2024.3382473
With the recent advancement of deep neural networks, visual tracking has achieved substantial progress in tracking accuracy. However, the robustness and security of tracking methods developed based on current deep models have not been thoroughly explored, ...
- research-article, March 2024
TTS: Hilbert Transform-Based Generative Adversarial Network for Tattoo and Scene Text Spotting
- Ayan Banerjee,
- Shivakumara Palaiahnakote,
- Umapada Pal,
- Apostolos Antonacopoulos,
- Tong Lu,
- Josep Llados Canet
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 8226–8241, https://doi.org/10.1109/TMM.2024.3378458
Text spotting in natural scenes is of increasing interest and significance due to its critical role in several applications, such as visual question answering, named entity recognition and event rumor detection on social media. One of the newly emerging ...
- research-article, October 2023
Multi-Task Paired Masking With Alignment Modeling for Medical Vision-Language Pre-Training
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 4706–4721, https://doi.org/10.1109/TMM.2023.3325965
In recent years, the growing demand for medical imaging diagnosis has placed a significant burden on radiologists. As a solution, Medical Vision-Language Pre-training (Med-VLP) methods have been proposed to learn universal representations from medical ...
- research-article, October 2023
The Beauty of Repetition: An Algorithmic Composition Model With Motif-Level Repetition Generator and Outline-to-Music Generator in Symbolic Music Generation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 4320–4333, https://doi.org/10.1109/TMM.2023.3321495
Most musical compositions utilize repetition as a fundamental element to create captivating aesthetic experiences. However, the potential of repetition in machine-learning-based algorithmic composition has not been thoroughly investigated. This article ...
- research-article, September 2023
Reversible Data Hiding-Based Contrast Enhancement With Multi-Group Stretching for ROI of Medical Image
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 3909–3923, https://doi.org/10.1109/TMM.2023.3318048
Reversible data hiding-based contrast enhancement (RDHCE) can be used in contrast enhancement for medical images, and it has been a popular research topic in recent years. However, the existing RDHCE methods suffer from the problem of inaccurate ...
- research-article, September 2023
Distortion-Aware Self-Supervised Indoor 360° Depth Estimation via Hybrid Projection Fusion and Structural Regularities
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 3998–4011, https://doi.org/10.1109/TMM.2023.3318470
Owing to the rapid development of emerging 360° panoramic imaging techniques, indoor 360° ...