- research-article, September 2024
Unpacking the Gap Box Against Data-Free Knowledge Distillation
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 9, Pages 6280–6291
https://doi.org/10.1109/TPAMI.2024.3379505
Data-free knowledge distillation (DFKD) improves the student model (S) by mimicking the class probability from a pre-trained teacher model (T) without training data. Under such setting, an ideal scenario is that T can help generate “good” ...
- research-article, May 2024
See Widely, Think Wisely: Toward Designing a Generative Multi-agent System to Burst Filter Bubbles
CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Article No.: 484, Pages 1–24
https://doi.org/10.1145/3613904.3642545
The proliferation of AI-powered search and recommendation systems has accelerated the formation of “filter bubbles” that reinforce people’s biases and narrow their perspectives. Previous research has attempted to address this issue by increasing the ...
- research-article, February 2024
Disentangled partial label learning
AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, Article No.: 1228, Pages 11007–11015
https://doi.org/10.1609/aaai.v38i10.28976
Partial label learning (PLL) induces a multi-class classifier from training examples each associated with a set of candidate labels, among which only one is valid. The formation of real-world data typically arises from heterogeneous entanglement of ...
- research-article, January 2024
Toward Egocentric Compositional Action Anticipation with Adaptive Semantic Debiasing
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 5, Article No.: 122, Pages 1–21
https://doi.org/10.1145/3633333
Predicting the unknown from the first-person perspective is expected as a necessary step toward machine intelligence, which is essential for practical applications including autonomous driving and robotics. As a human-level task, egocentric action ...
- research-article, January 2024
Domain-Aware Graph Network for Bridging Multi-Source Domain Adaptation
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 7210–7224
https://doi.org/10.1109/TMM.2024.3361729
Domain adaptation (DA) addresses the challenge of distribution discrepancy between the training and test data, while multi-source domain adaptation (MSDA) is particularly appealing for realistic scenarios. With the emergence of extensive unlabeled ...
- research-article, January 2024
Convolution-Enhanced Bi-Branch Adaptive Transformer With Cross-Task Interaction for Food Category and Ingredient Recognition
IEEE Transactions on Image Processing (TIP), Volume 33, Pages 2572–2586
https://doi.org/10.1109/TIP.2024.3374211
Recently, visual food analysis has received more and more attention in the computer vision community due to its wide application scenarios, e.g., diet nutrition management, smart restaurant, and personalized diet recommendation. Considering that food ...
- research-article, December 2023
Learning from biased soft labels
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 2603, Pages 59566–59584
Since the advent of knowledge distillation, many researchers have been intrigued by the dark knowledge hidden in the soft labels generated by the teacher model. This prompts us to scrutinize the circumstances under which these soft labels are effective. ...
- research-article, February 2023
Graph Attention Transformer Network for Multi-label Image Classification
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 19, Issue 4, Article No.: 150, Pages 1–16
https://doi.org/10.1145/3578518
Multi-label classification aims to recognize multiple objects or attributes from images. The key to solving this issue relies on effectively characterizing the inter-label correlations or dependencies, which bring the prevailing graph neural network. ...
- research-article, February 2023
Balanced masking strategy for multi-label image classification
Data imbalance is an essential issue in multi-label image classification that may reduce the model’s generalization ability. Unlike oversampling or undersampling in single-label image classification, simply removing or repeating some images in ...
- survey, January 2023
A Survey on Video Moment Localization
ACM Computing Surveys (CSUR), Volume 55, Issue 9, Article No.: 188, Pages 1–37
https://doi.org/10.1145/3556537
Video moment localization, also known as video moment retrieval, aims to search a target segment within a video described by a given natural language query. Beyond the task of temporal action localization whereby the target actions are pre-defined, video ...
- research-article, October 2022
Delving Globally into Texture and Structure for Image Inpainting
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Pages 1270–1278
https://doi.org/10.1145/3503161.3548265
Image inpainting has achieved remarkable progress and inspired abundant methods, where the critical bottleneck is identified as how to fulfill the high-frequency structure and low-frequency texture information on the masked regions with semantics. To ...
- research-article, October 2022
Self-Supervised Graph Neural Network for Multi-Source Domain Adaptation
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Pages 3907–3916
https://doi.org/10.1145/3503161.3548121
Domain adaptation (DA) tries to tackle the scenarios when the test data does not fully follow the same distribution of the training data, and multi-source domain adaptation (MSDA) is very attractive for real world applications. By learning from large-...
- research-article, February 2022
Hierarchical Deep Click Feature Prediction for Fine-Grained Image Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 44, Issue 2, Pages 563–578
https://doi.org/10.1109/TPAMI.2019.2932058
The click feature of an image, defined as the user click frequency vector of the image on a predefined word vocabulary, is known to effectively reduce the semantic gap for fine-grained image recognition. Unfortunately, user click frequency data are ...
- research-article, January 2022
Hierarchical User Intent Graph Network for Multimedia Recommendation
IEEE Transactions on Multimedia (TOM), Volume 24, Pages 2701–2712
https://doi.org/10.1109/TMM.2021.3088307
Understanding user preference on item context is the key to acquire a high-quality multimedia recommendation. Typically, the pre-existing features of items are derived from pre-trained models (e.g. visual features of micro-videos extracted from some ...
- research-article, October 2021
HoloBoard: a Large-format Immersive Teaching Board based on pseudo HoloGraphics
UIST '21: The 34th Annual ACM Symposium on User Interface Software and Technology, Pages 441–456
https://doi.org/10.1145/3472749.3474761
In this paper, we present HoloBoard, an interactive large-format pseudo-holographic display system for lecture-based classes. With its unique properties of immersive visual display and transparent screen, we designed and implemented a rich set of novel ...
- abstract, September 2021
MMPT'21: International Joint Workshop on Multi-Modal Pre-Training for Multimedia Understanding
ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval, Pages 694–695
https://doi.org/10.1145/3460426.3470947
Pre-training has been an emerging topic that provides a way to learn strong representation in many fields (e.g., natural language processing, computer vision). In the last few years, we have witnessed many research works on multi-modal pre-training ...
- proceeding, August 2021
MMPT '21: Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia Understanding
It is our great pleasure to welcome you to the ICMR 2021 Workshop on Multi-Modal Pre-Training for Multimedia Understanding - MMPT 2021.
The First International Joint Workshop on Multi-Modal Pre-Training for Multimedia Understanding aims to gather peer ...
- research-article, October 2020
An Egocentric Action Anticipation Framework via Fusing Intuition and Analysis
MM '20: Proceedings of the 28th ACM International Conference on Multimedia, Pages 402–410
https://doi.org/10.1145/3394171.3413964
In this paper, we focus on egocentric action anticipation from videos, which enables various applications, such as helping intelligent wearable assistants understand users' needs and enhance their capabilities in the interaction process. It requires ...
- research-article, March 2020
CDbin: Compact Discriminative Binary Descriptor Learned With Efficient Neural Network
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 30, Issue 3, Pages 862–874
https://doi.org/10.1109/TCSVT.2019.2896095
As an important computer vision task, image matching requires efficient and discriminative local descriptors. Most of the existing descriptors like SIFT and ORB are hand-crafted; therefore it is necessary to study more optimized descriptors through end-to-...
- research-article, December 2019
Image Recognition by Predicted User Click Feature With Multidomain Multitask Transfer Deep Network
IEEE Transactions on Image Processing (TIP), Volume 28, Issue 12, Pages 6047–6062
https://doi.org/10.1109/TIP.2019.2921861
The click feature of an image, defined as a user click count vector based on click data, has been demonstrated to be effective for reducing the semantic gap for image recognition. Unfortunately, most of the traditional image recognition datasets do not ...