- Article, November 2024
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Abstract: The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of the 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To ...
- research-article, July 2024
Lightweight image super-resolution via flexible meta pruning
ICML'24: Proceedings of the 41st International Conference on Machine Learning, Article No.: 2495, Pages 60305–60314
Lightweight image super-resolution (SR) methods have obtained promising results with moderate model complexity. These approaches primarily focus on a lightweight architecture design, but neglect to further reduce network redundancy. While some model ...
- research-article, July 2024
Flexible residual binarization for image super-resolution
ICML'24: Proceedings of the 41st International Conference on Machine Learning, Article No.: 2468, Pages 59731–59740
Binarized image super-resolution (SR) has attracted much research attention due to its potential to drastically reduce parameters and operations. However, most binary SR works binarize network weights directly, which hinders high-frequency information ...
- research-article, December 2023
Real-time motion prediction via heterogeneous polyline transformer with relative pose encoding
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 2507, Pages 57481–57499
The real-world deployment of an autonomous driving system requires its components to run on-board and in real-time, including the motion prediction module that predicts the future trajectories of surrounding traffic participants. Existing agent-centric ...
- research-article, December 2023
QuantSR: accurate low-bit quantization for efficient image super-resolution
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 2483, Pages 56838–56848
Low-bit quantization in image super-resolution (SR) has attracted copious attention in recent research due to its ability to reduce parameters and operations significantly. However, many quantized SR models suffer from accuracy degradation compared to ...
- research-article, December 2023
BiMatting: efficient video matting via binarization
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1876, Pages 43307–43321
Real-time video matting on edge devices faces significant computational resource constraints, limiting the widespread use of video matting in applications such as online conferences and short-form video production. Binarization is a powerful compression ...
- research-article, December 2023
Segment anything in high quality
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1303, Pages 29914–29934
The recent Segment Anything Model (SAM) represents a big leap in scaling up segmentation models, allowing for powerful zero-shot capabilities and flexible prompting. Despite being trained with 1.1 billion masks, SAM's mask prediction quality falls short ...
- research-article, December 2023
QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 45, Issue 12, Pages 15380–15393, https://doi.org/10.1109/TPAMI.2023.3301975
Similarity learning has been recognized as a crucial step for object tracking. However, existing multiple object tracking methods only use sparse ground truth matching as the training objective, while ignoring the majority of the informative regions in ...
- research-article, November 2023
Unifying Flow, Stereo and Depth Estimation
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 45, Issue 11, Pages 13941–13958, https://doi.org/10.1109/TPAMI.2023.3298645
We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we ...
- Article, September 2023
COOLer: Class-Incremental Learning for Appearance-Based Multiple Object Tracking
Abstract: Continual learning allows a model to learn multiple tasks sequentially while retaining the old knowledge without the training data of the preceding tasks. This paper extends the scope of continual learning research to class-incremental learning ...
- research-article, July 2023
BiBench: benchmarking and analyzing network binarization
ICML'23: Proceedings of the 40th International Conference on Machine Learning, Article No.: 1177, Pages 28351–28388
Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings by minimizing the bit-width. However, recent research has shown that applying existing binarization algorithms to ...
- research-article, July 2023
Scaling vision transformers to 22 billion parameters
- Mostafa Dehghani,
- Josip Djolonga,
- Basil Mustafa,
- Piotr Padlewski,
- Jonathan Heek,
- Justin Gilmer,
- Andreas Steiner,
- Mathilde Caron,
- Robert Geirhos,
- Ibrahim Alabdulmohsin,
- Rodolphe Jenatton,
- Lucas Beyer,
- Michael Tschannen,
- Anurag Arnab,
- Xiao Wang,
- Carlos Riquelme,
- Matthias Minderer,
- Joan Puigcerver,
- Utku Evci,
- Manoj Kumar,
- Sjoerd Van Steenkiste,
- Gamaleldin F. Elsayed,
- Aravindh Mahendran,
- Fisher Yu,
- Avital Oliver,
- Fantine Huot,
- Jasmijn Bastings,
- Mark Patrick Collier,
- Alexey A. Gritsenko,
- Vighnesh Birodkar,
- Cristina Vasconcelos,
- Yi Tay,
- Thomas Mensink,
- Alexander Kolesnikov,
- Filip Pavetić,
- Dustin Tran,
- Thomas Kipf,
- Mario Lučić,
- Xiaohua Zhai,
- Daniel Keysers,
- Jeremiah Harmsen,
- Neil Houlsby
ICML'23: Proceedings of the 40th International Conference on Machine Learning, Article No.: 296, Pages 7480–7512
The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and ...
- Article, October 2022
The Tenth Visual Object Tracking VOT2022 Challenge Results
- Matej Kristan,
- Aleš Leonardis,
- Jiří Matas,
- Michael Felsberg,
- Roman Pflugfelder,
- Joni-Kristian Kämäräinen,
- Hyung Jin Chang,
- Martin Danelljan,
- Luka Čehovin Zajc,
- Alan Lukežič,
- Ondrej Drbohlav,
- Johanna Björklund,
- Yushan Zhang,
- Zhongqun Zhang,
- Song Yan,
- Wenyan Yang,
- Dingding Cai,
- Christoph Mayer,
- Gustavo Fernández,
- Kang Ben,
- Goutam Bhat,
- Hong Chang,
- Guangqi Chen,
- Jiaye Chen,
- Shengyong Chen,
- Xilin Chen,
- Xin Chen,
- Xiuyi Chen,
- Yiwei Chen,
- Yu-Hsi Chen,
- Zhixing Chen,
- Yangming Cheng,
- Angelo Ciaramella,
- Yutao Cui,
- Benjamin Džubur,
- Mohana Murali Dasari,
- Qili Deng,
- Debajyoti Dhar,
- Shangzhe Di,
- Emanuel Di Nardo,
- Daniel K. Du,
- Matteo Dunnhofer,
- Heng Fan,
- Zhenhua Feng,
- Zhihong Fu,
- Shang Gao,
- Rama Krishna Gorthi,
- Eric Granger,
- Q. H. Gu,
- Himanshu Gupta,
- Jianfeng He,
- Keji He,
- Yan Huang,
- Deepak Jangid,
- Rongrong Ji,
- Cheng Jiang,
- Yingjie Jiang,
- Felix Järemo Lawin,
- Ze Kang,
- Madhu Kiran,
- Josef Kittler,
- Simiao Lai,
- Xiangyuan Lan,
- Dongwook Lee,
- Hyunjeong Lee,
- Seohyung Lee,
- Hui Li,
- Ming Li,
- Wangkai Li,
- Xi Li,
- Xianxian Li,
- Xiao Li,
- Zhe Li,
- Liting Lin,
- Haibin Ling,
- Bo Liu,
- Chang Liu,
- Si Liu,
- Huchuan Lu,
- Rafael M. O. Cruz,
- Bingpeng Ma,
- Chao Ma,
- Jie Ma,
- Yinchao Ma,
- Niki Martinel,
- Alireza Memarmoghadam,
- Christian Micheloni,
- Payman Moallem,
- Le Thanh Nguyen-Meidine,
- Siyang Pan,
- ChangBeom Park,
- Danda Paudel,
- Matthieu Paul,
- Houwen Peng,
- Andreas Robinson,
- Litu Rout,
- Shiguang Shan,
- Kristian Simonato,
- Tianhui Song,
- Xiaoning Song,
- Chao Sun,
- Jingna Sun,
- Zhangyong Tang,
- Radu Timofte,
- Chi-Yi Tsai,
- Luc Van Gool,
- Om Prakash Verma,
- Dong Wang,
- Fei Wang,
- Liang Wang,
- Liangliang Wang,
- Lijun Wang,
- Limin Wang,
- Qiang Wang,
- Gangshan Wu,
- Jinlin Wu,
- Xiaojun Wu,
- Fei Xie,
- Tianyang Xu,
- Wei Xu,
- Yong Xu,
- Yuanyou Xu,
- Wanli Xue,
- Zizheng Xun,
- Bin Yan,
- Dawei Yang,
- Jinyu Yang,
- Wankou Yang,
- Xiaoyun Yang,
- Yi Yang,
- Yichun Yang,
- Zongxin Yang,
- Botao Ye,
- Fisher Yu,
- Hongyuan Yu,
- Jiaqian Yu,
- Qianjin Yu,
- Weichen Yu,
- Kang Ze,
- Jiang Zhai,
- Chengwei Zhang,
- Chunhu Zhang,
- Kaihua Zhang,
- Tianzhu Zhang,
- Wenkang Zhang,
- Zhibin Zhang,
- Zhipeng Zhang,
- Jie Zhao,
- Shaochuan Zhao,
- Feng Zheng,
- Haixia Zheng,
- Min Zheng,
- Bineng Zhong,
- Jiawen Zhu,
- Xuefeng Zhu,
- Yueting Zhuang
Abstract: The Visual Object Tracking challenge VOT2022 is the tenth annual tracker benchmarking activity organized by the VOT initiative. Results of 93 entries are presented; many are state-of-the-art trackers published at major computer vision conferences ...
- Article, October 2022
SAGA: Stochastic Whole-Body Grasping with Contact
Abstract: The synthesis of human grasping has numerous applications including AR/VR, video games and robotics. While methods have been proposed to generate realistic hand–object interaction for object grasping and manipulation, these typically only consider ...
- Article, October 2022
Tracking Every Thing in the Wild
Abstract: Current multi-category Multiple Object Tracking (MOT) metrics use class labels to group tracking results for per-class evaluation. Similarly, MOT methods typically only associate objects with the same class predictions. These two prevalent ...
- Article, October 2022
TACS: Taxonomy Adaptive Cross-Domain Semantic Segmentation
Abstract: Traditional domain adaptive semantic segmentation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, the standard domain adaptation settings assume no ...
- Article, October 2022
Learning Online Multi-sensor Depth Fusion
- Erik Sandström,
- Martin R. Oswald,
- Suryansh Kumar,
- Silvan Weder,
- Fisher Yu,
- Cristian Sminchisescu,
- Luc Van Gool
Abstract: Many hand-held or mixed reality devices are used with a single sensor for 3D reconstruction, although they often comprise multiple sensors. Multi-sensor depth fusion is able to substantially improve the robustness and accuracy of 3D reconstruction ...
- Article, October 2022
Video Mask Transfiner for High-Quality Video Instance Segmentation
Abstract: While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details. Moreover, the predicted segmentations often fluctuate over time, suggesting that temporal ...