
research-article

E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization

Authors:

Rongrong Ji

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 573 - 581

https://doi.org/10.1145/3474085.3475211

Published: 17 October 2021

Abstract

Weakly supervised object localization (WSOL), which seeks to train localizers with only image-level labels, has recently gained popularity. However, because they rely heavily on a classification objective for training, prevailing WSOL methods localize only the discriminative parts of an object, ignore other useful information such as the wings of a bird, and suffer under severe rotation variations. Moreover, learning object localization under weak supervision forces CNNs to attend to non-salient regions, which may degrade image classification results. To address these challenges, this paper proposes a novel end-to-end Excitation-Expansion network, coined E²Net, to localize entire objects with only image-level labels, a capability that serves as the basis of most multimedia tasks. The proposed E²Net consists of two key components: Maxout-Attention Excitation (MAE) and Orientation-Sensitive Expansion (OSE). First, the MAE module aims to activate non-discriminative localization features while simultaneously recovering discriminative classification cues. To this end, we efficiently couple an erasing strategy with maxout learning to facilitate entire-object localization without hurting classification accuracy. Second, to address rotation variations, the OSE module expands less salient object parts along all possible orientations. In particular, the OSE module dynamically combines selective attention banks from receptive-field expansions in various orientations, which introduces additional multi-parallel localization heads. Extensive experiments on ILSVRC 2012 and CUB-200-2011 demonstrate that E²Net outperforms previous state-of-the-art WSOL methods and also significantly improves classification performance.
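The abstract describes MAE and OSE only in words. The Python sketch below is a toy, dependency-free illustration of the generic ideas it names, not the E²Net implementation. `excite_with_maxout` erases the most discriminative cells of an attention map, rescales what remains so that less salient parts are "excited", and takes an element-wise maximum (maxout) with the original map so the discriminative peak survives. `directional_expand` and `orientation_fusion` expand a map along four orientations and combine the branches with softmax weights. All function names, the thresholds (0.8 for erasing, 0.6 for the box), the four orientations, the min-max rescaling, the bounding-box rule, and the mean-based gate standing in for learned selective attention are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Map = List[List[float]]  # a small 2-D attention map (rows x columns)


def normalize(att: Map) -> Map:
    """Min-max rescale a map to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in att)
    hi = max(max(row) for row in att)
    if hi == lo:
        return [[0.0] * len(row) for row in att]
    return [[(v - lo) / (hi - lo) for v in row] for row in att]


def erase_discriminative(att: Map, ratio: float = 0.8) -> Map:
    """Zero every cell at or above ratio * peak (assumed erasing rule)."""
    cut = ratio * max(max(row) for row in att)
    return [[0.0 if v >= cut else v for v in row] for row in att]


def maxout(a: Map, b: Map) -> Map:
    """Element-wise maximum of two equally sized maps."""
    return [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def excite_with_maxout(att: Map, ratio: float = 0.8) -> Map:
    """Erase the peak, rescale the remainder to excite it, then maxout with the original."""
    excited = normalize(erase_discriminative(att, ratio))
    return maxout(normalize(att), excited)


# Hypothetical orientation set, given as (row step, column step).
ORIENTATIONS: Dict[str, Tuple[int, int]] = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diagonal": (1, 1),
    "anti_diagonal": (1, -1),
}


def directional_expand(att: Map, step: Tuple[int, int], reach: int = 1) -> Map:
    """1-D max filter along one orientation: each cell takes the max within +/- reach steps."""
    h, w = len(att), len(att[0])
    dr, dc = step
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            best = att[r][c]
            for k in range(-reach, reach + 1):
                rr, cc = r + k * dr, c + k * dc
                if 0 <= rr < h and 0 <= cc < w:
                    best = max(best, att[rr][cc])
            out[r][c] = best
    return out


def orientation_fusion(att: Map, reach: int = 1) -> Map:
    """Softmax-weighted sum of orientation branches, scored by mean activation (stand-in gate)."""
    branches = [directional_expand(att, s, reach) for s in ORIENTATIONS.values()]
    means = [sum(map(sum, b)) / (len(b) * len(b[0])) for b in branches]
    exps = [math.exp(m - max(means)) for m in means]
    weights = [e / sum(exps) for e in exps]
    h, w = len(att), len(att[0])
    return [[sum(wt * b[r][c] for wt, b in zip(weights, branches)) for c in range(w)]
            for r in range(h)]


def box_from_map(att: Map, ratio: float = 0.6) -> Tuple[int, int, int, int]:
    """(top, left, bottom, right) around all cells >= ratio * peak (simplified box rule)."""
    cut = ratio * max(max(row) for row in att)
    cells = [(r, c) for r, row in enumerate(att) for c, v in enumerate(row) if v >= cut]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


# Toy map: a strong "head" peak in row 1, weaker "wing" responses around it.
toy = [
    [0.1, 0.2, 0.3, 0.2, 0.1, 0.0],
    [0.2, 0.4, 1.0, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.4, 0.3, 0.2, 0.0],
]
box_before = box_from_map(toy)
box_after = box_from_map(excite_with_maxout(toy))
print(box_before, box_after)  # (1, 2, 1, 3) (0, 1, 2, 4)
```

On the toy map, the box around cells at or above 60% of the peak grows from row 1, columns 2-3 to rows 0-2, columns 1-4, while the peak cell keeps its value of 1.0. In E²Net as the abstract describes it, these operations live inside an end-to-end trained CNN, so the fixed rules here only mimic the qualitative behavior, not the learned modules.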


Cited By

Zhu L, She Q, Chen Q, Ren Q, Lu Y. (2024). Boosting Weakly Supervised Object Localization and Segmentation With Domain Adaption. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:12, 8680-8695. DOI: 10.1109/TPAMI.2024.3411036. Online publication date: Dec-2024.
https://doi.org/10.1109/TPAMI.2024.3411036
Pang Y, Zhang H, Zhu L, Liu D, Liu L. (2024). Self-similarity guided probabilistic embedding matching based on transformer for occluded person re-identification. Expert Systems with Applications: An International Journal 237:PB. DOI: 10.1016/j.eswa.2023.121504. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.1016/j.eswa.2023.121504
Park S, Lee T, Lee Y, Kang B, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M. (2023). FDCNet: Feature Drift Compensation Network for Class-Incremental Weakly Supervised Object Localization. Proceedings of the 31st ACM International Conference on Multimedia, 2045-2053. DOI: 10.1145/3581783.3612450. Online publication date: 26-Oct-2023.
https://dl.acm.org/doi/10.1145/3581783.3612450

Index Terms

E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision



Information & Contributors

Information

Published In


MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, Facebook, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2021


Qualifiers

Research-article

Funding Sources

the Fundamental Research Funds for the Central Universities
the National Science Fund for Distinguished Young Scholars
the National Natural Science Foundation of China
Guangdong Basic and Applied Basic Research Foundation

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 6

Total Downloads: 221

Downloads (last 12 months): 27

Downloads (last 6 weeks): 1

Reflects downloads up to 10 Dec 2024


Citations

Zhu L, She Q, Chen Q, Meng X, Geng M, Jin L, Zhang Y, Ren Q, Lu Y. (2023). Background-Aware Classification Activation Map for Weakly Supervised Object Localization. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:12, 14175-14191. DOI: 10.1109/TPAMI.2023.3309621. Online publication date: 29-Aug-2023.
https://dl.acm.org/doi/10.1109/TPAMI.2023.3309621
Chen Z, Ding J, Cao L, Shen Y, Zhang S, Jiang G, Ji R. (2023). Category-aware Allocation Transformer for Weakly Supervised Object Localization. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 6620-6629. DOI: 10.1109/ICCV51070.2023.00611. Online publication date: 1-Oct-2023.
https://doi.org/10.1109/ICCV51070.2023.00611
Tan L, Dai P, Ji R, Wu Y, Magalhães J, del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L. (2022). Dynamic Prototype Mask for Occluded Person Re-Identification. Proceedings of the 30th ACM International Conference on Multimedia, 531-540. DOI: 10.1145/3503161.3547764. Online publication date: 10-Oct-2022.
https://dl.acm.org/doi/10.1145/3503161.3547764
