DOI: 10.1145/3474085.3475211
research-article

E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization

Published: 17 October 2021

Abstract

Weakly supervised object localization (WSOL), which seeks to train localizers with only image-level labels, has recently gained popularity. However, because they rely heavily on a classification objective for training, prevailing WSOL methods localize only the discriminative parts of an object, ignoring other useful information such as the wings of a bird, and they suffer from severe rotation variations. Moreover, learning object localization under weak supervision forces CNNs to attend to non-salient regions, which may negatively influence image classification results. To address these challenges, this paper proposes a novel end-to-end Excitation-Expansion network, coined E$^2$Net, to localize entire objects with only image-level labels, a capability that serves as the basis of most multimedia tasks. The proposed E$^2$Net consists of two key components: Maxout-Attention Excitation (MAE) and Orientation-Sensitive Expansion (OSE). First, the MAE module aims to activate non-discriminative localization features while simultaneously recovering discriminative classification cues. To this end, we efficiently couple an erasing strategy with maxout learning to facilitate entire-object localization without hurting classification accuracy. Second, to address rotation variations, the proposed OSE module expands less salient object parts along all possible orientations. In particular, the OSE module dynamically combines selective attention banks from variously oriented expansions of the receptive field, which introduces additional multi-parallel localization heads. Extensive experiments on ILSVRC 2012 and CUB-200-2011 demonstrate that the proposed E$^2$Net outperforms previous state-of-the-art WSOL methods and also significantly improves classification performance.
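The idea of coupling erasing with maxout can be illustrated with a toy sketch in plain Python. This is not the authors' MAE module: the abstract does not give its layers or hyperparameters, so the function names (`attention_map`, `erase`, `excite`), the threshold `ratio`, and the scalar `gain` that stands in for learned layers on the erased branch are all assumptions made for illustration only.

```python
from typing import List

Map = List[List[float]]   # one spatial map (rows x cols)
Features = List[Map]      # a stack of channels


def attention_map(features: Features) -> Map:
    """Average over channels to obtain a single spatial attention map."""
    channels = len(features)
    rows, cols = len(features[0]), len(features[0][0])
    return [
        [sum(f[r][c] for f in features) / channels for c in range(cols)]
        for r in range(rows)
    ]


def erase(features: Features, attention: Map, ratio: float) -> Features:
    """Zero every position whose attention exceeds ratio * peak attention."""
    threshold = ratio * max(max(row) for row in attention)
    return [
        [
            [0.0 if attention[r][c] > threshold else value
             for c, value in enumerate(row)]
            for r, row in enumerate(channel)
        ]
        for channel in features
    ]


def excite(features: Features, ratio: float = 0.8, gain: float = 2.0) -> Features:
    """Erase the most salient positions, amplify what remains, then take the
    element-wise maximum with the original features (a maxout over two branches).

    `gain` is a hypothetical stand-in for the learned layers of the erased branch.
    """
    erased = erase(features, attention_map(features), ratio)
    return [
        [
            [max(original, gain * kept) for original, kept in zip(row_o, row_e)]
            for row_o, row_e in zip(channel_o, channel_e)
        ]
        for channel_o, channel_e in zip(features, erased)
    ]


if __name__ == "__main__":
    toy = [[[1.0, 0.4], [0.3, 0.0]]]  # one channel with a single strong peak
    print(excite(toy))
```

In this toy example the peak response survives through the original branch while the weaker responses are raised through the erased branch, which mirrors the stated goal of activating non-discriminative regions without discarding discriminative cues.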




    Information & Contributors

    Information

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN: 9781450386517
    DOI: 10.1145/3474085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. deep learning
    2. object localization
    3. weakly supervised learning

    Qualifiers

    • Research-article

    Funding Sources

    • Fundamental Research Funds for the Central Universities
    • National Science Fund for Distinguished Young Scholars
    • National Natural Science Foundation of China
    • Guangdong Basic and Applied Basic Research Foundation

    Conference

    MM '21: ACM Multimedia Conference
    October 20 - 24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 27
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 10 Dec 2024

    Citations

    Cited By

    • (2024) Boosting Weakly Supervised Object Localization and Segmentation With Domain Adaption. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:12 (8680-8695). DOI: 10.1109/TPAMI.2024.3411036. Online publication date: Dec-2024
    • (2024) Self-similarity guided probabilistic embedding matching based on transformer for occluded person re-identification. Expert Systems with Applications: An International Journal 237:PB. DOI: 10.1016/j.eswa.2023.121504. Online publication date: 1-Feb-2024
    • (2023) FDCNet: Feature Drift Compensation Network for Class-Incremental Weakly Supervised Object Localization. Proceedings of the 31st ACM International Conference on Multimedia (2045-2053). DOI: 10.1145/3581783.3612450. Online publication date: 26-Oct-2023
    • (2023) Background-Aware Classification Activation Map for Weakly Supervised Object Localization. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:12 (14175-14191). DOI: 10.1109/TPAMI.2023.3309621. Online publication date: 29-Aug-2023
    • (2023) Category-aware Allocation Transformer for Weakly Supervised Object Localization. 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (6620-6629). DOI: 10.1109/ICCV51070.2023.00611. Online publication date: 1-Oct-2023
    • (2022) Dynamic Prototype Mask for Occluded Person Re-Identification. Proceedings of the 30th ACM International Conference on Multimedia (531-540). DOI: 10.1145/3503161.3547764. Online publication date: 10-Oct-2022
