Abstract
We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction, defined as the firing of neurons in specific paths. In this work, we utilize the evidence at each neuron to determine the probability of dropout, rather than dropping out neurons uniformly at random as in standard dropout. In essence, at training time we drop out with higher probability those neurons which contribute more to decision making. This approach penalizes high-saliency neurons, i.e., those having stronger evidence, which are most relevant for the model's prediction. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to keep minimizing the loss, resulting in a plasticity-like behavior, a characteristic also observed in the human brain. We demonstrate better generalization, increased utilization of network neurons, and higher resilience to network compression, measured with several metrics on four image/video recognition benchmarks.
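As a rough illustration of the mechanism described above, the following sketch computes a per-neuron dropout probability from each neuron's share of the layer's evidence and returns an inverted-dropout mask. This is our own simplification in NumPy, not the paper's exact formulation: the function name excitation_dropout_mask, the use of ReLU activations as the evidence proxy, and the base_rate scaling are all assumptions made for illustration.

```python
import numpy as np

def excitation_dropout_mask(activations, base_rate=0.5, eps=1e-12):
    """Sketch of evidence-guided dropout (not the paper's exact formula).

    Neurons carrying a larger share of the layer's total evidence are
    dropped with higher probability; the average dropout probability is
    kept close to `base_rate`.
    """
    # Evidence proxy (assumption): non-negative activations, normalized per example.
    a = np.maximum(activations, 0.0)
    evidence = a / (a.sum(axis=-1, keepdims=True) + eps)

    # Scale so the mean dropout probability is approximately `base_rate`:
    # mean(base_rate * N * evidence) = base_rate before clipping.
    n_units = activations.shape[-1]
    drop_prob = np.clip(base_rate * n_units * evidence, 0.0, 1.0)

    # Sample the binary mask and apply inverted-dropout rescaling so the
    # expected activation of each kept unit is preserved.
    keep = (np.random.rand(*activations.shape) >= drop_prob).astype(activations.dtype)
    return keep / np.maximum(1.0 - drop_prob, eps)

# Usage during training (hypothetical): h = h * excitation_dropout_mask(h, base_rate=0.5)
```

At test time the mask is simply omitted, as in standard inverted dropout; neurons with more evidence receive larger drop_prob, which is the behavior the abstract describes.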
Acknowledgements
This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) Explainable Artificial Intelligence (XAI) program, an IBM PhD Fellowship, a Hariri Graduate Fellowship, and gifts from Adobe and NVidia. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, DOI/IBC, or the U.S. Government.
Additional information
Communicated by Nikos Komodakis.
This work was done when A. Zunino was in PAVIS, at Istituto Italiano di Tecnologia.
Cite this article
Zunino, A., Bargal, S.A., Morerio, P. et al. Excitation Dropout: Encouraging Plasticity in Deep Neural Networks. Int J Comput Vis 129, 1139–1152 (2021). https://doi.org/10.1007/s11263-020-01422-y