Article

Spatial transformer networks

Published: 07 December 2015
DOI: 10.5555/2969442.2969465

Abstract

Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn invariance to translation, scale, rotation and more generic warping, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.
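
The module chains three differentiable parts: a localisation network that regresses transformation parameters from the feature map, a grid generator that turns those parameters into a sampling grid, and a sampler that bilinearly interpolates the input at the grid points. Below is a minimal sketch in PyTorch, assuming a 28x28 single-channel input; the localisation-network layer sizes are illustrative rather than the paper's exact configuration, and torch.nn.functional.affine_grid / grid_sample stand in for the grid generator and bilinear sampler.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    # Illustrative affine spatial transformer; layer sizes are assumptions,
    # not the paper's configuration.
    def __init__(self, in_channels=1):
        super().__init__()
        # Localisation network: a small conv net that regresses the six
        # parameters of a 2x3 affine matrix from the feature map itself.
        self.loc = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
        )
        self.fc = nn.Linear(10 * 3 * 3, 6)  # assumes 28x28 inputs -> 3x3 maps
        # Start at the identity transform so the module is a no-op at first
        # and learns to warp only when it helps the downstream task.
        self.fc.weight.data.zero_()
        self.fc.bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.fc(self.loc(x).flatten(1)).view(-1, 2, 3)
        # Grid generator: map the affine parameters to a sampling grid,
        # then apply the differentiable bilinear sampler.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

Because every step is differentiable, gradients flow through the sampler back into the localisation network, so the module trains end to end with the host network's existing loss and no extra supervision; for example, SpatialTransformer()(torch.randn(32, 1, 28, 28)) warps a batch of MNIST-sized images in place of the raw input.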

Published In

NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2
December 2015
3626 pages

Publisher

MIT Press

Cambridge, MA, United States
