DOI: 10.5555/3327144.3327269

Article · Free access

Sequence-to-segments networks for segment detection

Published: 03 December 2018

Abstract

Detecting segments of interest from an input sequence is a challenging problem which often requires not only good knowledge of individual target segments, but also contextual understanding of the entire input sequence and the relationships between the target segments. To address this problem, we propose the Sequence-to-Segments Network (S2N), a novel end-to-end sequential encoder-decoder architecture. S2N first encodes the input into a sequence of hidden states that progressively capture both local and holistic information. It then employs a novel decoding architecture, called Segment Detection Unit (SDU), that integrates the decoder state and encoder hidden states to detect segments sequentially. During training, we formulate the assignment of predicted segments to ground truth as the bipartite matching problem and use the Earth Mover's Distance to calculate the localization errors. Experiments on temporal action proposal and video summarization show that S2N achieves state-of-the-art performance on both tasks.
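The training formulation described above pairs predicted segments with ground-truth segments via bipartite matching under an Earth Mover's Distance cost. The sketch below is a minimal illustration of that idea, not the paper's implementation: each segment is treated as a uniform distribution over a 1-D interval, the 1-D Wasserstein-1 (EMD) cost is computed via the quantile-function form, and the assignment is solved with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`). The segment boundaries are hypothetical sample values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_1d(seg_a, seg_b, n=1000):
    """Wasserstein-1 distance between uniform distributions on two 1-D
    intervals, approximated via the quantile form W1 = integral |Qa - Qb| dt."""
    t = np.linspace(0.0, 1.0, n)
    qa = seg_a[0] + (seg_a[1] - seg_a[0]) * t  # quantile function of U[a0, a1]
    qb = seg_b[0] + (seg_b[1] - seg_b[0]) * t
    return float(np.abs(qa - qb).mean())

def match_segments(pred, gt):
    """Assign each predicted segment to one ground-truth segment by
    minimizing the total EMD cost (bipartite matching)."""
    cost = np.array([[emd_1d(p, g) for g in gt] for p in pred])
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist())), float(cost[rows, cols].sum())

# Hypothetical normalized (start, end) segments
pred = [(0.10, 0.30), (0.55, 0.80)]
gt   = [(0.50, 0.85), (0.05, 0.35)]
pairs, total_cost = match_segments(pred, gt)
# prediction 0 is matched to ground truth 1, prediction 1 to ground truth 0
```

Because the assignment is global, each prediction is penalized against the ground-truth segment it best explains, rather than against a fixed ordering of targets.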


Cited By

  • (2021) Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding. Proceedings of the Web Conference 2021, pp. 3124-3135. DOI: 10.1145/3442381.3449903. Online publication date: 19-Apr-2021.


Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, December 2018, 11021 pages.

Publisher

Curran Associates Inc., Red Hook, NY, United States

