Abstract
We present a novel method for few-shot video classification that performs both appearance and temporal alignments. In particular, given a pair of query and support videos, we conduct appearance alignment via frame-level feature matching to compute the appearance similarity score between the videos, and we exploit temporal order-preserving priors to obtain the temporal similarity score between the videos. Moreover, we introduce a few-shot video classification framework that leverages these appearance and temporal similarity scores across multiple stages, namely prototype-based training and testing as well as inductive and transductive prototype refinement. To the best of our knowledge, our work is the first to explore transductive few-shot video classification. Extensive experiments on the Kinetics and Something-Something V2 datasets show that both appearance and temporal alignments are crucial for datasets that are sensitive to temporal order, such as Something-Something V2. Our approach achieves results comparable to or better than previous methods on both datasets. Our code is available at https://github.com/VinAIResearch/fsvc-ata.
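To make the two similarity scores concrete, below is a minimal, self-contained sketch of one way to instantiate them; it is not the authors' released implementation (see the repository above for that). The frame features are random placeholders, the near-diagonal Gaussian weighting stands in for a temporal order-preserving prior, and `appearance_score`, `temporal_score`, `sigma`, and `lam` are all hypothetical names and hyperparameters.

```python
# Illustrative sketch (NOT the authors' exact formulation): combine an
# appearance similarity from frame-level feature matching with a temporal
# similarity weighted by an order-preserving prior.
import numpy as np

def appearance_score(q, s):
    """Frame-level feature matching: average the best cosine similarity
    of each query frame against all support frames."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    sim = q @ s.T                      # (T_q, T_s) pairwise cosine similarities
    return float(sim.max(axis=1).mean())  # best match per query frame, averaged

def temporal_score(q, s, sigma=0.3):
    """Order-preserving prior (an assumption here): weight frame-pair
    similarities by their distance to the diagonal of the alignment matrix,
    so matchings that preserve temporal order contribute more."""
    Tq, Ts = len(q), len(s)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    sim = q @ s.T
    i = np.arange(Tq)[:, None] / max(Tq - 1, 1)   # normalized query time
    j = np.arange(Ts)[None, :] / max(Ts - 1, 1)   # normalized support time
    prior = np.exp(-((i - j) ** 2) / (2 * sigma ** 2))  # near-diagonal prior
    return float((sim * prior).sum() / prior.sum())

def video_similarity(q, s, lam=0.5):
    """Fuse the two scores; `lam` trades off appearance vs. temporal order."""
    return lam * appearance_score(q, s) + (1 - lam) * temporal_score(q, s)

# Toy usage with random 8-frame, 512-d features
rng = np.random.default_rng(0)
query, support = rng.normal(size=(8, 512)), rng.normal(size=(8, 512))
print(video_similarity(query, support))
```

Favoring near-diagonal frame pairs rewards matchings that proceed in the same temporal order in both videos, which is why such a prior matters on order-sensitive datasets like Something-Something V2.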
About this paper
Cite this paper
Nguyen, K.D., Tran, Q.H., Nguyen, K., Hua, B.S., Nguyen, R. (2022). Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. Lecture Notes in Computer Science, vol. 13680. Springer, Cham. https://doi.org/10.1007/978-3-031-20044-1_27