
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition

Published: 16 May 2024

Abstract

Sign language recognition (SLR) aims to help people who are deaf or hard of hearing overcome the communication barrier. Most existing approaches fall into two lines, i.e., skeleton-based and RGB-based methods, but both have limitations: skeleton-based methods do not consider facial expressions, while RGB-based approaches usually ignore the fine-grained hand structure. To overcome both limitations, we propose a new RGB-part-based framework called the Spatial-temporal Part-aware network (StepNet). As its name suggests, it consists of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling. Part-level Spatial Modeling automatically captures appearance-based properties, such as hands and faces, in the feature space without any keypoint-level annotations. Part-level Temporal Modeling, in turn, implicitly mines the long short-term context to capture the relevant attributes over time. Extensive experiments demonstrate that, thanks to these spatial-temporal modules, StepNet achieves competitive Top-1 per-instance accuracy on three commonly used SLR benchmarks, i.e., 56.89% on WLASL, 77.2% on NMFs-CSL, and 77.1% on BOBSL. The proposed method is also compatible with optical-flow input and yields superior performance when the two streams are fused. We hope that our work can serve as a preliminary step toward removing communication barriers for people who are hard of hearing.
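The abstract notes that the RGB-based model is compatible with an optical-flow input and performs better when the two are fused. The fusion rule is not specified here, so the snippet below is only a minimal sketch assuming weighted score-level (late) fusion of per-stream softmax probabilities; the `alpha` weight and the example logits are illustrative, not values from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(rgb_logits, flow_logits, alpha=0.5):
    """Score-level (late) fusion of an RGB and an optical-flow stream.

    alpha weights the RGB stream; (1 - alpha) weights the flow stream.
    Returns a fused probability distribution over sign classes.
    """
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    return [alpha * r + (1 - alpha) * f for r, f in zip(p_rgb, p_flow)]

# Illustrative 4-way gloss classification where the streams disagree:
rgb = [2.0, 1.0, 0.1, -1.0]    # RGB stream favors class 0
flow = [0.5, 2.5, 0.0, -0.5]   # flow stream favors class 1
fused = fuse_streams(rgb, flow, alpha=0.5)
pred = max(range(len(fused)), key=fused.__getitem__)
```

With equal weights, the flow stream's stronger confidence tips the fused prediction to class 1; tuning `alpha` on a validation set is the usual way to balance the two modalities.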


Cited By

  • (2024) Spike-HAR++: An energy-efficient and lightweight parallel spiking transformer for event-based human action recognition. Frontiers in Computational Neuroscience 18. DOI: 10.3389/fncom.2024.1508297. Online publication date: 26-Nov-2024.
  • (2024) Enhancing Brazilian Sign Language Recognition Through Skeleton Image Representation. 2024 37th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 1–6. DOI: 10.1109/SIBGRAPI62404.2024.10716301. Online publication date: 30-Sep-2024.
  • (2024) SML: A skeleton-based multi-feature learning method for sign language recognition. Knowledge-Based Systems 301, 112288. DOI: 10.1016/j.knosys.2024.112288. Online publication date: Oct-2024.


      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 7
      July 2024, 973 pages
      EISSN: 1551-6865
      DOI: 10.1145/3613662
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 16 May 2024
      Online AM: 03 April 2024
      Accepted: 14 March 2024
      Revised: 13 February 2024
      Received: 08 October 2023
      Published in TOMM Volume 20, Issue 7

      Author Tags

      1. Sign language recognition
      2. video analysis

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Fundamental Research Funds for the Central Universities
