Abstract
Analyzing laparoscopic surgery videos presents a complex and multifaceted challenge, with applications including surgical training, intra-operative surgical complication prediction, and post-operative surgical assessment. Identifying crucial events within these videos is a significant prerequisite in a majority of these applications. In this paper, we introduce a comprehensive dataset tailored for relevant event recognition in laparoscopic gynecology videos. Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications. To validate the precision of our annotations, we assess event recognition performance using several CNN-RNN architectures. Furthermore, we introduce and evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic surgery videos. Leveraging the Transformer networks, our proposed architecture harnesses inter-frame dependencies to counteract the adverse effects of relevant content occlusion, motion blur, and surgical scene variation, thus significantly enhancing event recognition accuracy. Moreover, we present a frame sampling strategy designed to manage variations in surgical scenes and the surgeons’ skill level, resulting in event recognition with high temporal resolution. We empirically demonstrate the superiority of our proposed methodology in event recognition compared to conventional CNN-RNN architectures through a series of extensive experiments.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Aldahoul, N., Karim, H.A., Tan, M.J.T., Fermin, J.L.: Transfer learning and decision fusion for real time distortion classification in laparoscopic videos. IEEE Access 9, 115006–115018 (2021)
Bello, I., Zoph, B., Vaswani, A., Shlens, J., Le, Q.V.: Attention augmented convolutional networks. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 3286–3295 (2019)
Czempiel, T., et al.: TeCNO: surgical phase recognition with multi-stage temporal convolutional networks. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12263, pp. 343–352. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59716-0_33
Dosovitskiy, A., et al.: An image is worth 16\(\times \)16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Funke, I., Mees, S.T., Weitz, J., Speidel, S.: Video-based surgical skill assessment using 3D convolutional neural networks. Int. J. Comput. Assist. Radiol. Surg. 14, 1217–1225 (2019)
Ghamsarian, N.: Enabling relevance-based exploration of cataract videos. In: Proceedings of the 2020 International Conference on Multimedia Retrieval, pp. 378–382 (2020)
Ghamsarian, N., Amirpourazarian, H., Timmerer, C., Taschwer, M., Schöffmann, K.: Relevance-based compression of cataract surgery videos using convolutional neural networks. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 3577–3585 (2020)
Ghamsarian, N., Taschwer, M., Putzgruber-Adamitsch, D., Sarny, S., El-Shabrawi, Y., Schoeffmann, K.: LensID: A CNN-RNN-based framework towards lens irregularity detection in cataract surgery videos. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12908, pp. 76–86. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87237-3_8
Ghamsarian, N., Taschwer, M., Putzgruber-Adamitsch, D., Sarny, S., Schoeffmann, K.: Relevance detection in cataract surgery videos by spatio-temporal action localization. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10720–10727. IEEE (2021)
Golany, T., et al.: Artificial intelligence for phase recognition in complex laparoscopic cholecystectomy. Surg. Endosc. 36(12), 9215–9223 (2022)
He, Z., Mottaghi, A., Sharghi, A., Jamal, M.A., Mohareri, O.: An empirical study on activity recognition in long surgical videos. In: Machine Learning for Health, pp. 356–372. PMLR (2022)
Huang, G.: Surgical action recognition and prediction with transformers. In: 2022 IEEE 2nd International Conference on Software Engineering and Artificial Intelligence (SEAI), pp. 36–40. IEEE (2022)
Jin, Y., et al.: Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med. Image Anal. 59, 101572 (2020)
Kiyasseh, D., et al.: A vision transformer for decoding surgeon activity from surgical videos. Nat. Biomed. Eng. 7, 780–796 (2023)
Leibetseder, A., Primus, M.J., Petscharnig, S., Schoeffmann, K.: Image-based smoke detection in laparoscopic videos. In: Cardoso, M.J., et al. (eds.) CARE/CLIP -2017. LNCS, vol. 10550, pp. 70–87. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67543-5_7
Lim, S., Ghosh, S., Niklewski, P., Roy, S.: Laparoscopic suturing as a barrier to broader adoption of laparoscopic surgery. J. Soc. Laparoendosc. Surg. 21(3), e2017.00021 (2017)
Loukas, C.: Video content analysis of surgical procedures. Surg. Endosc. 32, 553–568 (2018)
Loukas, C., Georgiou, E.: Smoke detection in endoscopic surgery videos: a first step towards retrieval of semantic events. Int. J. Med. Robot. Comput. Assist. Surg. 11(1), 80–94 (2015)
Loukas, C., Varytimidis, C., Rapantzikos, K., Kanakis, M.A.: Keyframe extraction from laparoscopic videos based on visual saliency detection. Comput. Methods Programs Biomed. 165, 13–23 (2018)
Lux, M., Marques, O., Schöffmann, K., Böszörmenyi, L., Lajtai, G.: A novel tool for summarization of arthroscopic videos. Multimedia Tools Appl. 46, 521–544 (2010)
Namazi, B., Sankaranarayanan, G., Devarajan, V.: Automatic detection of surgical phases in laparoscopic videos. In: Proceedings on the International Conference in Artificial Intelligence (ICAI), pp. 124–130 (2018)
Namazi, B., Sankaranarayanan, G., Devarajan, V.: A contextual detector of surgical tools in laparoscopic videos using deep learning. Surg. Endosc. 36, 679–688 (2022)
Nasirihaghighi, S., Ghamsarian, N., Stefanics, D., Schoeffmann, K., Husslein, H.: Action recognition in video recordings from gynecologic laparoscopy. In: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), pp. 29–34 (2023)
Polat, M., Incebiyik, A., Tammo, O.: Abdominal access in laparoscopic surgery of obese patients: a novel abdominal access technique. Ann. Saudi Med. 43(4), 236–242 (2023)
Schoeffmann, K., Del Fabro, M., Szkaliczki, T., Böszörmenyi, L., Keckstein, J.: Keyframe extraction in endoscopic video. Multimedia Tools Appl. 74, 11187–11206 (2015)
Shi, C., Zheng, Y., Fey, A.M.: Recognition and prediction of surgical gestures and trajectories using transformer models in robot-assisted surgery. In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8017–8024. IEEE (2022)
Shi, P., Zhao, Z., Liu, K., Li, F.: Attention-based spatial-temporal neural network for accurate phase recognition in minimally invasive surgery: feasibility and efficiency verification. J. Comput. Des. Eng. 9(2), 406–416 (2022)
Twinanda, A.P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., Padoy, N.: EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans. Med. Imaging 36(1), 86–97 (2016)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Wang, C., Cheikh, F.A., Kaaniche, M., Elle, O.J.: A smoke removal method for laparoscopic images. arXiv preprint arXiv:1803.08410 (2018)
Acknowledgement
This work was funded by the FWF Austrian Science Fund under grant P 32010-N38. The authors would like to thank Daniela Stefanics for her help with data annotations.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nasirihaghighi, S., Ghamsarian, N., Husslein, H., Schoeffmann, K. (2024). Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14565. Springer, Cham. https://doi.org/10.1007/978-3-031-56435-2_7
Download citation
DOI: https://doi.org/10.1007/978-3-031-56435-2_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56434-5
Online ISBN: 978-3-031-56435-2
eBook Packages: Computer ScienceComputer Science (R0)