Multi-Modal Deep Learning for Assessing Surgeon Technical Skill
Figure 2. The trials were recorded using three modalities. Top: an image of the final product. Middle: a screen capture of the video data with a visualization of the joints tracked by the Leap sensor. Bottom: an example of the kinematic time series data, representing the temporal 3-dimensional movement of the hand joints during the knot-tying task.

Figure 3. Participants came from 10 surgical divisions, with experience ranging from PGY1 to Fellow.

Figure 4. Images are analyzed using a ResNet-based network, and the kinematic data are analyzed using a 1D ResNet-18 "feature extractor" followed by 2 bidirectional LSTM layers. The combined multi-modal network is trained concurrently on both the image and kinematic inputs and predicts all four GRS domains (sketched in code below).

Figure 5. Participant experience and rating on the "Overall Performance" domain. A significant difference was found between the Beginner and Intermediate groups.

Figure 6. Graphical comparison of the MSE on the GRS domains; lower MSE is better.
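The Figure 4 caption describes the full architecture at a high level. Below is a minimal PyTorch sketch of that design, not the authors' implementation: torchvision's 2D ResNet-18 stands in for the "ResNet-based" image branch, a small 1D convolutional stack abbreviates the 1D ResNet-18 feature extractor, and the kinematic channel count (63, e.g., 21 hand joints × 3 axes), layer sizes, and fusion head are illustrative assumptions.

```python
# Minimal sketch of the multi-modal network in Figure 4. Layer sizes, the
# 63-channel kinematic input, and the fusion head are illustrative
# assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class MultiModalGRSNet(nn.Module):
    def __init__(self, n_kinematic_channels: int = 63, n_domains: int = 4):
        super().__init__()
        # Image branch: 2D ResNet backbone with its classification head removed.
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()
        self.image_branch = backbone  # outputs (batch, 512)

        # Kinematic branch: a small 1D conv stack stands in for the paper's
        # 1D ResNet-18 "feature extractor", followed by 2 bidirectional LSTMs.
        self.kin_conv = nn.Sequential(
            nn.Conv1d(n_kinematic_channels, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(),
        )
        self.kin_lstm = nn.LSTM(
            input_size=128, hidden_size=128, num_layers=2,
            batch_first=True, bidirectional=True,
        )

        # Fused head regresses all four GRS domains concurrently.
        self.head = nn.Sequential(
            nn.Dropout(0.5),  # dropout value from the hyperparameter table
            nn.Linear(512 + 2 * 128, n_domains),
        )

    def forward(self, image: torch.Tensor, kinematics: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_branch(image)              # (B, 512)
        x = self.kin_conv(kinematics)                    # (B, 128, T')
        _, (h_n, _) = self.kin_lstm(x.transpose(1, 2))   # h_n: (4, B, 128)
        kin_feat = torch.cat([h_n[-2], h_n[-1]], dim=1)  # last layer's fwd/bwd states
        return self.head(torch.cat([img_feat, kin_feat], dim=1))


# Smoke test with the input sizes reported in the paper's tables:
# 1024x1024 images and kinematic series of 4223 timestamps.
model = MultiModalGRSNet()
out = model(torch.randn(2, 3, 1024, 1024), torch.randn(2, 63, 4223))
print(out.shape)  # torch.Size([2, 4])
```

Concatenating the image embedding with the final forward/backward LSTM states is one simple fusion choice; the paper's exact fusion strategy may differ.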
Abstract
1. Introduction
- Development of a multi-modal deep learning model that combines images of the final surgical product with kinematic data of the procedure. We demonstrate that this model can assess surgical performance comparably to expert human raters on several assessment domains. This is significant because existing approaches are limited in scope and predominantly focus on predicting only high-level skill categories.
- Ablation studies comparing the image-based, kinematic-based, and combined multi-modal networks. We show that the multi-modal network achieves the best overall performance.
- A new dataset of seventy-two surgical trainees and surgeons, collected during a University of Toronto Department of Surgery Prep Camp and Orthopaedics Bootcamp. It comprises image, video, and kinematic data of the simulated surgical task, as well as skill assessments performed by three expert raters. This large dataset presents new and challenging opportunities for data-driven approaches to surgical skill assessment and gesture recognition. (The dataset can be downloaded at https://osf.io/rg35w/.)
Related Work
2. Materials and Methods
2.1. Surgical Task
2.2. Data Collection
- High resolution digital photograph of the final product
- Anonymized video recording of the operative field
- 3D kinematic motion tracking of the hands using a Leap Sensor
2.3. Task Ratings
1. Respect for Tissue
2. Time and Motion
3. Quality of Final Product
4. Overall Performance
2.4. Data Pre-Processing
2.5. Data Augmentation
2.6. Machine Learning Models
2.7. Statistical Analysis
3. Results
3.1. Dataset Analysis
3.2. Deep Learning Model Performance
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Reznick, R.K.; MacRae, H. Teaching surgical skills–changes in the wind. N. Engl. J. Med. 2006, 355, 2664–2669. [Google Scholar] [CrossRef] [PubMed]
- Sonnadara, R.R.; Mui, C.; McQueen, S.; Mironova, P.; Nousiainen, M.; Safir, O.; Kraemer, W.; Ferguson, P.; Alman, B.; Reznick, R. Reflections on Competency-Based Education and Training for Surgical Residents. J. Surg. Educ. 2014, 71, 151–158. [Google Scholar] [CrossRef] [PubMed]
- Boet, S.; Etherington, C.; Lam, S.; Lê, M.; Proulx, L.; Britton, M.; Kenna, J.; Przybylak-Brouillard, A.; Grimshaw, J.; Grantcharov, T.; et al. Implementation of the Operating Room Black Box Research Program at the Ottawa Hospital Through Patient, Clinical, and Organizational Engagement: Case Study. J. Med. Internet Res. 2021, 23, e15443. [Google Scholar] [CrossRef] [PubMed]
- Poursartip, B.; LeBel, M.E.; McCracken, L.C.; Escoto, A.; Patel, R.V.; Naish, M.D.; Trejos, A.L. Energy-Based Metrics for Arthroscopic Skills Assessment. Sensors 2017, 17, 1808. [Google Scholar] [CrossRef]
- Yanik, E.; Intes, X.; Kruger, U.; Yan, P.; Diller, D.; Van Voorst, B.; Makled, B.; Norfleet, J.; De, S. Deep neural networks for the assessment of surgical skills: A systematic review. J. Def. Model. Simul. Appl. Methodol. Technol. 2021, 19, 159–171. [Google Scholar] [CrossRef]
- Gao, Y.; Vedula, S.S.; Reiley, C.E.; Ahmidi, N.; Varadarajan, B.; Lin, H.C.; Tao, L.; Zappella, L.; Béjar, B.; Yuh, D.D.; et al. JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS): A Surgical Activity Dataset for Human Motion Modeling. In Proceedings of the Modeling and Monitoring of Computer Assisted Interventions (M2CAI)—MICCAI Workshop, Boston, MA, USA, 14–18 September 2014. [Google Scholar]
- Fard, M.J.; Ameri, S.; Ellis, R.D.; Chinnam, R.B.; Pandya, A.K.; Klein, M.D. Automated robot-assisted surgical skill evaluation: Predictive analytics approach. Int. J. Med Robot. Comput. Assist. Surg. 2018, 14, e1850. [Google Scholar] [CrossRef]
- Law, H.; Ghani, K.; Deng, J. Surgeon Technical Skill Assessment Using Computer Vision Based Analysis. In Proceedings of the 2nd Machine Learning for Healthcare Conference, Boston, MA, USA, 18–19 August 2017; Available online: https://proceedings.mlr.press/v68/law17a.html (accessed on 17 August 2022).
- Watson, R.A. Use of a machine learning algorithm to classify expertise: Analysis of hand motion patterns during a simulated surgical task. Acad. Med. 2014, 89, 1163–1167. [Google Scholar] [CrossRef]
- Martin, J.A.; Regehr, G.; Reznick, R.; Macrae, H.; Murnaghan, J.; Hutchison, C.; Brown, M. Objective structured assessment of technical skill (OSATS) for surgical residents. Br. J. Surg. 1997, 84, 273–278. [Google Scholar]
- Khalid, S.; Goldenberg, M.G.; Grantcharov, T.P.; Taati, B.; Rudzicz, F. Evaluation of Deep Learning Models for Identifying Surgical Actions and Measuring Performance. JAMA Netw. Open 2020, 3, e201664. [Google Scholar] [CrossRef]
- Zia, A.; Sharma, Y.; Bettadapura, V. Video and accelerometer-based motion analysis for automated surgical skills assessment. Int. J. Comput. Assist. Radiol. Surg. 2018, 13, 443–455. [Google Scholar]
- Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Accurate and interpretable evaluation of surgical skills from kinematic data using fully convolutional neural networks. Int. J. Comput. Assist. Radiol. Surg. 2019, 14, 1611–1617. [Google Scholar] [CrossRef] [PubMed]
- O’Driscoll, O.; Hisey, R.; Camire, D.; Erb, J.; Howes, D.; Fichtinger, G.; Ungi, T. Object detection to compute performance metrics for skill assessment in central venous catheterization. In SPIE 11598, Proceedings of the Medical Imaging 2021: Image-Guided Procedures, Robotic Interventions, and Modeling, Online, 15–19 February 2021; Linte, C.A., Siewerdsen, J.H., Eds.; International Society for Optics and Photonics, SPIE: Bellingham, WA, USA, 2021; Volume 11598, pp. 315–322. [Google Scholar] [CrossRef]
- O’Driscoll, O.; Hisey, R.; Holden, M.; Camire, D.; Erb, J.; Howes, D.; Ungi, T.; Fichtinger, G. Feasibility of object detection for skill assessment in central venous catheterization. In SPIE 12034, Proceedings of the Medical Imaging 2022: Image-Guided Procedures, Robotic Interventions, and Modeling, San Diego, CA, USA, 20–23 February 2022; Linte, C.A., Siewerdsen, J.H., Eds.; International Society for Optics and Photonics, SPIE: Bellingham, WA, USA, 2022; Volume 12034, pp. 358–365. [Google Scholar] [CrossRef]
- Zia, A.; Essa, I. Automated surgical skill assessment in RMIS training. Int. J. Comput. Assist. Radiol. Surg. 2018, 13, 731–739. [Google Scholar] [CrossRef] [PubMed]
- Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Evaluating Surgical Skills from Kinematic Data Using Convolutional Neural Networks. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2018, Granada, Spain, 16–20 September 2018; Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 214–221. [Google Scholar]
- Ordóñez, F.J.; Roggen, D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 2016, 16, 115. [Google Scholar] [CrossRef] [PubMed]
- Burns, D.M.; Leung, N.; Hardisty, M.; Whyne, C.M.; Henry, P.; McLachlin, S. Shoulder Physiotherapy Exercise Recognition: Machine Learning the Inertial Signals from a Smartwatch. Physiol. Meas. 2018, 39, 075007. [Google Scholar] [CrossRef] [PubMed]
- Hammerla, N.Y.; Halloran, S.; Ploetz, T. Deep, Convolutional, and Recurrent Models for Human Activity Recognition using Wearables. arXiv 2016, arXiv:1604.08880. [Google Scholar]
- Rueda, F.M.; Grzeszick, R.; Fink, G.A.; Feldhorst, S.; ten Hompel, M. Convolutional Neural Networks for Human Activity Recognition Using Body-Worn Sensors. Informatics 2018, 5, 26. [Google Scholar] [CrossRef]
- Huang, J.; Lin, S.; Wang, N.; Dai, G.; Xie, Y.; Zhou, J. TSE-CNN: A Two-Stage End-to-End CNN for Human Activity Recognition. IEEE J. Biomed. Health Inform. 2020, 24, 292–299. [Google Scholar] [CrossRef]
- Cheng, Y.; Ji, X.; Li, X.; Zhang, T.; Malebary, S.J.; Qu, X.; Xu, W. Identifying Child Users via Touchscreen Interactions. ACM Trans. Sens. Netw. (TOSN) 2020, 16, 1–25. [Google Scholar] [CrossRef]
- Seeland, M.; Mäder, P. Multi-view classification with convolutional neural networks. PLoS ONE 2021, 16, e0245230. [Google Scholar] [CrossRef]
- DiMaio, S.; Hanuschik, M.; Kreaden, U. The da Vinci Surgical System. In Surgical Robotics; Rosen, J., Hannaford, B., Satava, R., Eds.; Springer: Boston, MA, USA, 2011. [Google Scholar] [CrossRef]
- Burns, D.M.; Whyne, C.M. Seglearn: A Python Package for Learning Sequences and Time Series. J. Mach. Learn. Res. 2018, 19, 3238–3244. [Google Scholar]
- Itzkovich, D.; Sharon, Y.; Jarc, A.; Refaely, Y.; Nisky, I. Using Augmentation to Improve the Robustness to Rotation of Deep Learning Segmentation in Robotic-Assisted Surgical Data. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 5068–5075. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Varno, F.; Soleimani, B.H.; Saghayi, M.; Di-Jorio, L.; Matwin, S. Efficient Neural Task Adaptation by Maximum Entropy Initialization. arXiv 2019, arXiv:1905.10698. [Google Scholar]
- Koo, T.; Li, M. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J. Chiropr. Med. 2016, 15, 155–163. [Google Scholar] [CrossRef] [PubMed]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Hopmans, C.J.; den Hoed, P.T.; van der Laan, L.; van der Harst, E.; van der Elst, M.; Mannaerts, G.H.H.; Dawson, I.; Timman, R.; Wijnhoven, B.P.; Ijzermans, J.N.M. Assessment of surgery residents’ operative skills in the operating theater using a modified Objective Structured Assessment of Technical Skills (OSATS): A prospective multicenter study. Surgery 2014, 156, 1078–1088. [Google Scholar] [CrossRef] [PubMed]
Domain | Rating Scale
---|---
Respect for Tissue | 1—Very poor: Frequent or excessive pulling or sawing of tissue
 | 3—Competent: Careful handling of tissue with occasional sawing or pulling
 | 5—Clearly superior: Consistent atraumatic handling of tissue
Time and Motion | 1—Very poor: Many unnecessary movements
 | 3—Competent: Efficient time/motion but some unnecessary moves
 | 5—Clearly superior: Clear economy of movement and maximum efficiency
Quality of Final Product | 1—Very poor
 | 3—Competent
 | 5—Clearly superior
Overall Performance | 1—Very poor
 | 3—Competent
 | 5—Clearly superior
Hyperparameter | Value |
---|---|
Learning rate | |
Optimizer | Adam |
Batch size | 16 |
Dropout | 0.50 |
Epochs (frozen backbone) | 50 |
Epochs (fine-tuning backbone) | 50 |
Loss function | Mean Squared Error |
Image dimensions | (1024, 1024) |
Timeseries length | 4223 timestamps |
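Read together, the table implies a two-stage schedule: 50 epochs with the pretrained backbone frozen, followed by 50 epochs of end-to-end fine-tuning. The sketch below shows one plausible realization, reusing the hypothetical MultiModalGRSNet from the earlier sketch. The learning-rate value did not survive extraction, so lr=1e-4 is only a placeholder, and the tiny synthetic dataset exists just to make the loop runnable.

```python
# Two-stage training loop implied by the hyperparameter table: 50 epochs with
# the backbone frozen, then 50 epochs of fine-tuning. Assumes MultiModalGRSNet
# from the earlier architecture sketch is in scope.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = MultiModalGRSNet()
criterion = nn.MSELoss()  # loss function from the table

# Tiny synthetic stand-in dataset; real inputs are 1024x1024 images and
# 4223-step kinematic series (shrunk here so the sketch runs quickly).
train_loader = DataLoader(
    TensorDataset(
        torch.randn(8, 3, 224, 224),
        torch.randn(8, 63, 512),
        torch.rand(8, 4) * 4 + 1,  # GRS scores on the 1-5 scale
    ),
    batch_size=16, shuffle=True,   # batch size from the table
)

def run_epochs(n_epochs: int, optimizer: torch.optim.Optimizer) -> None:
    for _ in range(n_epochs):
        for image, kinematics, grs in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(image, kinematics), grs)
            loss.backward()
            optimizer.step()

# Stage 1: freeze the image backbone, train the remaining layers (50 epochs).
for p in model.image_branch.parameters():
    p.requires_grad = False
opt = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-4,  # placeholder: the table's learning-rate cell was lost
)
run_epochs(50, opt)

# Stage 2: unfreeze and fine-tune end to end (50 epochs).
for p in model.image_branch.parameters():
    p.requires_grad = True
run_epochs(50, torch.optim.Adam(model.parameters(), lr=1e-4))
```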
GRS Domain | ICC (2,3) | SEM (2,3) | ICC (2,1) | SEM (2,1) |
---|---|---|---|---|
Respect for Tissue | 0.71 | 0.45 | 0.47 | 0.62 |
Time and Motion | 0.70 | 0.47 | 0.44 | 0.64 |
Quality of Final Product | 0.83 | 0.40 | 0.63 | 0.61 |
Overall Performance | 0.73 | 0.39 | 0.47 | 0.55 |
GRS Domains | Rater 1 ICC | Rater 1 SEM | Rater 2 ICC | Rater 2 SEM | Rater 3 ICC | Rater 3 SEM
---|---|---|---|---|---|---
Respect for Tissue | 0.84 | 0.43 | 0.49 | 0.55 | 0.55 | 0.54
Time and Motion | 0.83 | 0.46 | 0.57 | 0.58 | 0.62 | 0.48
Quality of Final Product | 0.88 | 0.40 | 0.79 | 0.47 | 0.69 | 0.43
Overall Performance | 0.85 | 0.37 | 0.60 | 0.49 | 0.58 | 0.48
GRS Domain | ICC (2,3) | SEM (2,3) | ICC (2,1) | SEM (2,1) |
---|---|---|---|---|
Respect for Tissue | 0.78 | 0.44 | 0.54 | 0.63 |
Time and Motion | 0.81 | 0.41 | 0.58 | 0.61 |
Quality of Final Product | 0.93 | 0.30 | 0.82 | 0.49 |
Overall Performance | 0.86 | 0.30 | 0.68 | 0.30 |
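For reference, the sketch below shows how ICC(2,1) and ICC(2,3) values like those in the three tables above can be computed with pingouin, using synthetic ratings since the raw scores are not reproduced here. SEM is derived with the standard formula SEM = SD·sqrt(1 − ICC), which we assume matches the authors' calculation.

```python
# Inter-rater reliability sketch: ICC(2,1) and ICC(2,3) via pingouin, with
# synthetic ratings standing in for the three expert raters' GRS scores.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
n_trials = 72  # one trial per participant in the dataset
ratings = pd.DataFrame({
    "trial": np.repeat(np.arange(n_trials), 3),
    "rater": np.tile(["R1", "R2", "R3"], n_trials),
    "score": rng.integers(1, 6, n_trials * 3).astype(float),  # 1-5 GRS scale
})

icc = pg.intraclass_corr(data=ratings, targets="trial",
                         raters="rater", ratings="score")
icc_21 = icc.loc[icc["Type"] == "ICC2", "ICC"].item()   # single rater: ICC(2,1)
icc_23 = icc.loc[icc["Type"] == "ICC2k", "ICC"].item()  # mean of k=3: ICC(2,3)

# Standard error of measurement: SEM = SD * sqrt(1 - ICC).
sd = ratings["score"].std()
print(f"ICC(2,1) = {icc_21:.2f}, SEM(2,1) = {sd * np.sqrt(1 - icc_21):.2f}")
print(f"ICC(2,3) = {icc_23:.2f}, SEM(2,3) = {sd * np.sqrt(1 - icc_23):.2f}")
```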
Model | Metric | Respect for Tissue | Time and Motion | Quality of Final Product | Overall Performance
---|---|---|---|---|---
Image Model | MSE | - | - | 0.146 | -
 | RMSE | - | - | 0.392 | -
 | MAE | - | - | 0.293 | -
 | R2 | - | - | 0.778 | -
Kinematic Model | MSE | 0.336 | 0.420 | - | 0.373
 | RMSE | 0.579 | 0.648 | - | 0.610
 | MAE | 0.523 | 0.456 | - | 0.431
 | R2 | 0.337 | 0.244 | - | 0.453
Multi-modal Model | MSE | 0.480 | 0.356 | 0.186 | 0.194
 | RMSE | 0.693 | 0.597 | 0.431 | 0.440
 | MAE | 0.545 | 0.459 | 0.331 | 0.315
 | R2 | 0.136 | 0.476 | 0.838 | 0.618
Rater 1 | MSE | 0.464 | 0.348 | 0.531 | 0.505
 | RMSE | 0.681 | 0.590 | 0.729 | 0.710
 | MAE | 0.528 | 0.474 | 0.449 | 0.407
Rater 2 | MSE | 0.546 | 0.553 | 0.545 | 0.466
 | RMSE | 0.739 | 0.744 | 0.738 | 0.683
 | MAE | 0.586 | 0.483 | 0.425 | 0.436
Rater 3 | MSE | 0.288 | 0.363 | 0.193 | 0.290
 | RMSE | 0.537 | 0.602 | 0.439 | 0.539
 | MAE | 0.409 | 0.426 | 0.291 | 0.336
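The error metrics in the table above are standard regression quantities; the snippet below shows how they can be reproduced for a single GRS domain with scikit-learn. The arrays are illustrative placeholders, not the study's predictions.

```python
# Regression metrics (MSE, RMSE, MAE, R2) for one GRS domain, computed with
# scikit-learn. y_true / y_pred are placeholder values, not the study's data.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 4.0, 2.0, 5.0, 3.0])  # expert GRS ratings (1-5 scale)
y_pred = np.array([3.2, 3.7, 2.4, 4.6, 3.1])  # model predictions

mse = mean_squared_error(y_true, y_pred)
print(f"MSE  = {mse:.3f}")
print(f"RMSE = {np.sqrt(mse):.3f}")
print(f"MAE  = {mean_absolute_error(y_true, y_pred):.3f}")
print(f"R2   = {r2_score(y_true, y_pred):.3f}")
```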
Model | Metric | Respect for Tissue | Time and Motion | Quality of Final Product | Overall Performance
---|---|---|---|---|---
Image Model | ICC(2,1) | - | - | 0.888 | -
 | SEM(2,1) | - | - | 0.257 | -
Kinematic Model | ICC(2,1) | 0.477 | 0.621 | - | 0.534
 | SEM(2,1) | 0.464 | 0.441 | - | 0.416
Multi-modal Model | ICC(2,1) | 0.301 | 0.591 | 0.904 | 0.746
 | SEM(2,1) | 0.499 | 0.428 | 0.309 | 0.305
Rater 1 | ICC(2,1) | 0.717 | 0.779 | 0.823 | 0.616
 | SEM(2,1) | 0.476 | 0.414 | 0.512 | 0.502
Rater 2 | ICC(2,1) | 0.606 | 0.627 | 0.758 | 0.508
 | SEM(2,1) | 0.516 | 0.524 | 0.521 | 0.689
Rater 3 | ICC(2,1) | 0.797 | 0.797 | 0.924 | 0.789
 | SEM(2,1) | 0.377 | 0.423 | 0.308 | 0.379
GRS Domain | Multi-Modal Model (Ours) | FCN [13]
---|---|---
Respect for Tissue | 0.18 | -
Time and Motion | 0.73 | -
Quality of Final Product | 0.95 | -
Overall Performance | 0.82 | -
Mean | 0.67 | 0.65
GRS Domain | Accuracy, Multi-Modal Model (Ours) | Accuracy, Embedding Analysis [11]
---|---|---
Time and Motion | 0.54 | 0.32
Quality of Final Product | 0.76 | 0.51
Overall Performance | 0.76 | 0.41
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kasa, K.; Burns, D.; Goldenberg, M.G.; Selim, O.; Whyne, C.; Hardisty, M. Multi-Modal Deep Learning for Assessing Surgeon Technical Skill. Sensors 2022, 22, 7328. https://doi.org/10.3390/s22197328