Abstract
Automating objective assessment of surgical technical skill is necessary to support training and professional certification at scale, even in settings with limited access to an expert surgeon. Likewise, automated surgical activity recognition can improve operating room workflow efficiency, support teaching and self-review, and aid clinical decision support systems. However, current supervised learning methods for these tasks rely on large training datasets, and crowdsourcing has become a standard approach for curating such datasets in a scalable manner. The use and effectiveness of crowdsourcing for surgical data annotation have been studied in only a few settings. In this study, we evaluated the reliability and validity of crowdsourced annotations of surgical instruments (instrument names and pixel locations of key points on the instruments). For 200 images sampled from videos of two cataract surgery procedures, we collected 9 independent annotations per image. We observed an inter-rater agreement of 0.63 (Fleiss' kappa) and an accuracy of 0.88 for instrument identification compared against an expert annotation, and a mean pixel error of 5.77 pixels for annotation of instrument tip key points. Our study shows that crowdsourcing is a reliable and accurate alternative to expert annotation for identifying instruments and instrument tip key points in videos of cataract surgery.
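To make the two evaluation measures concrete, the sketch below shows one way the abstract's metrics could be computed: Fleiss' kappa over per-image label counts (9 raters per image, as in the study) and the mean Euclidean pixel error between crowd-derived and expert key points. The function names, the toy count matrix, and the coordinates are illustrative assumptions, not data or code from the paper.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a (num_items x num_categories) matrix of
    per-item category counts; every item must have the same number
    of ratings (here, 9 annotations per image)."""
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    # Per-item agreement: fraction of rater pairs that agree.
    p_i = (np.sum(counts**2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the marginal category proportions.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.sum(p_j**2)
    return (p_bar - p_e) / (1 - p_e)

def mean_pixel_error(crowd_pts, expert_pts):
    """Mean Euclidean distance (in pixels) between crowd-derived
    key-point locations and the expert reference."""
    crowd_pts, expert_pts = np.asarray(crowd_pts), np.asarray(expert_pts)
    return np.linalg.norm(crowd_pts - expert_pts, axis=1).mean()

# Toy example: 3 images, 4 instrument labels, 9 raters per image.
counts = [[9, 0, 0, 0],
          [2, 6, 1, 0],
          [0, 1, 8, 0]]
print(f"Fleiss' kappa: {fleiss_kappa(counts):.2f}")

tips_crowd  = [(120.0, 88.0), (305.5, 210.0)]
tips_expert = [(123.0, 90.0), (300.0, 214.0)]
print(f"Mean pixel error: {mean_pixel_error(tips_crowd, tips_expert):.2f}")
```

A majority vote over the 9 labels per image (the argmax of each row of the count matrix) is the natural way to reduce the crowd annotations to a single label before computing accuracy against the expert.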
Funding
Wilmer Eye Institute Pooled Professor's Fund and a grant to the Wilmer Eye Institute from Research to Prevent Blindness.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kim, T.S., Malpani, A., Reiter, A., Hager, G.D., Sikder, S., Swaroop Vedula, S. (2018). Crowdsourcing Annotation of Surgical Instruments in Videos of Cataract Surgery. In: Stoyanov, D., et al. (eds.) Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. CVII-STENT 2018, LABELS 2018. Lecture Notes in Computer Science, vol. 11043. Springer, Cham. https://doi.org/10.1007/978-3-030-01364-6_14
DOI: https://doi.org/10.1007/978-3-030-01364-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01363-9
Online ISBN: 978-3-030-01364-6
eBook Packages: Computer Science (R0)