
Take a shot! Natural language control of intelligent robotic X-ray systems in surgery

  • Original Article
International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

The expanding capabilities of surgical systems bring increasing complexity to the interfaces that humans use to control them. Robotic C-arm X-ray imaging systems, for instance, often require manipulation of independent axes via joysticks, while higher-level control options hide inside device-specific menus. The complexity of these interfaces hinders “ready-to-hand” use of high-level functions. Natural language offers a flexible, familiar interface for surgeons to express their desired outcome rather than remembering the steps necessary to achieve it, enabling direct access to task-aware, patient-specific C-arm functionality.

Methods

We present an English-language voice interface for controlling a robotic X-ray imaging system with task-aware functions for pelvic trauma surgery. Our fully integrated system uses a large language model (LLM) to convert natural spoken commands into machine-readable instructions, enabling low-level commands like “Tilt back a bit” to increase the angular tilt, or patient-specific directions like “Go to the obturator oblique view of the right ramus,” based on automated image analysis.
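
As a rough illustration of this pipeline, the sketch below maps a transcribed utterance to a structured C-arm command via an LLM. The JSON schema, command names, and the interpret and fake_llm functions are hypothetical placeholders, not the authors' actual interface.

    # Minimal sketch (hypothetical schema): a transcribed voice command is sent to an
    # LLM together with fixed system instructions; the reply is parsed into a
    # machine-readable C-arm command. Action names and JSON format are illustrative.
    import json
    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class CArmCommand:
        action: str                                  # e.g. "adjust_tilt" or "go_to_view"
        parameters: dict = field(default_factory=dict)

    SYSTEM_INSTRUCTIONS = (
        "You control a robotic C-arm X-ray system. Reply only with JSON of the "
        'form {"action": ..., "parameters": {...}}.'
    )

    def interpret(utterance: str, llm: Callable[[str, str], str]) -> CArmCommand:
        """Ask the LLM for a structured command and parse its JSON reply."""
        reply = llm(SYSTEM_INSTRUCTIONS, utterance)
        payload = json.loads(reply)
        return CArmCommand(payload["action"], payload.get("parameters", {}))

    # Stand-in for the real model, so the sketch runs end to end.
    def fake_llm(system_prompt: str, user_prompt: str) -> str:
        return '{"action": "adjust_tilt", "parameters": {"delta_deg": -5}}'

    print(interpret("Tilt back a bit", fake_llm))
    # CArmCommand(action='adjust_tilt', parameters={'delta_deg': -5})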

Results

We evaluated our system on 212 prompts provided by an attending physician, for which the system performed satisfactory actions 97% of the time. To test the fully integrated system, we conducted a real-time study in which an attending physician placed orthopedic hardware along desired trajectories through an anthropomorphic phantom, interacting with the X-ray system solely via voice.

Conclusion

Voice interfaces offer a convenient, flexible way for surgeons to manipulate C-arms based on desired outcomes rather than device-specific processes. As LLMs grow increasingly capable, so too will their applications in supporting higher-level interactions with surgical assistance systems.



Funding

This work was supported by the Link Foundation Fellowship for Modeling, Training, and Simulation; the NIH under Grant No. R21EB028505; the NSF under Award No. 2239077; and Johns Hopkins University Internal Funds.

Author information

Corresponding author

Correspondence to Benjamin D. Killeen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

This article does not contain patient data collected by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Full instructions

The following instructions are provided to the LLM in every interaction:

[Figure b: full instruction text given to the LLM]

Each interaction also includes a set of 35 example episodes, such as:

[Figure c: one example episode]
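
To make the structure of these interactions concrete, here is a hypothetical sketch of how fixed instructions and few-shot example episodes might be assembled into a chat-style prompt for each spoken command. The placeholder instruction text and episodes stand in for the contents of Figures b and c, which are not reproduced here.

    # Hypothetical prompt assembly: system instructions followed by few-shot example
    # episodes (user utterance, expected machine-readable reply), then the new command.
    # The instruction text and episodes below are placeholders, not the published ones.
    INSTRUCTIONS = "You control a robotic C-arm X-ray system. Respond only with JSON commands."

    EXAMPLE_EPISODES = [
        ("Tilt back a bit",
         '{"action": "adjust_tilt", "parameters": {"delta_deg": -5}}'),
        ("Go to the obturator oblique view of the right ramus",
         '{"action": "go_to_view", "parameters": {"view": "obturator_oblique", "side": "right"}}'),
        # ...in the full system, all 35 example episodes would be listed here.
    ]

    def build_messages(utterance: str) -> list[dict]:
        """Compose a chat-style message list: instructions, examples, then the new utterance."""
        messages = [{"role": "system", "content": INSTRUCTIONS}]
        for user_text, assistant_json in EXAMPLE_EPISODES:
            messages.append({"role": "user", "content": user_text})
            messages.append({"role": "assistant", "content": assistant_json})
        messages.append({"role": "user", "content": utterance})
        return messages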

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Killeen, B.D., Chaudhary, S., Osgood, G. et al. Take a shot! Natural language control of intelligent robotic X-ray systems in surgery. Int J CARS (2024). https://doi.org/10.1007/s11548-024-03120-3
