
Take a shot! Natural language control of intelligent robotic X-ray systems in surgery

  • Original Article
International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

The expanding capabilities of surgical systems bring increasing complexity to the interfaces that humans use to control them. Robotic C-arm X-ray imaging systems, for instance, often require manipulation of independent axes via joysticks, while higher-level control options hide inside device-specific menus. The complexity of these interfaces hinders “ready-to-hand” use of high-level functions. Natural language offers a flexible, familiar interface for surgeons to express their desired outcome rather than remembering the steps necessary to achieve it, enabling direct access to task-aware, patient-specific C-arm functionality.

Methods

We present an English-language voice interface for controlling a robotic X-ray imaging system with task-aware functions for pelvic trauma surgery. Our fully integrated system uses a large language model (LLM) to convert natural spoken commands into machine-readable instructions, enabling low-level commands like “Tilt back a bit” to increase the angular tilt, or patient-specific directions like “Go to the obturator oblique view of the right ramus,” based on automated image analysis.
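
As a rough illustration of this pipeline, the sketch below maps a transcribed utterance to a structured C-arm command via an LLM. The JSON schema, command names, and the interpret and fake_llm functions are hypothetical placeholders, not the authors' actual interface.

    # Minimal sketch (hypothetical schema): a transcribed voice command is sent to an
    # LLM together with fixed system instructions; the reply is parsed into a
    # machine-readable C-arm command. Action names and JSON format are illustrative.
    import json
    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class CArmCommand:
        action: str                                  # e.g. "adjust_tilt" or "go_to_view"
        parameters: dict = field(default_factory=dict)

    SYSTEM_INSTRUCTIONS = (
        "You control a robotic C-arm X-ray system. Reply only with JSON of the "
        'form {"action": ..., "parameters": {...}}.'
    )

    def interpret(utterance: str, llm: Callable[[str, str], str]) -> CArmCommand:
        """Ask the LLM for a structured command and parse its JSON reply."""
        reply = llm(SYSTEM_INSTRUCTIONS, utterance)
        payload = json.loads(reply)
        return CArmCommand(payload["action"], payload.get("parameters", {}))

    # Stand-in for the real model, so the sketch runs end to end.
    def fake_llm(system_prompt: str, user_prompt: str) -> str:
        return '{"action": "adjust_tilt", "parameters": {"delta_deg": -5}}'

    print(interpret("Tilt back a bit", fake_llm))
    # CArmCommand(action='adjust_tilt', parameters={'delta_deg': -5})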

Results

We evaluated our system on 212 prompts provided by an attending physician, for which the system performed satisfactory actions 97% of the time. To test the fully integrated system, we conducted a real-time study in which an attending physician placed orthopedic hardware along desired trajectories through an anthropomorphic phantom, interacting with the X-ray system solely via voice.

Conclusion

Voice interfaces offer a convenient, flexible way for surgeons to manipulate C-arms based on desired outcomes rather than device-specific processes. As LLMs grow increasingly capable, so too will their applications in supporting higher-level interactions with surgical assistance systems.



Funding

This work was supported by the Link Foundation Fellowship for Modeling, Training, and Simulation; the NIH under Grant No. R21EB028505; the NSF under Award No. 2239077; and Johns Hopkins University Internal Funds.

Author information

Corresponding author

Correspondence to Benjamin D. Killeen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

This article does not contain patient data collected by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Full instructions

The following instructions are provided to the LLM in every interaction:

[Figure b: full instruction text given to the LLM]

Each interaction also includes a set of 35 example episodes, such as:

[Figure c: one example episode]
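
To make the structure of these interactions concrete, here is a hypothetical sketch of how fixed instructions and few-shot example episodes might be assembled into a chat-style prompt for each spoken command. The placeholder instruction text and episodes stand in for the contents of Figures b and c, which are not reproduced here.

    # Hypothetical prompt assembly: system instructions followed by few-shot example
    # episodes (user utterance, expected machine-readable reply), then the new command.
    # The instruction text and episodes below are placeholders, not the published ones.
    INSTRUCTIONS = "You control a robotic C-arm X-ray system. Respond only with JSON commands."

    EXAMPLE_EPISODES = [
        ("Tilt back a bit",
         '{"action": "adjust_tilt", "parameters": {"delta_deg": -5}}'),
        ("Go to the obturator oblique view of the right ramus",
         '{"action": "go_to_view", "parameters": {"view": "obturator_oblique", "side": "right"}}'),
        # ...in the full system, all 35 example episodes would be listed here.
    ]

    def build_messages(utterance: str) -> list[dict]:
        """Compose a chat-style message list: instructions, examples, then the new utterance."""
        messages = [{"role": "system", "content": INSTRUCTIONS}]
        for user_text, assistant_json in EXAMPLE_EPISODES:
            messages.append({"role": "user", "content": user_text})
            messages.append({"role": "assistant", "content": assistant_json})
        messages.append({"role": "user", "content": utterance})
        return messages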

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Killeen, B.D., Chaudhary, S., Osgood, G. et al. Take a shot! Natural language control of intelligent robotic X-ray systems in surgery. Int J CARS (2024). https://doi.org/10.1007/s11548-024-03120-3
