skip to main content
10.1145/3490099.3511112acmconferencesArticle/Chapter ViewAbstractPublication PagesiuiConference Proceedingsconference-collections
research-article
Open Access

Towards Efficient Annotations for a Human-AI Collaborative, Clinical Decision Support System: A Case Study on Physical Stroke Rehabilitation Assessment

Authors Info & Claims
Published:22 March 2022Publication History

ABSTRACT

Artificial intelligence (AI) and machine learning (ML) algorithms are increasingly being explored to support various decision-making tasks in health (e.g. rehabilitation assessment). However, the development of such AI/ML-based decision support systems is challenging due to the expensive process to collect an annotated dataset. In this paper, we describe the development process of a human-AI collaborative, clinical decision support system that augments an ML model with a rule-based (RB) model from domain experts. We conducted its empirical evaluation in the context of assessing physical stroke rehabilitation with the dataset of three exercises from 15 post-stroke survivors and therapists. Our results bring new insights on the efficient development and annotations of a decision support system: when an annotated dataset is not available initially, the RB model can be used to assess post-stroke survivor’s quality of motion and identify samples with low confidence scores to support efficient annotations for training an ML model. Specifically, our system requires only 22 - 33% of annotations from therapists to train an ML model that achieves equally good performance with an ML model with all annotations from a therapist. Our work discusses the values of a human-AI collaborative approach for effectively collecting an annotated dataset and supporting a complex decision-making task.

References

  1. Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, 2019. Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 3.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. PK Anooj. 2012. Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules. Journal of King Saud University-Computer and Information Sciences 24, 1(2012), 27–40.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2019. Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 2(2019), 423–443.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Mark T Bayley, Amanda Hurdowar, Carol L Richards, Nicol Korner-Bitensky, Sharon Wood-Dauphinee, Janice J Eng, Marilyn McKay-Lyons, Edward Harrison, Robert Teasell, Margaret Harrison, 2012. Barriers to implementation of stroke rehabilitation evidence: findings from a multi-site pilot project. Disability and rehabilitation 34, 19 (2012), 1633–1638.Google ScholarGoogle Scholar
  6. Emma Beede, Elizabeth Baylor, Fred Hersch, Anna Iurchenko, Lauren Wilcox, Paisan Ruamviboonsuk, and Laura M Vardoulakis. 2020. A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–12.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Edmon Begoli, Tanmoy Bhattacharya, and Dimitri Kusnezov. 2019. The need for uncertainty quantification in machine-assisted medical decision making. Nature Machine Intelligence 1, 1 (2019), 20–23.Google ScholarGoogle ScholarCross RefCross Ref
  8. Or Biran and Courtenay Cotton. 2017. Explanation and justification in machine learning: A survey. In IJCAI-17 workshop on explainable AI (XAI), Vol. 8. 1.Google ScholarGoogle Scholar
  9. Norbert Buch, Sergio A Velastin, and James Orwell. 2011. A review of computer vision techniques for the analysis of urban traffic. IEEE Transactions on intelligent transportation systems 12, 3(2011), 920–939.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Bruce G Buchanan and Richard O Duda. 1983. Principles of rule-based expert systems. In Advances in computers. Vol. 22. Elsevier, 163–216.Google ScholarGoogle Scholar
  11. Federico Cabitza, Raffaele Rasoini, and Gian Franco Gensini. 2017. Unintended consequences of machine learning in medicine. Jama 318, 6 (2017), 517–518.Google ScholarGoogle ScholarCross RefCross Ref
  12. Carrie J Cai, Emily Reif, Narayan Hegde, Jason Hipp, Been Kim, Daniel Smilkov, Martin Wattenberg, Fernanda Viegas, Greg S Corrado, Martin C Stumpe, 2019. Human-centered tools for coping with imperfect algorithms during medical decision-making. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 4.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Carrie J Cai, Samantha Winter, David Steiner, Lauren Wilcox, and Michael Terry. 2019. ” Hello AI”: Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-Making. Proceedings of the ACM on Human-Computer Interaction 3, CSCW(2019), 1–24.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. 1721–1730.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Po-Hsuan Cameron Chen, Yun Liu, and Lily Peng. 2019. How to develop machine learning models for healthcare. Nature materials 18, 5 (2019), 410.Google ScholarGoogle Scholar
  16. Samarjit Das, Laura Trutoiu, Akihiko Murai, Dunbar Alcindor, Michael Oh, Fernando De la Torre, and Jessica Hodgins. 2011. Quantitative measurement of motor symptoms in Parkinson’s disease: A study with full-body motion capture data. In 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 6789–6792.Google ScholarGoogle ScholarCross RefCross Ref
  17. Maria De-Arteaga, Riccardo Fogliato, and Alexandra Chouldechova. 2020. A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–12.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Maria De-Arteaga, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, and Adam Tauman Kalai. 2019. Bias in bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 120–128.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. TT Dhivyaprabha, P Subashini, and Marimuthu Krishnaveni. 2016. Computational intelligence based machine learning methods for rule-based reasoning in computer vision applications. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 1–8.Google ScholarGoogle ScholarCross RefCross Ref
  20. Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. 2017. Dermatologist-level classification of skin cancer with deep neural networks. nature 542, 7639 (2017), 115–118.Google ScholarGoogle Scholar
  21. Ben Green and Yiling Chen. 2019. The principles and limits of algorithm-in-the-loop decision making. Proceedings of the ACM on Human-Computer Interaction 3, CSCW(2019), 1–24.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Eren Gultepe, Jeffrey P Green, Hien Nguyen, Jason Adams, Timothy Albertson, and Ilias Tagkopoulos. 2014. From vital signs to clinical outcomes for patients with sepsis: a machine learning basis for a clinical decision support system. Journal of the American Medical Informatics Association 21, 2(2014), 315–325.Google ScholarGoogle ScholarCross RefCross Ref
  23. Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. 2017. On calibration of modern neural networks. In International Conference on Machine Learning. PMLR, 1321–1330.Google ScholarGoogle Scholar
  24. Jatinder ND Gupta, Guisseppi A Forgionne, and Manuel Mora. 2007. Intelligent decision-making support systems: foundations, applications and challenges. Springer Science & Business Media.Google ScholarGoogle Scholar
  25. Fred Hohman, Minsuk Kahng, Robert Pienta, and Duen Horng Chau. 2018. Visual analytics in deep learning: An interrogative survey for the next frontiers. IEEE transactions on visualization and computer graphics 25, 8(2018), 2674–2693.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Fred Hohman, Kanit Wongsuphasawat, Mary Beth Kery, and Kayur Patel. 2020. Understanding and visualizing data iteration in machine learning. In Proceedings of the 2020 CHI conference on human factors in computing systems. 1–13.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Mark Jones, Karen Grimmer, Ian Edwards, Joy Higgs, and Franziska Trede. 2006. Challenges in applying best evidence to physiotherapy. Internet Journal of Allied Health Sciences and Practice 4, 3(2006), 11.Google ScholarGoogle Scholar
  28. Mayank Kabra, Alice A Robie, Marta Rivera-Alba, Steven Branson, and Kristin Branson. 2013. JAABA: interactive machine learning for automatic annotation of animal behavior. Nature methods 10, 1 (2013), 64–67.Google ScholarGoogle Scholar
  29. Ashish Kapoor, Bongshin Lee, Desney Tan, and Eric Horvitz. 2010. Interactive optimization for steering machine classification. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1343–1352.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Danielle Leah Kehl and Samuel Ari Kessler. 2017. Algorithms in the criminal justice system: Assessing the use of risk assessments in sentencing. (2017).Google ScholarGoogle Scholar
  31. Saif Khairat, David Marc, William Crosby, and Ali Al Sanousi. 2018. Reasons for physicians not adopting clinical decision support systems: critical analysis. JMIR medical informatics 6, 2 (2018), e24.Google ScholarGoogle Scholar
  32. Bongjun Kim and Bryan Pardo. 2018. A human-in-the-loop system for sound event detection and annotation. ACM Transactions on Interactive Intelligent Systems (TiiS) 8, 2(2018), 1–23.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Been Kim, Julie A Shah, and Finale Doshi-Velez. 2015. Mind the gap: A generative approach to interpretable feature selection and extraction. In Advances in Neural Information Processing Systems. 2260–2268.Google ScholarGoogle Scholar
  34. Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. 2018. Human decisions and machine predictions. The quarterly journal of economics 133, 1 (2018), 237–293.Google ScholarGoogle Scholar
  35. Jan-Christoph Klie, Michael Bugert, Beto Boullosa, Richard Eckart de Castilho, and Iryna Gurevych. 2018. The inception platform: Machine-assisted and knowledge-oriented interactive annotation. In Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations. 5–9.Google ScholarGoogle Scholar
  36. Amanda Kube, Sanmay Das, and Patrick J Fowler. 2019. Allocating interventions based on predicted outcomes: A case study on homelessness services. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 622–629.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Todd Kulesza, Margaret Burnett, Weng-Keen Wong, and Simone Stumpf. 2015. Principles of explanatory debugging to personalize interactive machine learning. In Proceedings of the 20th international conference on intelligent user interfaces. 126–137.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Peter Langhorne, Julie Bernhardt, and Gert Kwakkel. 2011. Stroke rehabilitation. The Lancet 377, 9778 (2011), 1693–1702.Google ScholarGoogle Scholar
  39. Min Hun Lee, Daniel P Siewiorek, Asim Smailagic, Alexandre Bernadino, 2019. Learning to assess the quality of stroke rehabilitation exercises. In Proceedings of the 24th International Conference on intelligent user interfaces. ACM, 218–228.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Min Hun Lee, Daniel P Siewiorek, Asim Smailagic, Alexandre Bernardino, and Sergi Bermúdez i Badia. 2020. Co-Design and Evaluation of an Intelligent Decision Support System for Stroke Rehabilitation Assessment. Proceedings of the ACM on Human-Computer Interaction 4, CSCW2(2020), 1–27.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Min Hun Lee, Daniel P Siewiorek, Asim Smailagic, Alexandre Bernardino, and Sergi Bermúdez i Badia. 2020. An exploratory study on techniques for quantitative assessment of stroke rehabilitation exercises. In Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization. 303–307.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Min Hun Lee, Daniel P. Siewiorek, Asim Smailagic, Alexandre Bernardino, and Sergi Bermúdez i Badia. 2021. A Human-AI Collaborative Approach for Clinical Decision Making on Rehabilitation Assessment. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Benjamin Letham, Cynthia Rudin, Tyler H McCormick, and David Madigan. 2015. Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics 9, 3 (2015), 1350–1371.Google ScholarGoogle ScholarCross RefCross Ref
  44. Mingkun Li and Ishwar K Sethi. 2006. Confidence-based active learning. IEEE transactions on pattern analysis and machine intelligence 28, 8(2006), 1251–1261.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Andrew F Long, Rosie Kneafsey, and Julia Ryan. 2003. Rehabilitation practice: challenges to effective team working. International journal of nursing studies 40, 6 (2003), 663–673.Google ScholarGoogle ScholarCross RefCross Ref
  46. Alexandru Niculescu-Mizil and Rich Caruana. 2005. Predicting good probabilities with supervised learning. In Proceedings of the 22nd international conference on Machine learning. 625–632.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Susan B O’Sullivan, Thomas J Schmitz, and George Fulk. 2019. Physical rehabilitation. FA Davis.Google ScholarGoogle Scholar
  48. Madhuri Panwar, Dwaipayan Biswas, Harsh Bajaj, Michael Jöbges, Ruth Turk, Koushik Maharatna, and Amit Acharyya. 2019. Rehab-Net: Deep Learning Framework for Arm Movement Classification Using Wearable Sensors for Stroke Rehabilitation. IEEE Transactions on Biomedical Engineering 66, 11 (2019), 3026–3037.Google ScholarGoogle ScholarCross RefCross Ref
  49. Nathan Peiffer-Smadja, Timothy Miles Rawson, Raheelah Ahmad, Albert Buchard, P Georgiou, F-X Lescure, Gabriel Birgand, and Alison Helen Holmes. 2020. Machine learning for clinical decision support in infectious diseases: a narrative review of current applications. Clinical Microbiology and Infection 26, 5 (2020), 584–595.Google ScholarGoogle ScholarCross RefCross Ref
  50. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 1135–1144.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Brandon Rohrer, Susan Fasoli, Hermano Igo Krebs, Richard Hughes, Bruce Volpe, Walter R Frontera, Joel Stein, and Neville Hogan. 2002. Movement smoothness changes during stroke recovery. Journal of Neuroscience 22, 18 (2002), 8297–8304.Google ScholarGoogle ScholarCross RefCross Ref
  52. Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206–215.Google ScholarGoogle ScholarCross RefCross Ref
  53. Saima Safdar, Saad Zafar, Nadeem Zafar, and Naurin Farooq Khan. 2018. Machine learning based decision support systems (DSS) for heart disease diagnosis: a review. Artificial Intelligence Review 50, 4 (2018), 597–623.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Mark Sendak, Madeleine Clare Elish, Michael Gao, Joseph Futoma, William Ratliff, Marshall Nichols, Armando Bedoya, Suresh Balu, and Cara O’Brien. 2020. ” The human body is a black box” supporting clinical decision-making with deep learning. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 99–109.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Emily Seto, Kevin J Leonard, Joseph A Cafazzo, Jan Barnsley, Caterina Masino, and Heather J Ross. 2012. Developing healthcare rule-based expert systems: case study of a heart failure telemonitoring system. International journal of medical informatics 81, 8(2012), 556–565.Google ScholarGoogle Scholar
  56. Philip J Smith, Norman D Geddes, and Roger Beatty. 2009. Human-centered design of decision-support systems. In Human-Computer Interaction. CRC Press, 263–292.Google ScholarGoogle Scholar
  57. Katherine J Sullivan, Julie K Tilson, Steven Y Cen, Dorian K Rose, Julie Hershberg, Anita Correa, Joann Gallichio, Molly McLeod, Craig Moore, Samuel S Wu, 2011. Fugl-Meyer assessment of sensorimotor function after stroke: standardized training procedure for clinical practice and clinical trials. Stroke 42, 2 (2011), 427–432.Google ScholarGoogle ScholarCross RefCross Ref
  58. Edward Taub, David M Morris, Jean Crago, Danna Kay King, Mary Bowman, Camille Bryson, Staci Bishop, Sonya Pearson, and Sharon E Shaw. 2011. Wolf motor function test (WMFT) manual. Birmingham: University of Alabama, CI Therapy Research Group (2011).Google ScholarGoogle Scholar
  59. David Webster and Ozkan Celik. 2014. Systematic review of Kinect applications in elderly care and stroke rehabilitation. Journal of neuroengineering and rehabilitation 11, 1(2014), 108.Google ScholarGoogle ScholarCross RefCross Ref
  60. Qian Yang, Aaron Steinfeld, and John Zimmerman. 2019. Unremarkable AI: Fitting Intelligent Decision Support into Critical, Clinical Decision-Making Processes. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 238.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Bianca Zadrozny and Charles Elkan. 2002. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. 694–699.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Towards Efficient Annotations for a Human-AI Collaborative, Clinical Decision Support System: A Case Study on Physical Stroke Rehabilitation Assessment
              Index terms have been assigned to the content through auto-classification.

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader

              HTML Format

              View this article in HTML Format .

              View HTML Format