Abstract
Telehealth facilitates access to medical professionals by enabling remote medical services for patients. These services have become increasingly popular over the years as the necessary technological infrastructure has matured. The benefits of telehealth have become even more apparent since the beginning of the COVID-19 crisis, as people have become less inclined to visit doctors in person during the pandemic. In this paper, we focus on facilitating chat sessions between a doctor and a patient. We note that the quality and efficiency of the chat experience can be critical as the demand for telehealth services increases. Accordingly, we develop a smart auto-response generation mechanism for medical conversations that helps doctors respond to consultation requests efficiently, particularly during busy sessions. We explore over 900,000 anonymous, historical online messages between doctors and patients collected over 9 months. We apply clustering algorithms to identify the most frequent responses by doctors and manually label the data accordingly. We then train machine learning algorithms on this preprocessed data to generate the responses. The proposed approach has two steps: a filtering (i.e., triggering) model that filters out infeasible patient messages, and a response generator that suggests the top-3 doctor responses for the messages that pass the triggering phase. Among the models considered, BERT achieves a precision@3 of 85.41% and is robust to its parameter settings.









Acknowledgements
The authors would like to thank Your Doctors Online for funding and supporting this research. This work was also funded and supported by Mitacs through the Mitacs Accelerate Program. The authors would also like to thank Gagandip Chane for his help with the data labeling.
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
Appendix: Finding the Best Pipeline
In this section, we discuss the evaluation results of the end-to-end pipelines obtained by combining different models for triggering and response generation. Table 4 reports the performance of various combinations of triggering and response generation models. We use Precision@3 as the representative performance metric because a generic chat application would present the top-3 proposed responses. The results show that using LSTM for triggering and BERT for response generation outperforms all other combinations, with an average Precision@3 of 85.58%. Using BERT for both phases yields very similar performance, with an average Precision@3 of 85.42%, the second best among all tested combinations. The advantage of using LSTM in the first phase can be attributed to its particularly good performance on the triggering task: LSTM outperforms BERT in three of the seven performance metrics reported in Table 2, with precision values for the feasible messages exceeding those of the BERT model by 2.55% on average. We also find that rule-based approaches and their combinations perform significantly worse, which points to the benefits of employing machine learning algorithms in this end-to-end pipeline.
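To make the evaluation of a triggering/response-generation combination concrete, the following is a minimal sketch and not the authors' implementation. It assumes that Precision@3 counts a message as correct when the labeled doctor response appears among the three suggestions, and that only messages passing the triggering phase are scored; this section does not state either detail. The names `toy_trigger`, `toy_generator`, `run_pipeline`, and `evaluate`, as well as the sample messages and responses, are hypothetical stand-ins for the trained models (e.g., LSTM or BERT) and the labeled data.

```python
from typing import Callable, List, Optional, Sequence


def precision_at_k(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int = 3) -> float:
    """Fraction of messages whose labeled response is among the top-k suggestions (assumed definition)."""
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        return 0.0
    hits = sum(1 for suggestions, label in zip(ranked, truth) if label in suggestions[:k])
    return hits / len(truth)


def run_pipeline(
    messages: Sequence[str],
    trigger: Callable[[str], bool],
    generator: Callable[[str], List[str]],
    k: int = 3,
) -> List[Optional[List[str]]]:
    """Return top-k suggestions per message, or None when the trigger rejects the message."""
    return [generator(m)[:k] if trigger(m) else None for m in messages]


def evaluate(
    messages: Sequence[str],
    truth: Sequence[str],
    trigger: Callable[[str], bool],
    generator: Callable[[str], List[str]],
    k: int = 3,
) -> float:
    """Precision@k over the messages that pass the triggering phase (an assumption of this sketch)."""
    suggestions = run_pipeline(messages, trigger, generator, k)
    kept = [(s, t) for s, t in zip(suggestions, truth) if s is not None]
    if not kept:
        return 0.0
    ranked, labels = zip(*kept)
    return precision_at_k(ranked, labels, k)


# Hypothetical stand-ins for the trained triggering and response-generation models.
def toy_trigger(message: str) -> bool:
    # Treat very short messages as infeasible for an automatic suggestion.
    return len(message.split()) >= 3


def toy_generator(message: str) -> List[str]:
    # Return candidate responses ranked from most to least likely.
    if "medicine" in message:
        return ["Please follow the prescription.", "How long have you had this symptom?", "Any allergies?"]
    return ["How long have you had this symptom?", "Any allergies?", "Can you share a photo?"]


messages = [
    "I have a headache since morning",
    "ok",
    "how long should I take the medicine",
    "my child has a fever",
]
truth = [
    "How long have you had this symptom?",
    "",
    "Please follow the prescription.",
    "What is the temperature?",
]

score = evaluate(messages, truth, toy_trigger, toy_generator)
```

Comparing pipelines then amounts to calling `evaluate` once per (trigger, generator) pair on the same held-out messages and ranking the pairs by the resulting score.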
Cite this article
Jahanshahi, H., Kazmi, S. & Cevik, M. Auto Response Generation in Online Medical Chat Services. J Healthc Inform Res 6, 344–374 (2022). https://doi.org/10.1007/s41666-022-00118-x
DOI: https://doi.org/10.1007/s41666-022-00118-x