Recurrent convolutional neural network based multimodal disease risk prediction

https://doi.org/10.1016/j.future.2018.09.031

Highlights

  • We propose a new MD-RCNN for disease risk prediction.

  • We propose a feature fusion scheme based on DBN.

  • We experiment on medical big data from a Chinese second-class grade-A hospital.

Abstract

With the rapid growth of biomedical and healthcare data, machine learning methods are increasingly used to predict disease risk. However, most existing work relies on single-mode data, and only a few studies use multimodal data. We therefore propose a new multimodal data-based recurrent convolutional neural network (MD-RCNN) for disease risk prediction. The model uses both a patient’s structured data and text data, and extracts fine-grained structured and unstructured features. Furthermore, to capture the highly non-linear relationships between structured and unstructured data, we use a deep belief network (DBN) to fuse the features. Finally, we experiment on medical big data collected from a Chinese second-class hospital during 2013–2015. Experimental results show that the accuracy of the MD-RCNN algorithm reaches 96% and that it outperforms several state-of-the-art methods.

Introduction

According to a McKinsey report [1], half of Americans suffer from chronic diseases. Furthermore, the United States spends about 2.7 trillion dollars on chronic diseases every year, which accounts for 18% of its annual GDP [2]. The situation is similar in other countries. For example, in China, deaths caused by chronic diseases keep rising and account for 86.6% of all deaths [3]. Therefore, it is crucial to predict the risk of chronic diseases [4].

With the development of deep learning [5] and wearable computing [6], [7], a growing body of research predicts disease risk from the perspective of big data analysis [8], [9]. In particular, electronic health records (EHR) [10] are becoming more and more convenient to use. The statistical information [11], test results, and medical history of patients are all recorded in the EHR [12]. With advanced information fusion mechanisms based on distributed systems, such as [13], data collected from different types of medical sensors and wearables can be used to improve information accuracy [14]. Data therefore offers an opportunity to study medical cases and reduce expenditure. For the risk prediction of chronic diseases, many researchers improve the accuracy of risk classification by automatically extracting features from large amounts of structured data [15], rather than relying on manually pre-selected features [16], [17]. For example, Yang et al. [18] introduced a multi-label learning algorithm based on a convolutional neural network (CNN), with which features can be extracted automatically from EHR for disease diagnosis.

Furthermore, some studies have used unstructured text data to predict the risk of chronic diseases [19], [20]. Weng et al. [21] used machine learning based on natural language processing to extract features for classifying clinical diagnoses. Jonnagaddala et al. [22] applied text mining to unstructured electronic medical records to predict the risk of coronary artery disease; the Framingham risk score was used to estimate the ten-year risk of coronary heart disease, and an imputation strategy was used to compensate for missing information. Khalifa et al. [23] suggested using clinical records to evaluate cardiovascular risk factors. However, most existing disease risk assessment schemes consider only structured data or only unstructured data [24], and few works consider multimodal disease risk assessment. One example is Chen et al. [25], who proposed a CNN-based multimodal disease risk prediction algorithm (CNN-MDRP). Their results indicated that combining structured and unstructured data improves model accuracy compared with using text data alone.

However, existing methods still have two problems: (i) When extracting features from medical text data, most existing schemes do not consider the context information of the text, so fine-grained textual features cannot be extracted [26]. (ii) When fusing the features of multimodal data, most existing fusion schemes do not consider the highly non-linear relationships among different modes of data [27], so the degree of fusion is low.

To address these challenges, we process the structured and unstructured data of cerebral infarction patients to predict their disease risk. In detail, we propose MD-RCNN for disease risk prediction. We first use an RCNN to process unstructured text data and obtain textual features related to the patients’ disease risk. At the same time, structured features such as height, weight, age, gender, disease, and blood type are obtained from the patients’ physical examination data. Then, a deep belief network (DBN) is used to deeply fuse the structured and unstructured features. Finally, the patients’ disease risk is predicted. The main contributions of the paper are summarized as follows.

  • We propose an MD-RCNN algorithm based on structured and unstructured data. The RCNN is used to effectively extract fine-grained features from unstructured textual data.

  • To handle the highly non-linear relationships among multimodal data and to better fuse the features of structured data and unstructured text data, we propose a feature fusion scheme based on DBN.

  • We experiment on medical big data from a Chinese second-class grade-A hospital. Experimental results show that the proposed MD-RCNN algorithm outperforms other prediction algorithms, and its accuracy can reach 96%.
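The DBN-based fusion named in the contributions above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the structured and textual feature vectors are simply concatenated, scaled to [0, 1], and passed through a stack of Bernoulli restricted Boltzmann machines trained greedily with one-step contrastive divergence (CD-1). The class `RBM`, the function `train_dbn`, the layer sizes, and the random input data are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli restricted Boltzmann machine trained with CD-1 (illustrative)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step down to the visible layer and up again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

def train_dbn(X, layer_sizes, epochs, rng):
    """Greedy layer-wise training: each RBM is trained on the previous layer's output."""
    rbms, data = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(data)
        rbms.append(rbm)
        data = rbm.hidden_probs(data)
    return rbms, data

rng = np.random.default_rng(0)
# Hypothetical inputs scaled to [0, 1]: 6 structured fields and 10 text features per patient.
structured = rng.random((32, 6))
textual = rng.random((32, 10))
joint = np.concatenate([structured, textual], axis=1)  # both modalities enter the DBN together
rbms, fused = train_dbn(joint, layer_sizes=[12, 4], epochs=20, rng=rng)
print(fused.shape)
```

In a full pipeline the fused representation would feed a classifier that outputs the risk label; that step is left out here because the text above does not specify it.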

The rest of the paper is organized as follows. The framework of disease risk prediction is given in Section 2. The MD-RCNN model for disease risk prediction is described in Section 3. Our results and discussion are given in Section 4. Finally, Section 5 concludes the paper.

Section snippets

Disease risk prediction framework

In this section, we give the framework of disease risk prediction, as shown in Fig. 1. The framework contains two aspects: the presentation of patients’ unstructured textual data and the MD-RCNN algorithm. In detail, (i) for the structured data, which include patients’ physical examination data and assay data, the relevant fields are extracted from the data set based on discussions with doctors. Missing data are then filled in, and correlation analysis is performed to find relations among the data.
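The structured-data step described above (fill in missing values, then look for relations among fields) could look roughly like the following. This is a minimal sketch rather than the authors' procedure: mean imputation and Pearson correlation are assumed choices, and the field names and values are made up for illustration.

```python
import math

def impute_mean(column):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical records; None marks a missing measurement.
records = {
    "age":    [63, 71, None, 58, 80],
    "weight": [70.0, None, 82.5, 65.0, 77.0],
}
filled = {name: impute_mean(col) for name, col in records.items()}
r = pearson(filled["age"], filled["weight"])
print(filled["age"], round(r, 3))
```

Mean imputation is only one option; which fields to keep and how to fill them would, as the text says, be decided together with doctors.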

Multimodal data-based RCNN for disease risk prediction

In this section, we propose the MD-RCNN algorithm for disease risk prediction. Specifically, we first introduce the RCNN algorithm for processing unstructured medical textual data. Then, we give the detailed procedure of the MD-RCNN algorithm.
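To make the idea of context-aware text features concrete, here is a sketch of the forward pass of one common RCNN formulation for text: each word is represented by its left context, its own embedding, and its right context, followed by a position-wise projection and max-pooling over the sequence. Whether the paper uses exactly this variant is an assumption; the function name `rcnn_text_features`, the parameter names, and all sizes are hypothetical, and the zero initial contexts are a simplification.

```python
import numpy as np

def rcnn_text_features(E, W_l, W_sl, W_r, W_sr, W2, b2):
    """Forward pass of an RCNN-style text encoder (assumed variant).

    E: (n_words, d_emb) word embeddings of one document.
    Returns a fixed-length feature vector of size W2.shape[0].
    """
    n, _ = E.shape
    d_ctx = W_l.shape[0]
    c_left = np.zeros((n, d_ctx))   # left context of each word
    c_right = np.zeros((n, d_ctx))  # right context of each word

    # Left-to-right recurrence: the context of word i summarizes words 0..i-1.
    for i in range(1, n):
        c_left[i] = np.tanh(W_l @ c_left[i - 1] + W_sl @ E[i - 1])
    # Right-to-left recurrence: the context of word i summarizes words i+1..n-1.
    for i in range(n - 2, -1, -1):
        c_right[i] = np.tanh(W_r @ c_right[i + 1] + W_sr @ E[i + 1])

    # Each word becomes [left context; embedding; right context].
    X = np.concatenate([c_left, E, c_right], axis=1)
    # Position-wise projection (acts like a convolution of width 1).
    Y = np.tanh(X @ W2.T + b2)
    # Max-pooling over word positions yields one vector per document.
    return Y.max(axis=0)

rng = np.random.default_rng(0)
d_emb, d_ctx, d_out, n_words = 8, 6, 5, 12  # illustrative sizes only
E = rng.normal(size=(n_words, d_emb))       # stand-in for learned word embeddings
params = dict(
    W_l=rng.normal(scale=0.1, size=(d_ctx, d_ctx)),
    W_sl=rng.normal(scale=0.1, size=(d_ctx, d_emb)),
    W_r=rng.normal(scale=0.1, size=(d_ctx, d_ctx)),
    W_sr=rng.normal(scale=0.1, size=(d_ctx, d_emb)),
    W2=rng.normal(scale=0.1, size=(d_out, 2 * d_ctx + d_emb)),
    b2=np.zeros(d_out),
)
text_vec = rcnn_text_features(E, **params)
print(text_vec.shape)
```

Because of the max-pooling step, documents of different lengths map to vectors of the same size, which is what allows the textual features to be combined with structured features later.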

Experimental results

In this section, we give experimental results. We run MD-RCNN in a data center with 84 CPU cores and 336 GB of RAM. We present the experimental analysis in terms of datasets, evaluation methods, and experimental results.
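The snippet above does not list the evaluation methods, and accuracy is the only metric this page reports. As an illustration of how such a binary risk classifier is typically scored, the sketch below computes accuracy together with precision, recall, and F1; the latter three are assumed companions, and the labels are made-up examples.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = high risk)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical ground truth and predictions for eight patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```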

Conclusions

In this paper, we predict disease risk for cerebral infarction and propose the MD-RCNN model, which uses patients’ structured and unstructured data. The algorithm not only uses RCNN to extract fine-grained features from unstructured textual data but also uses DBN to fuse structured and unstructured features, so that the non-linear relationship between the two kinds of data is well reflected. Furthermore, we experiment with the healthcare big data of a Chinese grade-A

Acknowledgment

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia for funding this work through the research group project no. RGP-229.

Yixue Hao received his B.E. degree from Henan University, China, and his Ph.D. degree in computer science from Huazhong University of Science and Technology (HUST), China, in 2017. He is currently a post-doctoral scholar in the School of Computer Science and Technology at Huazhong University of Science and Technology. His research interests include 5G networks, the Internet of Things, edge computing, and healthcare.

References (36)

  • Chen M. et al. Wearable 2.0: Enable human-cloud integration in next generation healthcare system. IEEE Commun. (2017)
  • Wan J. et al. Cloud-enabled wireless body area networks for pervasive healthcare. IEEE Netw. (2013)
  • Jiang H. et al. Smart home based on wifi sensing: A survey. IEEE Access (2018)
  • Jensen P.B. et al. Mining electronic health records: towards better research applications and clinical care. Nature Rev. Genet. (2012)
  • Zhou Y. et al. Statistical study of view preferences for online videos with cross-platform information. IEEE Trans. Multimed. (2018)
  • Tian D. et al. An adaptive fusion strategy for distributed information estimation over cooperative multi-agent networks. IEEE Trans. Inform. Theory (2017)
  • Chen M. et al. Task offloading for mobile edge computing in software defined ultra-dense network. IEEE J. Sel. Areas Commun. (2018)
  • Chen J. et al. Dominating set and network coding-based routing in wireless mesh networks. IEEE Trans. Parallel Distrib. Syst. (2015)

    Mohd Usama is a Ph.D. candidate in the Embedded and Pervasive Computing Lab of Huazhong University of Science and Technology (HUST), China. His research interests include pervasive computing, the Internet of Things, edge computing, and healthcare.

    Jun Yang received his Bachelor’s and Master’s degrees in Software Engineering from HUST, China, in 2008 and 2011, respectively. He is currently a postdoctoral researcher at the Embedded and Pervasive Computing (EPIC) Lab in the School of Computer Science and Technology, HUST. His research interests include cognitive computing, software intelligence, the Internet of Things, cloud computing, and big data analytics.

    M. Shamim Hossain is a Professor at the Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia. Prof. Shamim is also an Adjunct Professor, School of Electrical Engineering and Computer Science (EECS), University of Ottawa, Canada. Prof. Shamim received his Ph.D. in Electrical and Computer Engineering from the University of Ottawa, Canada. His research interests include Cloud networking, smart environment (smart city, smart health), social media, IoT, edge computing and multimedia for healthcare, deep learning approach for multimedia processing, and multimedia big data. He has authored and coauthored approximately 190 publications including refereed IEEE/ACM/Springer/Elsevier journals, conference papers, books, and book chapters. Recently, his publication has been recognized as the ESI Highly Cited Paper. He has served as a member of the organizing and technical committees of several international conferences and workshops. He has served as co-chair, general chair, workshop chair, publication chair, and TPC for over 12 IEEE and ACM conferences and workshops. Currently, he is the co-chair of the 1st IEEE ICME workshop on Multimedia Services and Tools for smart-health (MUST-SH 2018). He is a recipient of a number of awards, including, the Best Conference Paper Award, the 2016 ACM Transactions on Multimedia Computing, Communications and Applications (TOMM) Nicolas D. Georganas Best Paper Award, and the Research in Excellence Award from the College of Computer and Information Sciences (CCIS), King Saud University (3 times in a row). He is on the editorial board of IEEE Network, IEEE Multimedia, IEEE Access, Journal of Network and Computer Applications (Elsevier), Computers and Electrical Engineering (Elsevier), Human-centric Computing and Information Sciences (Springer), Games for Health Journal, and International Journal of Multimedia Tools and Applications (Springer). Currently, he serves as a lead guest editor of Future Generation Computer Systems (Elsevier), IEEE Network Magazine, and IEEE Access. Previously, he served as a guest editor of IEEE Communication Magazine, IEEE Transactions on Information Technology in Biomedicine (currently JBHI), IEEE Transactions on Cloud Computing, International Journal of Multimedia Tools and Applications (Springer), Cluster Computing (Springer), Future Generation Computer Systems (Elsevier), Computers and Electrical Engineering (Elsevier), Sensors (MDPI), and International Journal of Distributed Sensor Networks. Prof. Shamim is a Senior Member of IEEE, a Senior member of ACM and ACM SIGMM.

    Ahmed Ghoneim [M’10] received his M.Sc. degree in software modeling from the University of Menoufia, Egypt, and his Ph.D. degree in the area of software engineering from the University of Magdeburg, Germany, in 1999 and 2007, respectively. He is currently an assistant professor in the Department of Software Engineering, College of Computer and Information Sciences (CCIS), King Saud University. His research activities address software evolution, service-oriented engineering, software development methodologies, quality of services, net-centric computing, and human computer interaction (HCI). He is a member of the IEEE.
