1 Introduction

Since the arrival of Data Science, there has been a growing desire to apply its advantages to health. The analysis of the large amount of data available in hospitals to support the optimization of clinical processes is one of the current challenges [19]. These technologies are very useful to support health professionals in creating better care processes that improve the Quality of Care for patients and the effectiveness of treatments. They also allow better management of the cost of patients, making the health system sustainable [4]. Improving the management of clinical processes will not only save lives, but can also bring better and more personalized care to more patients.

One of the cases where the accurate coordination of clinicians is crucial is stroke. Stroke is one of the illnesses with a high impact on morbidity and mortality. It is especially significant because of the high social and healthcare cost and the associated disability that it can cause [9, 15]. This not only has a strong impact on the quality of life of patients, but can also increase costs for the health system [14]. Adequate diagnosis and the timely and coordinated action of health professionals are decisive to save the life of the patient and to stop the patient's cognitive decline [2]. The prognosis of stroke depends largely on the possibility of reducing the brain injury as much as possible. One of the keys to success is the creation and optimization of primary care and emergency protocols for quick diagnosis and treatment, to shorten the time between the stroke event and the application of adequate treatment. In this way, the creation of Data Science tools for the continuous analysis of primary care and emergency protocols will be a clear advantage for the improvement of clinical processes for critical diseases like stroke.

Process Mining technology can be a good option for supporting health professionals in understanding the clinical process of emergencies. Process Mining [1] is a relatively new technology that has been used successfully in different fields. Process Mining uses machine learning technologies to infer and analyze flows in a human-understandable way. Health professionals can use this to better understand the clinical process, enabling the application of interactive models [12] that have a natural application in the medical domain [10].

The aim of this paper is to evaluate the capabilities of Process Mining to analyze the hospital flow of emergencies through the analysis of stroke processes. To do so, we have applied a Question Driven methodology [24] based on two main questions:

  • Q1: Can Process Mining detect and measure the special characteristics of the stroke emergency processes?

  • Q2: Is Process Mining able to measure organizational changes that affect the emergency process?

The objective of this paper is to show how Process Mining can offer solutions to these questions in the medical domain, offering information about the statistical significance of the processes. In the medical domain, it is not enough to report differences between processes to demonstrate findings. To discover medical knowledge it is mandatory to evaluate the statistical significance; otherwise the findings are not conclusive [6]. To that end, we have analyzed a real log of 9046 emergency episodes of 2145 patients who suffered at least one stroke event between January 2010 and June 2017. This log was used to evaluate the questions using Process Mining technologies and to measure their statistical significance.

The paper is organized as follows. First, a related work section analyzes the field, and then the emergency flow is presented in more detail. After that, the proposed maps and the experiments performed are explained. Finally, a discussion concludes the paper.

2 Related Work

Despite the great advantages of applying Big Data to healthcare, health professionals have some reservations about how current expert clinical knowledge should be integrated into automatically learned clinical models [16]. New models are therefore needed to better incorporate this working knowledge into data science models. Along this line, Interactive Pattern Recognition (IPR) [12] is a formal framework that places the medical expert in the middle of the learning process, allowing them to correct the hypothesis model in each iteration, avoiding undesirable errors and converging to a solution iteratively. However, this framework requires human-understandable machine learning techniques to take advantage of these possibilities. The application of Process Mining within this framework can be a good way to fill this gap.

Process Mining [1] is a research discipline that uses existing information in clinical databases and Hospital Information Systems (HIS) to create human-understandable views that help healthcare stakeholders better understand the clinical process. In recent years, Process Mining has been applied in the medical domain [18, 23]. Some applications have successfully demonstrated how medical experts can discover clinical protocols in different disciplines, such as dental treatments [17], surgery flow [11], or chemotherapy [3]. Interactive Process Mining methodologies have also been proposed to support the application of these technologies in the medical domain. This is the case of the Question Driven methodology [24], which proposes formulating research questions based on the daily problems of physicians and using Process Mining technologies to answer them.

In the case of medical emergencies, some recent studies apply Process Mining. In [8, 21], the authors apply Process Mining control-flow discovery and clustering techniques to infer the most common emergency unit flows. In [20], different hospitals are compared with respect to their triage protocols, measuring patient flows using discovery techniques. In [22], the emergency flow is analyzed based on a Question Driven methodology, an interactive [12] methodology intended to support health professionals by solving their specific questions iteratively.

In addition to discovering the flows, using these techniques in the case of stroke, where time is crucial, also requires analyzing the time spent in each of the emergency stages in order to properly characterize and compare the processes.

In addition, the concept of statistical significance is critically important for creating new medical knowledge. To evaluate and measure medical processes, it is necessary to show the differences between them, and it is also mandatory to show the statistical significance of the findings. Although there are some reservations about the interpretation and use of statistical significance indexes like the P-Value [5, 13], it is clear that most of the clinical literature measures statistical significance using the P-Value [6]. Therefore, in order to provide trustworthy information to healthcare professionals, it is desirable to provide a measure of statistical significance within the flow, so that the results achieved by Process Mining algorithms can be accepted in the medical domain.

3 The Emergency Room Treatment Process

Figure 1 illustrates the Emergency Process in a hospital. The process starts when a patient arrives at the hospital and is administratively admitted. Then, the patient waits until the clinical staff performs the triage. The triage step is guided by software that provides questions to be asked to the patient in order to determine a priority level (one of the 5 existing levels, from lowest (5) to highest (1) priority, according to the classical Manchester codification [25]). Next, the patient waits until the system assigns a physician to the case based on the physician's discipline, the patient's priority, and the availability of resources. Then, the patient receives medical attention and the case is discharged. Given that this work focuses on the stroke emergency process, we distinguish three possible discharges: Ordinary Discharge (the patient is sent home), Stroke (the patient goes directly to the special unit that treats these cases), and Hospital Admission (to treat other complex cases outside the scope of this work).

At this point, everything is ready for the patient to receive medical attention, depending on the priority. When a physician is free, he or she selects a patient in the computer system depending on the physician's specialty and the patient's priority. This starts the Medical Attention process, which finishes with the discharge of the patient. Depending on the final assessment, the patient is identified as a Stroke case, an Ordinary Discharge, or a Hospital Admission.

Fig. 1. General flow of medical emergencies.

In this work, we have used real data from 2145 patients who suffered a stroke episode between January 2010 and June 2017. An emergency episode is the process followed by a patient in the emergency area. The log information is acquired from the Hospital Information System (HIS) with a timestamp granularity of seconds. This log contains 9046 emergency episodes, which can be divided into three kinds depending on the discharge destination (available in the HIS): 5536 (54%) are Ordinary episodes, 2265 (35%) correspond to Stroke episodes, and 1222 (11%) are episodes with a hospital admission not directly related to stroke.
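As an illustration of how one episode of such a log could be turned into per-stage durations, the following Python sketch computes the minutes elapsed between consecutive timestamped events. The event names, the record layout and the timestamps are assumptions made for the example; the actual HIS export format is not described here.

```python
from datetime import datetime

# Hypothetical episode: (event name, timestamp with second granularity).
# Names and values are illustrative and do not come from the hospital log.
episode = [
    ("Admission", "2017-02-01 10:00:00"),
    ("Triage", "2017-02-01 10:07:30"),
    ("Attention", "2017-02-01 10:40:00"),
    ("Discharge", "2017-02-01 12:10:00"),
]


def stage_durations(events):
    """Return the minutes elapsed from each event to the next one."""
    parsed = [
        (name, datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
        for name, stamp in events
    ]
    parsed.sort(key=lambda item: item[1])  # order events chronologically
    return {
        f"{first}->{second}": (t_second - t_first).total_seconds() / 60
        for (first, t_first), (second, t_second) in zip(parsed, parsed[1:])
    }


durations = stage_durations(episode)
```

Collecting these values over all episodes gives, for each stage, the set of durations on which averages and hypothesis tests can later be computed.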

4 Statistical Significance and Time Maps

For the Process Mining analysis we have used a desktop version of PALIA Suite called PMCode [11]. The main characteristic of PALIA Suite/PMCode is that it is focused on the creation of custom Process Mining dashboards for use in the medical domain. This framework has been successfully tested in the application of the Question Driven methodology to emergency data [22] and in other medical domains such as surgery [11] or diabetes [7]. As the discovery algorithm, we have selected PALIA Light, a version of the PALIA algorithm [11] for non-parallel logs, because it is implemented in PMCode and is more efficient for this problem than the complete version of PALIA.

In PMCode it is possible to create render maps that customize the colors and features of the nodes of the discovered model. This makes it possible to apply specific enhancement maps over a previously discovered process in order to highlight nodes according to a customized formula. For the experiments performed in this paper we have used two main custom maps: Time maps and Statistical Significance maps.

  • Time Maps: These maps provide a color-gradient view representing the time spent in each node. The time spent is represented by the average duration of each stage of the Emergency flow. Figure 3 is an example of how stage durations are presented. In this view, a gradient from green to red represents the time spent: the greener the node, the quicker the activity, and the redder the node, the more time is spent in the stage. Green represents the minimum time observed and red the maximum.

  • Statistical Significance Maps: When comparing two flows, it is not enough to see the difference in colors in the time maps; it is also necessary to evaluate whether the difference between these nodes is statistically significant. To this end, we have designed an enhancement map that computes a hypothesis test between each pair of corresponding nodes of the two workflows. Each node is represented by the set of durations of its associated executions. To evaluate the statistical significance between two nodes, we calculate the P-Value.

    The P-Value is calculated according to the sequence of tests defined in Fig. 2. A Kolmogorov-Smirnov test is used to evaluate the normality of the distribution of the time values of the nodes of the two flows. If the values pass the normality test, we apply a classical Student's t-test to compute the P-Value. Otherwise, if the distribution is not normal, we apply a Mann-Whitney-Wilcoxon test. The P-Value threshold for statistical significance is fixed at 0.05.

Fig. 2. Calculation of P-Value in the statistical significance map.
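The sequence of tests in Fig. 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the PMCode implementation: all function names are hypothetical, both samples are assumed to need to pass the normality check, the normal parameters are estimated from each sample (which makes the Kolmogorov-Smirnov P-Value approximate), the t-test uses the pooled-variance form, and the Mann-Whitney P-Value uses a normal approximation without tie correction.

```python
import math
from statistics import NormalDist, mean, stdev

ALPHA = 0.05  # significance threshold stated in the text


def ks_normality_p(xs):
    """Asymptotic p-value of a one-sample KS test against a fitted normal."""
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # parameters estimated from xs
    d = 0.0
    for i, x in enumerate(sorted(xs)):
        cdf = fitted.cdf(x)
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:  # the Kolmogorov series is numerically 1 in this range
        return 1.0
    tail = sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return min(1.0, max(0.0, 2 * tail))


def student_t_p(a, b, steps=2000):
    """Two-sided p-value of the pooled-variance Student t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = abs(mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    if t == 0:
        return 1.0
    # With x = sqrt(df) * tan(angle), P(|T| < t) becomes a bounded integral
    # of c * cos(angle)^(df - 1), evaluated here with Simpson's rule.
    upper = math.atan(t / math.sqrt(df))
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi)

    def density(angle):
        return math.exp(log_c + (df - 1) * math.log(math.cos(angle)))

    h = upper / steps
    area = density(0.0) + density(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 1.0 - 2.0 * area * h / 3.0)


def mann_whitney_p(a, b):
    """Two-sided p-value of the Mann-Whitney U test (normal approximation)."""
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_a, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    na, nb = len(a), len(b)
    u = rank_sum_a - na * (na + 1) / 2
    sigma = math.sqrt(na * nb * (na + nb + 1) / 12)  # no tie correction
    z = abs(u - na * nb / 2) / sigma
    return 2 * (1 - NormalDist().cdf(z))


def node_p_value(a, b):
    """Choose the test as in Fig. 2: t-test if both samples look normal."""
    both_normal = ks_normality_p(a) > ALPHA and ks_normality_p(b) > ALPHA
    return student_t_p(a, b) if both_normal else mann_whitney_p(a, b)
```

A node would then be flagged as significantly different when `node_p_value(durations_flow_1, durations_flow_2) < ALPHA`, where the two arguments are the duration lists of the same node in each flow.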

Figure 4 shows an example of the application of the time and statistical significance maps. While the colors represent the time spent in each stage, the nodes whose P-Value is below the threshold are highlighted with a yellow line. Using this view, health professionals can not only evaluate the changes in the flow, but also distinguish which of those changes are statistically significant. This measure supports health professionals in detecting differences backed by strong evidence. A comparison between two nodes can show a large difference in median or average, but a high P-Value. This means that there is no real evidence that the behavior of patients at this point is actually different. On the contrary, if the P-Value is lower than the threshold (typically 0.05), it is assumed that there are strong reasons to think that the behavior of the patients at that point of the flow is different. This information is crucial to discover and demonstrate medical evidence.
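The combination of both maps can be sketched as a small rendering function: the average time of each node is mapped onto a green-to-red gradient (green for the minimum observed, red for the maximum), and nodes with a P-Value below the threshold receive a yellow outline. The linear RGB interpolation, the function name and the output structure are assumptions made for illustration; how PMCode actually computes its gradient is not specified here.

```python
def render_styles(avg_minutes, p_values, alpha=0.05):
    """Map each node to a fill colour (green = fastest, red = slowest)
    and to a yellow outline when its p-value is below alpha."""
    fastest, slowest = min(avg_minutes.values()), max(avg_minutes.values())
    span = slowest - fastest
    styles = {}
    for node, minutes in avg_minutes.items():
        # Position of the node between the minimum (0.0) and maximum (1.0).
        frac = (minutes - fastest) / span if span else 0.0
        red, green = round(255 * frac), round(255 * (1 - frac))
        # Nodes without a computed p-value are treated as not significant.
        significant = p_values.get(node, 1.0) < alpha
        styles[node] = {
            "fill": f"#{red:02x}{green:02x}00",
            "outline": "yellow" if significant else None,
        }
    return styles


# Hypothetical averages (minutes) and p-values, for illustration only.
styles = render_styles(
    {"Admission": 5.0, "Triage": 8.0, "Wait3": 45.0},
    {"Admission": 0.01, "Wait3": 0.20},
)
```

Keeping the colour and the outline as independent attributes mirrors the point made above: a node can be very red (a large time difference) without being outlined if the difference is not statistically significant.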

5 Experiments

In this section, we evaluate the proposed questions using Process Mining techniques.

5.1 Q1: Can Process Mining Detect and Measure the Special Characteristics of the Stroke Emergency Processes?

The aim of this question is to evaluate how Process Mining can show the differences in the stroke emergency process. Although, topologically, the stages followed by stroke episodes are the same as those of ordinary episodes or other hospital admissions, differences are expected in the time spent in some of the stages due to the special characteristics of the problem. To show the differences in time spent depending on the level of emergency, we have labeled the wait and attention nodes with the triage level selected. We have also labeled the events with the most common discharge destinations (Exitus (Death), Home, Primary Care, ...).
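The labeling step described above can be sketched as a simple relabeling of each episode's events before discovery, so that waiting and attention stages of different triage levels become distinct nodes. The base event names and the naming pattern (`Wait3`, `Attention3`) are assumptions for illustration, and the function is hypothetical.

```python
def label_events(events, triage_level):
    """Append the triage level to wait and attention events (e.g. 'Wait3')."""
    stratified = {"Wait", "Attention"}  # assumed base names of the two stages
    return [
        f"{name}{triage_level}" if name in stratified else name
        for name in events
    ]


# Hypothetical trace of one episode triaged as level 3 and discharged home.
trace = ["Admission", "Triage", "Wait", "Attention", "Home"]
labeled = label_events(trace, 3)
```

With this relabeling, a discovery algorithm sees one wait node and one attention node per triage level, which is what allows the time maps to show how durations vary with priority.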

Fig. 3. Flow of the ordinary discharge episodes for Q1. The colours in the nodes represent the average time spent in each of the activities. (Color figure online)

Figure 3 shows the flow of ordinary episodes after applying discovery and time maps. In this map, it is possible to see the time spent in each of the stages in a qualitative way. As expected, the waiting time decreases as the emergency priority increases, while the attention time increases with it. That means that the most complex emergencies have lower waiting times but take more time to be treated.

Table 1. Analysis of statistical significance between ordinary and stroke unit admission nodes (interquartile range in minutes). Bold rows are the ones that have statistical significance.

Figure 4 shows the comparison between the Brain Stroke Unit flow and the Ordinary flow, and Table 1 shows the numerical statistics. The differences are more significant than in the hospital admission case. The Admission and Triage times are significantly lower, as are the Attention times for high-priority emergencies. Also, low-priority emergencies take longer to be attended in the stroke case: the stay of level 4 episodes increases significantly, by 212 min (8.67 times worse than ordinary episodes).

Fig. 4. Flow of Stroke episodes with statistical significance map. The colours in the nodes represent the average time spent in each of the activities. (Color figure online)

Table 2. Analysis of statistical significance between single and double triage nodes (interquartile range in minutes). Bold rows are the ones that have statistical significance.

Fig. 5. Flow for single triage (January to March 2017). The colours in the nodes represent the average time spent in each of the activities, and the colours in the edges represent the number of patients that follow each path.

Fig. 6. Flow for double triage (March to June 2017). The colours in the nodes represent the average time spent in each of the activities, and the colours in the edges represent the number of patients that follow each path.

5.2 Q2: Is Process Mining Able to Measure Organizational Changes in the Stroke Emergency Process?

In March 2017, an organizational change in the Emergency protocol of the hospital was deployed, enabling a second place for triage. The aim of the modification was to improve the admission time of, at least, the most complex emergencies. This question evaluates whether Process Mining is able to detect this organizational change and quantify how it affects the stroke emergency process. To this end, we have performed a study of stroke episodes from January to June 2017, splitting the log into single triage (before March) and double triage (after March). In this log we have 284 (40%) single-triage episodes and 425 (60%) double-triage episodes.
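Splitting a log around an organizational change can be sketched as a partition by arrival date. The field name `arrival`, the function names and the cutoff used in the usage lines are assumptions for illustration, since only the month of the change is given; the last two lines simply check the reported shares against the reported counts.

```python
from datetime import date


def split_by_change(episodes, change_date):
    """Partition episodes into those arriving before and from change_date on."""
    before = [e for e in episodes if e["arrival"] < change_date]
    after = [e for e in episodes if e["arrival"] >= change_date]
    return before, after


def share(part, total):
    """Percentage of the total, rounded to the nearest integer."""
    return round(100 * part / total)


# Hypothetical episodes and cutoff; the exact deployment day is not stated.
episodes = [
    {"id": 1, "arrival": date(2017, 1, 15)},
    {"id": 2, "arrival": date(2017, 4, 2)},
    {"id": 3, "arrival": date(2017, 5, 20)},
]
single, double = split_by_change(episodes, date(2017, 3, 1))

# Shares corresponding to the counts reported in the text.
single_share = share(284, 284 + 425)
double_share = share(425, 284 + 425)
```

Each of the two sub-logs can then be mined separately, and the durations of matching nodes compared with a hypothesis test.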

Figure 5 shows the flow inferred with single triage. The arrow colors quantify the number of patients following each path, which highlights the most common paths. Figure 6 shows the comparison between the single and double triage flows, and Table 2 shows the numerical statistics. According to these, there is a significant decrease in admission time of 3.26 min (a 30% improvement). There is also a significant decrease of 16 min (a 28% improvement) in the Wait3 node, which is the most populated waiting stage.
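The quantity that the arrow colors encode can be approximated by counting, over all episodes, how often one stage is directly followed by another. The sketch below does only this counting; it is not the PALIA discovery algorithm, and the traces and node names are hypothetical.

```python
from collections import Counter


def edge_counts(traces):
    """Count how many times each (source, target) transition occurs."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))  # consecutive pairs of stages
    return counts


# Hypothetical traces, one list of node names per episode.
traces = [
    ["Admission", "Triage", "Wait3", "Attention3", "Stroke"],
    ["Admission", "Triage", "Wait2", "Attention2", "Stroke"],
    ["Admission", "Triage", "Wait3", "Attention3", "Stroke"],
]
counts = edge_counts(traces)
most_common_edge = counts.most_common(1)[0]
```

Sorting the edges by count gives the most common paths, which is the information the edge colors make visible in the rendered flow.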

6 Discussion and Conclusions

In this paper, we have analyzed how Process Mining can support health professionals in the analysis of emergency processes, taking the stroke problem as an example. We have discovered and compared different processes, and we have provided a tool to state the statistical significance of these differences using the P-Value method. Measuring statistical significance is crucial for achieving trust in the medical domain. In the clinical world, if results are not supported by an evaluation of the statistical evidence, they cannot be trusted as actual medical evidence.

On the one hand, we have evaluated whether Process Mining technologies can discover the characteristics of processes followed for specific diseases. To this end, we have stated the differences between the ordinary emergency process and the stroke emergency process. We have observed a clear difference in the triage, admission and attention times of the stroke emergency process. The stroke process requires a specific treatment that should be covered by the stroke unit of the hospital, and the emergency physicians should stabilize the patient and refer them to the unit as soon as possible. In this process, triage is crucial: selecting the correct emergency level significantly decreases the time of stay in the emergency area. In our study we have detected a set of under-triaged stroke patients who were incorrectly classified as level 4 according to the emergency classification. This is probably caused by confusion with a typical level 4 disease. This can dramatically increase the time of stay, to roughly 8.67 times that of ordinary episodes (867%), and should be taken into account. This increase in time can be decisive for the survival or cognitive decline of the patient.

On the other hand, we have analyzed how Process Mining can measure the impact of organizational changes in the triage process. Specifically, we have compared the differences between triage with two nurses and triage with just one nurse. As expected, we have demonstrated that there is a significant change in the admission time of brain stroke patients. We have also discovered that this change positively affects the waiting time of level 3 patients, who are, in fact, the most common. In addition, there is no evidence that this change affects the quality of the triage, because no significant changes are observed in the attention time. Our findings demonstrate that the use of Process Mining not only allows health professionals to understand the clinical processes, but can also support the optimization of the process in an interactive way by measuring the impact of organizational changes in critical diseases like stroke. The statistical significance maps provide a layer of trust for health professionals, enabling them to pay attention to especially significant differences.

In that way, Process Mining can be an exceptional tool for broadly analyzing processes and detecting special circumstances that experts should pay attention to. After that, it can support them in analyzing the impact of subsequent corrective actions in an iterative and interactive way.