Workflow mining: A survey of issues and approaches
Introduction
During the last decade workflow management technology [2], [4], [21], [35], [41] has become readily available. Workflow management systems such as Staffware, IBM MQSeries, COSA, etc. offer generic modeling and enactment capabilities for structured business processes. By making process definitions, i.e., models describing the life-cycle of a typical case (workflow instance) in isolation, one can configure these systems to support business processes. These process definitions need to be executable and are typically graphical. Besides pure workflow management systems many other software systems have adopted workflow technology. Consider for example Enterprise Resource Planning (ERP) systems such as SAP, PeopleSoft, Baan and Oracle, Customer Relationship Management (CRM) software, Supply Chain Management (SCM) systems, Business to Business (B2B) applications, etc. which embed workflow technology. Despite its promise, many problems are encountered when applying workflow technology. One of the problems is that these systems require a workflow design, i.e., a designer has to construct a detailed model accurately describing the routing of work. Modeling a workflow is far from trivial: It requires deep knowledge of the business process at hand (i.e., lengthy discussions with the workers and management are needed) and the workflow language being used.
To compare workflow mining with the traditional approach towards workflow design and enactment, consider the workflow life cycle shown in Fig. 1. The workflow life cycle consists of four phases: (A) workflow design, (B) workflow configuration, (C) workflow enactment, and (D) workflow diagnosis. In the traditional approach the design phase is used for constructing a workflow model. This is typically done by a business consultant and is driven by ideas of management on improving the business processes at hand. Once the design is finished, the workflow system (or any other system that is “process aware”) is configured as specified in the design phase. In the configuration phase one has to deal with the limitations and particularities of the workflow management system being used (cf. [5], [65]). In the enactment phase, cases (i.e., workflow instances) are handled by the workflow system as specified in the design phase and realized in the configuration phase. Based on a running workflow, it is possible to collect diagnostic information which is analyzed in the diagnosis phase. The diagnosis phase can again provide input for the design phase, thus completing the workflow life cycle. In the traditional approach the focus is on the design and configuration phases. Less attention is paid to the enactment phase, and few organizations systematically collect runtime data which is analyzed as input for redesign (i.e., the diagnosis phase is typically missing).
The goal of workflow mining is to reverse the process and collect data at runtime to support workflow design and analysis. Note that in most cases, prior to the deployment of a workflow system, the workflow was already there. Also note that in most information systems transactional data is registered (consider for example the transaction logs of ERP systems like SAP). The information collected at run-time can be used to derive a model explaining the events recorded. Such a model can be used in both the diagnosis phase and the (re)design phase.
Modeling an existing process is influenced by perceptions, e.g., models are often normative in the sense that they state what “should” be done rather than describing the actual process. As a result models tend to be rather subjective. A more objective way of modeling is to use data related to the actual events that took place. Note that workflow mining is not biased by perceptions or normative behavior. However, if people bypass the system doing things differently, the log can still deviate from the actual work being done. Nevertheless, it is useful to confront man-made models with models discovered through workflow mining.
Closely monitoring the events taking place at runtime also enables Delta analysis, i.e., detecting discrepancies between the design constructed in the design phase and the actual execution registered in the enactment phase. Workflow mining results in an “a posteriori” process model which can be compared with the “a priori” model. Workflow technology is moving in the direction of more operational flexibility to deal with workflow evolution and workflow exception handling [2], [7], [10], [13], [20], [30], [39], [40], [64]. As a result workers can deviate from the prespecified workflow design. Clearly one wants to monitor these deviations. For example, a deviation may become common practice rather than being a rare exception. In such a case, the added value of a workflow system becomes questionable and an adaptation is required. Clearly, workflow mining techniques can be used to create a feedback loop to adapt the workflow model to changing circumstances and detect imperfections of the design.
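The idea of Delta analysis can be illustrated with a deliberately simplified sketch. It assumes that both the a priori design and the observed behavior are reduced to sets of "task a is directly followed by task b" pairs; this representation, the task names, and the function name `directly_follows` are illustrative assumptions and not the comparison technique used by any of the tools discussed in this paper.

```python
def directly_follows(traces):
    """Return the set of (a, b) pairs where task b directly follows task a."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


# Hypothetical a priori design, written as the successions it allows.
designed = {
    ("register", "check"),
    ("check", "approve"),
    ("approve", "archive"),
}

# Hypothetical observed cases, one task sequence per case.
observed_traces = [
    ["register", "check", "approve", "archive"],
    ["register", "approve", "archive"],  # the check was skipped
]

observed = directly_follows(observed_traces)

# Successions seen in the log but absent from the design (deviations).
unplanned = observed - designed
# Successions allowed by the design but never seen in the log.
unused = designed - observed

print("unplanned:", unplanned)
print("unused:", unused)
```

A succession that shows up in `unplanned` for many cases would be a candidate for the situation described above, in which a deviation has become common practice.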
The topic of workflow mining is related to management trends such as Business Process Reengineering (BPR), Business Intelligence (BI), Business Process Analysis (BPA), Continuous Process Improvement (CPI), and Knowledge Management (KM). Workflow mining can be seen as part of the BI, BPA, and KM trends. Moreover, workflow mining can be used as input for BPR and CPI activities. Note that workflow mining seems to be more appropriate for CPI than for BPR. Recall that one of the basic elements of BPR is that it is radical and should not be restricted by the existing situation [23]. Also note that workflow mining is not a tool to (re)design processes. The goal is to understand what is really going on as indicated in Fig. 1. Despite the fact that workflow mining is not a tool for designing processes, it is evident that a good understanding of the existing processes is vital for any redesign effort.
This paper is a joint effort of a number of researchers using different approaches to workflow mining and is a spin-off of the “Workflow Mining Workshop”. The goal of this paper is to introduce the concept of workflow mining, to identify scientific and practical problems, to present a common format to store workflow logs, to provide an overview of existing approaches, and to present a number of mining techniques in more detail.
The remainder of this paper is organized as follows. First, we summarize related work. In Section 3 we define workflow mining and present some of the challenging problems. In Section 4 we propose a common XML-based format for storing and exchanging workflow logs. This format is used by the mining tools developed by the authors and interfaces with some of the leading workflow management systems (Staffware, MQSeries Workflow, and InConcert). Sections 5 to 9 introduce five approaches to workflow mining, each focusing on a different aspect: the class of workflow processes that can be rediscovered (Section 5), noise and incomplete logs (Section 6), the quality of a mined workflow model (Section 7), duplicate tasks (Section 8), and block-structured workflows (Section 9). These sections give an overview of some of the ongoing work on workflow mining. Section 10 compares the various approaches and lists a number of open problems. Section 11 concludes the paper.
Related work
The idea of process mining is not new [8], [11], [15], [16], [17], [24], [25], [26], [27], [28], [29], [42], [43], [44], [53], [54], [55], [56], [57], [61], [62], [63]. Cook and Wolf have investigated similar issues in the context of software engineering processes. In [15] they describe three methods for process discovery: one using neural networks, one using a purely algorithmic approach, and one Markovian approach. The authors consider the latter two the most promising approaches. The purely
Workflow mining
The goal of workflow mining is to extract information about processes from transaction logs. Instead of starting with a workflow design, we start by gathering information about the workflow processes as they take place. We assume that it is possible to record events such that (i) each event refers to a task (i.e., a well-defined step in the workflow), (ii) each event refers to a case (i.e., a workflow instance), and (iii) events are totally ordered. Any information system using transactional
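The three assumptions on events stated in this section (each event refers to a task, each event refers to a case, and events are totally ordered) can be made concrete with a minimal sketch. The tuple layout, the sample events, and the function name `group_into_traces` are hypothetical choices made for illustration; they are not part of the log format proposed in this paper.

```python
from collections import defaultdict

# Hypothetical log: each event is (case id, task name).
# The position in the list stands for the total order on events.
events = [
    ("case1", "A"),
    ("case2", "A"),
    ("case1", "B"),
    ("case2", "C"),
    ("case1", "D"),
    ("case2", "D"),
]


def group_into_traces(events):
    """Group a totally ordered event list into one task sequence per case."""
    traces = defaultdict(list)
    for case_id, task in events:
        traces[case_id].append(task)
    return dict(traces)


traces = group_into_traces(events)
for case_id, trace in traces.items():
    print(case_id, trace)
```

Because the events of different cases are interleaved in the log, grouping by case is the step that turns a flat event list into the per-case task sequences that mining techniques typically take as input.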
Workflow logs: A common XML format
In this section we focus on the syntax and semantics of the information stored in the workflow log. We will do this by presenting a tool independent XML format that is used by each of the mining approaches/tools described in the remainder. Fig. 4 shows that this XML format connects transactional systems such as workflow management systems, ERP systems, CRM systems, and case handling systems. In principle, any system that registers events related to the execution of tasks for cases can use this
Which class of workflow processes can be rediscovered?––An approach based on Petri net theory
The first approach we would like to discuss in more detail uses a specific class of Petri nets, named workflow nets (WF-nets), as a theoretical basis [1], [4]. Some of the results have been reported in [3], [8] and there are two tools to support this approach: EMiT [3] and MiMo [8]. Note that the tool Little Thumb (see Section 6) also supports this approach but in addition is able to deal with noise.
In this more theoretical approach, we do not focus on issues such as noise. We assume that there
How to deal with noise and incomplete logs: Heuristic approaches
The formal approach presented in the preceding section presupposes perfect information: (i) the log must be complete (i.e., if a task can follow another task directly, the log should contain an example of this behavior) and (ii) we assume that there is no noise in the log (i.e., everything that is registered in the log is correct). However, in practical situations logs are rarely complete and/or noise free. Therefore, in practical situations, it becomes more difficult to decide if between two
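To illustrate why incomplete or noisy logs complicate the decision, the following sketch counts how often one task directly follows another and keeps only the pairs seen at least a given number of times. The frequency threshold is a generic stand-in chosen for illustration and is an assumption; it is not the heuristic used by Little Thumb or by any other tool described in this paper, and the names `directly_follows_counts`, `frequent_pairs`, and `min_count` are hypothetical.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count how often task b directly follows task a over all traces."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def frequent_pairs(counts, min_count):
    """Keep only the successions observed at least min_count times."""
    return {pair for pair, n in counts.items() if n >= min_count}


# Hypothetical log: B and C may run in either order; ["A", "D"] is noise.
traces = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["A", "B", "C", "D"],
    ["A", "D"],
]

counts = directly_follows_counts(traces)
kept = frequent_pairs(counts, min_count=2)
print(sorted(kept))
```

With this threshold the noisy succession from A to D is discarded, but so are the successions of the legitimate yet rare trace in which C precedes B. A plain frequency cut-off cannot tell low-frequency behavior from noise, which is why more refined heuristics are needed.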
How to measure the quality of a mined workflow model?––An experimental approach
As we already mentioned in Section 5, there are classes of Petri nets for which we can formally prove that the mined model is equivalent or has a behavior similar to the original Petri net. In this section we search for more general methods to measure the quality of mined workflow models.
An important criterion for the quality of a mined workflow model is the consistency between the mined model and the traces in the workflow log. Therefore, a standard check for a mined model is to try to
How to mine workflow processes with duplicate tasks?––An inductive approach
The approaches presented in the preceding sections assume that a task name should be a unique identifier within a process, i.e., in the graphical models it is not possible to have multiple building blocks referring to the same task. For some processes this requirement does not hold. There may be more than one task sharing the same name. An example of such a process is the part release process for the development of passenger cars from [25], which is shown in Fig. 10. Although one may find unique
How to mine block-structured workflows?––A data mining approach
The last approach discussed in this paper is tailored towards mining block-structured workflows. There are two notable differences with the approaches presented in the preceding four sections. First of all, only block-structured workflow patterns are considered. Second, the mining algorithm is based on rewriting techniques rather than graph-based techniques. In addition, the objective of this approach is to mine complete and minimal models: Complete in the sense that all recorded cases are
Comparison and open problems
As indicated in Sections 5 to 9, tools such as EMiT, Little Thumb, InWoLvE, and Process Miner are driven by different
Conclusion
In this paper, we presented an overview of the various problems, techniques, tools, and approaches for workflow mining. It is quite interesting to see how the five approaches presented in Sections 5 to 9
References (65)
- The application of Petri nets to workflow management, Journal of Circuits, Systems and Computers (1998)
- W.M.P. van der Aalst, J. Desel, A. Oberweis (Eds.), Business Process Management: Models, Techniques, and Empirical...
- W.M.P. van der Aalst, B.F. van Dongen, Discovering Workflow Performance Models from Timed Logs, in: Y. Han, S. Tai, D....
- et al., Workflow Management: Models, Methods, and Systems (2002)
- W.M.P. van der Aalst, A.H.M. ter Hofstede, B. Kiepuszewski, A.P. Barros, Workflow Patterns. QUT Technical report,...
- W.M.P. van der Aalst, A.H.M. ter Hofstede, B. Kiepuszewski, A.P. Barros, Workflow Patterns. QUT Technical report,...
- et al., Dealing with workflow change: Identification of issues and solutions, International Journal of Computer Systems, Science, and Engineering (2000)
- W.M.P. van der Aalst, A.J.M.M. Weijters, L. Maruster, Workflow Mining: Which Processes can be Rediscovered? BETA...
- et al., X-tra––KLeinduimpje in Workflowland: Op zoek naar procesdata, Scope (2002)
- A. Agostini, G. De Michelis, Improving Flexibility of Workflow Management Systems, in: W.M.P. van der Aalst, J. Desel,...
- Discovering models of software processes from event-based data, ACM Transactions on Software Engineering and Methodology
- Software process validation: Quantitatively measuring the correspondence of a process to a model, ACM Transactions on Software Engineering and Methodology
- Reengineering the corporation
- Integrating machine learning and workflow management to support acquisition and adaptation of workflow models, International Journal of Intelligent Systems in Accounting, Finance and Management
- Semistructured models are surprisingly useful for user-centered design
- Personeelsinformatiesystemen: De Wet Persoonsregistraties toegepast
Wil van der Aalst is a full professor of Information Systems and head of the section of Information and Technology of the Department of Technology Management at Eindhoven University of Technology. He is also a part-time full professor at the Computing Science faculty at the department of Mathematics and Computer Science at the same university. His research interests include information systems, simulation, Petri nets, process models, workflow management systems, verification techniques, enterprise resource planning systems, computer supported cooperative work, and interorganizational business processes.
Boudewijn van Dongen is a student at the Department of Computer Science and Mathematics at Eindhoven University of Technology, Eindhoven, The Netherlands. In 2002 he conducted a project on workflow mining and developed the workflow mining tool EMiT. Currently, he is doing his master thesis at the Department of Computer Science and Mathematics, after which he will become a Ph.D. candidate at the Department of Technology Management at Eindhoven University of Technology.
Joachim Herbst studied computer science at the University of Ulm, where he also did his Ph.D. in the area of workflow mining. Since 1995 he has been working for DaimlerChrysler Research and Technology. His research interests include machine learning, workflow management, enterprise application integration and concurrent engineering.
Laura Maruster received her B.S. degree in 1994 and her M.S. degree in 1995, both from the Computer Science Department of the West University of Timisoara, Romania. At present she is a Ph.D. candidate at the Department of Technology Management of Eindhoven University of Technology, Eindhoven, The Netherlands. Her research interests include induction of machine learning and statistical models, process mining and knowledge discovery.
Guido Schimm studied computer science and business economy at the University of Wernigerode, Germany. He joined the Oldenburg Institute for Computer Science Tools and Systems (OFFIS) in 1999 as a member of the business intelligence and knowledge management team. Guido has been engaged in many ERP and workflow projects. Currently, his research interest is focused on theoretical foundation and practical implementation of workflow mining technologies.
Ton Weijters is associate professor at the Department of Technology Management of the Eindhoven University of Technology (TUE), and member of the BETA research group. Currently he is working on (i) the application of Knowledge Engineering and Machine Learning techniques for planning, scheduling, and process mining (ii) fundamental research in the domain of Machine Learning and Knowledge Discovering. He is the author of many scientific publications in the mentioned research field.