Abstract
Within the process mining domain, research on comparing control-flow (CF) discovery techniques has gained importance. A crucial building block of empirical analysis of CF discovery techniques is obtaining the appropriate evaluation data. Currently, there is no answer to the question of how to collect such evaluation data. The paper introduces a methodology for generating artificial event data (GED) and an implementation called the Process Tree and Log Generator. The GED methodology and its implementation provide users with full control over the characteristics of the generated event data and an integration within the ProM framework. Unlike existing approaches, there is no tradeoff between including long-term dependencies and soundness of the process. The contributions of the paper provide a solution for a necessary step in the empirical analysis of CF discovery algorithms.
Notes
Noise is defined in this paper as incorrect behavior in the log (see Sect. 4.4).
Invisible activities are endpoints in the tree and hence are never selected to be replaced.
Reducing parent and child loop nodes could cause the parent loop node to have more than three children.
A descendant is a node reachable by repeatedly going from parent to child.
Tree \(PT_1 = \circlearrowleft ^{k}(\times (a,b),c,d)\) is not trace equivalent to tree \(PT_2 = \times (\circlearrowleft ^{k}(a,c,d),\circlearrowleft ^{k}(b,c,d))\).
Notice that activity e can be repeated once.
These mean relative frequencies of the operator types were calculated before the trees were reduced.
The scalability of the log generation is outside the scope of this paper, as the focus is on model generation with long-term (LT) dependencies.
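The note on \(PT_1\) and \(PT_2\) can be checked by enumerating the traces of both trees. The following is a minimal sketch, not the authors' implementation. It assumes that a loop node \(\circlearrowleft ^{k}(do, redo, exit)\) runs `do` once, then repeats the pair (`redo`, `do`) at most `k` times, and then runs `exit`. The tuple encoding of trees and the function name `traces` are hypothetical choices made for this illustration.

```python
from itertools import product

# Hypothetical encoding: ("act", name), ("xor", child, ...), ("loop", do, redo, exit).

def traces(tree, k):
    """Return the set of traces (tuples of activity names) a tree can produce.

    Assumed loop semantics: do, then at most k repetitions of (redo, do), then exit.
    """
    kind = tree[0]
    if kind == "act":
        return {(tree[1],)}
    if kind == "xor":
        # Exclusive choice: the union of the children's trace sets.
        return set().union(*(traces(child, k) for child in tree[1:]))
    if kind == "loop":
        do, redo, exit_ = (traces(child, k) for child in tree[1:])
        result = set()
        prefixes = set(do)  # behavior after the first 'do' run
        for _ in range(k + 1):
            # Leave the loop now ...
            result |= {p + e for p, e in product(prefixes, exit_)}
            # ... or go around once more (redo followed by do).
            prefixes = {p + r + d for p, r, d in product(prefixes, redo, do)}
        return result
    raise ValueError(f"unknown node type: {kind}")

a, b, c, d = (("act", name) for name in "abcd")
PT1 = ("loop", ("xor", a, b), c, d)                     # choice is remade in every iteration
PT2 = ("xor", ("loop", a, c, d), ("loop", b, c, d))     # choice is made once, up front

only_in_pt1 = traces(PT1, 1) - traces(PT2, 1)
print(sorted(only_in_pt1))
```

Under these assumptions, \(PT_1\) can produce the trace a, c, b, d because the choice between a and b is made again in each iteration, whereas \(PT_2\) fixes the choice before entering the loop. One such trace is enough to show that the two trees are not trace equivalent.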
Acknowledgements
The authors would like to thank Massimiliano de Leoni and Alfredo Bolt for their advice and support to implement the PTandLogGenerator.
Additional information
Accepted after two revisions by Prof. Dr. Bichler.
Cite this article
Jouck, T., Depaire, B. Generating Artificial Data for Empirical Analysis of Control-flow Discovery Algorithms. Bus Inf Syst Eng 61, 695–712 (2019). https://doi.org/10.1007/s12599-018-0541-5
DOI: https://doi.org/10.1007/s12599-018-0541-5