Generating Artificial Data for Empirical Analysis of Control-flow Discovery Algorithms

A Process Tree and Log Generator

  • Research Paper
  • Published in: Business & Information Systems Engineering

Abstract

Within the process mining domain, research on comparing control-flow (CF) discovery techniques has gained importance. A crucial building block of empirical analysis of CF discovery techniques is obtaining the appropriate evaluation data. Currently, there is no answer to the question of how to collect such evaluation data. The paper introduces a methodology for generating artificial event data (GED) and an implementation called the Process Tree and Log Generator. The GED methodology and its implementation provide users with full control over the characteristics of the generated event data and an integration within the ProM framework. Unlike existing approaches, there is no tradeoff between including long-term dependencies and soundness of the process. The contributions of the paper provide a solution for a necessary step in the empirical analysis of CF discovery algorithms.

Notes

  1. Noise is defined in this paper as incorrect behavior in the log (see Sect. 4.4).

  2. Invisible activities are endpoints in the tree and hence are never selected to be replaced.

  3. Reducing parent and child loop nodes could cause the parent loop node to have more than three children.

  4. A descendant is a node reachable by repeatedly going from parent to child.

  5. Tree \(PT_1 = \circlearrowleft ^{k}(\times (a,b),c,d)\) is not trace equivalent to tree \(PT_2 = \times (\circlearrowleft ^{k}(a,c,d),\circlearrowleft ^{k}(b,c,d))\).

  6. Notice that activity e can be repeated once.

  7. https://github.com/tjouck/PTandLogGenerator.

  8. These mean relative frequencies of the operator types were calculated before the trees were reduced.

  9. The scalability of the log generation is outside the scope of this paper as the focus is on model generation with LT dependencies.
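The claim in note 5 can be checked by enumerating traces. The sketch below is illustrative only and is not taken from the PTandLogGenerator. It assumes the common three-child loop semantics (execute the first child, then either the exit child, or the redo child followed by the first child again) and assumes that the superscript k bounds the number of repetitions. The helper names `leaf`, `xor`, `concat`, and `loop` are hypothetical.

```python
from itertools import product

# Assumption: a language is a set of traces, and a trace is a tuple of labels.

def leaf(label):
    """Language of a single activity."""
    return {(label,)}

def xor(*children):
    """Exclusive choice: union of the children's languages."""
    return set().union(*children)

def concat(*langs):
    """Sequential composition: every combination of one trace per language."""
    return {sum(parts, ()) for parts in product(*langs)}

def loop(do, redo, exit_, k):
    """Three-child loop (assumed semantics): do, then at most k times
    (redo, do), then exit."""
    traces = set()
    body = do  # all prefixes that end with an execution of 'do'
    for _ in range(k + 1):
        traces |= concat(body, exit_)
        body = concat(body, redo, do)
    return traces

a, b, c, d = (leaf(x) for x in "abcd")
K = 1  # hypothetical repetition bound, chosen to keep the languages small

pt1 = loop(xor(a, b), c, d, K)                 # loop(xor(a, b), c, d)
pt2 = xor(loop(a, c, d, K), loop(b, c, d, K))  # xor(loop(a, c, d), loop(b, c, d))

# In PT_1 the choice between a and b is made again in every iteration,
# so a trace may mix both activities; in PT_2 the choice is made only once.
witness = ("a", "c", "b", "d")
print(witness in pt1, witness in pt2)
print(sorted(pt1 - pt2))
```

Under these assumptions every trace of \(PT_2\) is also a trace of \(PT_1\), but not the other way around, which is why the two trees are not trace equivalent.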

Acknowledgements

The authors would like to thank Massimiliano de Leoni and Alfredo Bolt for their advice and support in implementing the PTandLogGenerator.

Author information

Corresponding author

Correspondence to Toon Jouck.

Additional information

Accepted after two revisions by Prof. Dr. Bichler.

About this article

Cite this article

Jouck, T., Depaire, B. Generating Artificial Data for Empirical Analysis of Control-flow Discovery Algorithms. Bus Inf Syst Eng 61, 695–712 (2019). https://doi.org/10.1007/s12599-018-0541-5

  • DOI: https://doi.org/10.1007/s12599-018-0541-5
