Modelling Provenance Collection Points and Their Impact on Provenance Graphs

  • Conference paper
Provenance and Annotation of Data and Processes (IPAW 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9672)

Abstract

As many domains employ ever more complex systems-of-systems, capturing provenance among component systems is increasingly important. Applications such as intrusion detection, load balancing, traffic routing, and insider threat detection all involve monitoring and analyzing data provenance. Implicit in these applications is the assumption that “good” provenance is captured (e.g., complete provenance graphs, or one full path). When attempting to provide “good” provenance for a complex system-of-systems, it is necessary to know “how hard” the provenance-enabling will be and the likely quality of the provenance to be produced. In this work, we provide analytical results and simulation tools to assist in scoping the provenance-enabling process. We provide use cases of complex systems-of-systems within which users wish to capture provenance. We describe the parameters that must be taken into account when provenance-enabling a system-of-systems. We provide a tool that models the interactions and types of capture agents involved in a complex system-of-systems, including the set of known and unknown systems in the environment. The tool estimates the quantity and type of capture agents that will need to be deployed for provenance-enablement in a complex system that is not completely known.

Approved for Public Release #16-0858. The authors’ affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE’s concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.

Notes

  1. www.palantir.com

Author information

Corresponding author

Correspondence to Adriane P. Chapman.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gammack, D., Scott, S., Chapman, A.P. (2016). Modelling Provenance Collection Points and Their Impact on Provenance Graphs. In: Mattoso, M., Glavic, B. (eds) Provenance and Annotation of Data and Processes. IPAW 2016. Lecture Notes in Computer Science, vol 9672. Springer, Cham. https://doi.org/10.1007/978-3-319-40593-3_12

  • DOI: https://doi.org/10.1007/978-3-319-40593-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40592-6

  • Online ISBN: 978-3-319-40593-3

  • eBook Packages: Computer Science (R0)
