SPADE: Support for Provenance Auditing in Distributed Environments

Conference paper
Middleware 2012

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7662)

Abstract

SPADE is an open source software infrastructure for data provenance collection and management. The underlying data model used throughout the system is graph-based, consisting of vertices and directed edges that are modeled after the node and relationship types described in the Open Provenance Model. The system has been designed to decouple the collection, storage, and querying of provenance metadata. At its core is a novel provenance kernel that mediates between the producers and consumers of provenance information, and handles the persistent storage of records. It operates as a service, peering with remote instances to enable distributed provenance queries. The provenance kernel on each host handles the buffering, filtering, and multiplexing of incoming metadata from multiple sources, including the operating system, applications, and manual curation. Provenance elements can be located locally with queries that use wildcard, fuzzy, proximity, range, and Boolean operators. Ancestor and descendant queries are transparently propagated across hosts until a terminating expression is satisfied, while distributed path queries are accelerated with provenance sketches.
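
The abstract describes ancestor queries that follow provenance edges across hosts until a terminating expression is satisfied. Below is a minimal Python sketch of that idea. It is not SPADE's implementation or API. The `Vertex`, `Host`, and `ancestors` names are hypothetical, an in-process dictionary of hosts stands in for remote provenance kernels, and a Python predicate stands in for the terminating expression. Edges point from a dependent vertex to its cause, following the Open Provenance Model convention (for example, an artifact WasGeneratedBy a process).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Vertex:
    vid: str
    vtype: str  # OPM node type: "Artifact", "Process", or "Agent"
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class Host:
    """Hypothetical stand-in for one host's local provenance store."""
    name: str
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    # child vid -> list of (OPM edge type, parent vid, host holding the parent)
    edges: Dict[str, List[Tuple[str, str, str]]] = field(default_factory=dict)

    def add_vertex(self, v: Vertex) -> None:
        self.vertices[v.vid] = v

    def add_edge(self, etype: str, child: str, parent: str, parent_host: str) -> None:
        self.edges.setdefault(child, []).append((etype, parent, parent_host))


def ancestors(hosts: Dict[str, Host], start_host: str, start: str,
              stop: Callable[[Vertex], bool]) -> List[Tuple[str, str]]:
    """Breadth-first ancestor query that follows edges across hosts.

    A vertex satisfying `stop` is returned, but its own ancestors are not
    expanded. Returns the (host, vid) pairs reached, excluding the start.
    """
    seen: Set[Tuple[str, str]] = set()
    result: List[Tuple[str, str]] = []
    queue = deque([(start_host, start)])
    while queue:
        host_name, vid = queue.popleft()
        if (host_name, vid) in seen:
            continue
        seen.add((host_name, vid))
        vertex = hosts[host_name].vertices[vid]
        if (host_name, vid) != (start_host, start):
            result.append((host_name, vid))
            if stop(vertex):
                continue  # terminating condition met on this path
        for _etype, parent, parent_host in hosts[host_name].edges.get(vid, []):
            # A real system would forward the query to the remote host's
            # kernel; this sketch simply looks the parent up in `hosts`.
            queue.append((parent_host, parent))
    return result


# Hypothetical two-host example: a result on host A derives from a file on B.
a, b = Host("A"), Host("B")
a.add_vertex(Vertex("out.txt", "Artifact"))
a.add_vertex(Vertex("blastp", "Process", {"name": "blastp"}))
a.add_vertex(Vertex("query.faa", "Artifact"))
b.add_vertex(Vertex("raw.faa", "Artifact", {"source": "download"}))
b.add_vertex(Vertex("curl", "Process", {"name": "curl"}))
a.add_edge("WasGeneratedBy", "out.txt", "blastp", "A")
a.add_edge("Used", "blastp", "query.faa", "A")
a.add_edge("WasDerivedFrom", "query.faa", "raw.faa", "B")
b.add_edge("WasGeneratedBy", "raw.faa", "curl", "B")
hosts = {"A": a, "B": b}

print(ancestors(hosts, "A", "out.txt",
                stop=lambda v: v.annotations.get("source") == "download"))
```

Running it prints `[('A', 'blastp'), ('A', 'query.faa'), ('B', 'raw.faa')]`. The traversal crosses from host A to host B and stops expanding at the vertex that satisfies the predicate, so the `curl` process behind it is not returned. Breadth-first order is an arbitrary choice here; a depth limit could serve as an additional terminating condition.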

Copyright information

© 2012 IFIP International Federation for Information Processing

About this paper

Cite this paper

Gehani, A., Tariq, D. (2012). SPADE: Support for Provenance Auditing in Distributed Environments. In: Narasimhan, P., Triantafillou, P. (eds) Middleware 2012. Lecture Notes in Computer Science, vol 7662. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35170-9_6

  • DOI: https://doi.org/10.1007/978-3-642-35170-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35169-3

  • Online ISBN: 978-3-642-35170-9

  • eBook Packages: Computer Science (R0)
