A Functional Model for Dataspace Management Systems

  • Chapter
Advanced Query Processing

Abstract

Dataspace management systems (DSMSs) hold the promise of pay-as-you-go data integration. We describe a comprehensive model of DSMS functionality in an algebraic style. We begin by characterizing a dataspace life cycle and highlighting opportunities for both automation and user-driven improvement. Building on the observation that many techniques developed in model management are also of use in data integration, we briefly introduce the model management area and explain how previous work on both data integration and model management needs to be extended if the full dataspace life cycle is to be supported. We show that many model management operators already enable important functionalities (e.g., the merging of schemas and the composition of mappings) and formulate these capabilities in an algebraic structure, thereby giving rise to the notion of the core functionality of a DSMS as a many-sorted algebra. Given this view, we show how core tasks in the dataspace life cycle can be enacted by means of algebraic programs. An extended case study illustrates how such algebraic programs capture a challenging, practical scenario.
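
To make the idea of a many-sorted algebra concrete, the following is a minimal illustrative sketch in Python and not the chapter's formalization. It assumes two sorts, `Schema` and `Mapping`, and two operators, `merge` and `compose`, named after the schema merging and mapping composition mentioned in the abstract. All class names, fields, and operator semantics here are hypothetical simplifications: schemas are reduced to attribute sets and mappings to attribute-to-attribute pairs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Schema:
    """Sort 1 (hypothetical): a schema reduced to a named set of attributes."""
    name: str
    attributes: FrozenSet[str]


@dataclass(frozen=True)
class Mapping:
    """Sort 2 (hypothetical): attribute pairs from a source to a target schema."""
    source: str
    target: str
    pairs: FrozenSet[Tuple[str, str]]


def merge(a: Schema, b: Schema) -> Schema:
    """Operator Schema x Schema -> Schema: here simply the union of attributes."""
    return Schema(f"{a.name}+{b.name}", a.attributes | b.attributes)


def compose(m1: Mapping, m2: Mapping) -> Mapping:
    """Operator Mapping x Mapping -> Mapping: chain pairs through the shared schema."""
    if m1.target != m2.source:
        raise ValueError("m1.target must equal m2.source")
    pairs = frozenset(
        (s, t2)
        for (s, t1) in m1.pairs
        for (s2, t2) in m2.pairs
        if t1 == s2
    )
    return Mapping(m1.source, m2.target, pairs)


# A tiny "algebraic program": each step applies an operator to typed values.
schema_a = Schema("A", frozenset({"gene_id", "name"}))
schema_b = Schema("B", frozenset({"id", "symbol"}))
a_to_b = Mapping("A", "B", frozenset({("gene_id", "id")}))
b_to_c = Mapping("B", "C", frozenset({("id", "identifier")}))

merged = merge(schema_a, schema_b)
a_to_c = compose(a_to_b, b_to_c)
```

Because each operator declares the sorts it consumes and produces, operator applications can be chained into larger programs and ill-sorted steps (for example, composing mappings that do not share an intermediate schema) can be rejected early. Real model management operators carry much richer semantics than this sketch suggests.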

Author information

Corresponding author

Correspondence to Cornelia Hedeler.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hedeler, C. et al. (2013). A Functional Model for Dataspace Management Systems. In: Catania, B., Jain, L. (eds) Advanced Query Processing. Intelligent Systems Reference Library, vol 36. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28323-9_12

  • DOI: https://doi.org/10.1007/978-3-642-28323-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28322-2

  • Online ISBN: 978-3-642-28323-9

  • eBook Packages: Engineering, Engineering (R0)
