Abstract
Dataspace management systems (DSMSs) hold the promise of pay-as-you-go data integration. We describe a comprehensive model of DSMS functionality using an algebraic style. We begin by characterizing a dataspace life cycle and highlighting opportunities for both automation and user-driven improvement techniques. Building on the observation that many of the techniques developed in model management are also of use in data integration contexts, we briefly introduce the model management area and explain how previous work on both data integration and model management needs extending if the full dataspace life cycle is to be supported. We show that many model management operators already enable important functionalities (e.g., the merging of schemas and the composition of mappings) and formulate these capabilities in an algebraic structure, thereby giving rise to the notion of the core functionality of a DSMS as a many-sorted algebra. Given this view, we show how core tasks in the dataspace life cycle can be enacted by means of algebraic programs. An extended case study illustrates how such algebraic programs capture a challenging, practical scenario.
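As a rough illustration of what "a many-sorted algebra" of model management operators could look like, the following minimal sketch defines two sorts (schemas and mappings) and two operators over them, then chains them into a small algebraic program. Everything here is an assumption made for illustration: the names `Schema`, `Mapping`, `merge` and `compose`, their signatures, the attribute names, and the naive attribute-level semantics are hypothetical and are not taken from the chapter, whose operators are richer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Schema:
    """Sort 1 (hypothetical): a schema as a named set of attribute names."""
    name: str
    attributes: FrozenSet[str]


@dataclass(frozen=True)
class Mapping:
    """Sort 2 (hypothetical): attribute correspondences between two schemas."""
    source: str
    target: str
    pairs: FrozenSet[Tuple[str, str]]  # (source attribute, target attribute)


def merge(a: Schema, b: Schema) -> Schema:
    """Schema x Schema -> Schema. Naive assumption: union of the attributes."""
    return Schema(f"{a.name}+{b.name}", a.attributes | b.attributes)


def compose(m1: Mapping, m2: Mapping) -> Mapping:
    """Mapping x Mapping -> Mapping. Chains correspondences through the
    schema shared by m1's target and m2's source."""
    if m1.target != m2.source:
        raise ValueError("mappings do not share an intermediate schema")
    pairs = frozenset(
        (src, dst)
        for (src, mid1) in m1.pairs
        for (mid2, dst) in m2.pairs
        if mid1 == mid2
    )
    return Mapping(m1.source, m2.target, pairs)


# A tiny "algebraic program": operator applications over typed values.
s1 = Schema("S1", frozenset({"gene_id", "symbol"}))
s2 = Schema("S2", frozenset({"id", "name"}))
m12 = Mapping("S1", "S2", frozenset({("gene_id", "id"), ("symbol", "name")}))
m23 = Mapping("S2", "S3", frozenset({("id", "identifier")}))

merged = merge(s1, s2)    # a value of sort Schema
m13 = compose(m12, m23)   # a value of sort Mapping, from S1 to S3
```

Each operator has a fixed signature over the sorts, so programs are built by feeding the result of one operator into another. This is the property the algebraic view relies on; the concrete semantics above are deliberately simplistic.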
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hedeler, C. et al. (2013). A Functional Model for Dataspace Management Systems. In: Catania, B., Jain, L. (eds) Advanced Query Processing. Intelligent Systems Reference Library, vol 36. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28323-9_12
DOI: https://doi.org/10.1007/978-3-642-28323-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28322-2
Online ISBN: 978-3-642-28323-9
eBook Packages: Engineering, Engineering (R0)