Abstract
One of the main steps toward integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings, such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed either by a human user or via mapping tools. In particular, the new operator enables a new “divide-and-merge” paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings by significantly increasing the similarity between the input source instance and the generated target instance. Finally, we provide a new algorithm that combines MapMerge with schema mapping composition to correlate flows of schema mappings.
Cite this article
Alexe, B., Hernández, M., Popa, L. et al. MapMerge: correlating independent schema mappings. The VLDB Journal 21, 191–211 (2012). https://doi.org/10.1007/s00778-012-0264-z
DOI: https://doi.org/10.1007/s00778-012-0264-z