Abstract
The increasing demand for matching and mapping tasks in modern integration scenarios has led to a plethora of tools that facilitate these tasks. While this abundance has made such tools available to a broader audience, it has also created confusion about their exact nature, goals, core functionalities, expected features, and basic capabilities. Above all, it has made it difficult to measure the performance of these systems and to distinguish among them. The need to design and develop comparison standards that allow these tools to be evaluated is becoming apparent. Such standards are particularly important to users of mapping and matching systems, since they allow them to evaluate the relative merits of the systems and make the right business decisions. They are also important to mapping system developers, since they offer a way of comparing a system against its competitors and of motivating improvements and further development. Finally, they are important to researchers, as they illustrate the limitations of existing systems and thereby trigger further research in the area. In this work, we provide a generic overview of the existing efforts on benchmarking schema matching and mapping tasks. We offer a comprehensive description of the problem, list the basic comparison criteria and techniques, and describe the main functionalities and characteristics of existing systems.
Notes
1. Source: Merriam Webster dictionary.
2. Netrics HD blog, April 2010: http://www.netrics.com/blog/a-data-matching-benchmark.
3. http://www.informatik.uni-trier.de/~ley/db/.
4. biowarehouse.ai.sri.com.
Acknowledgements
We are grateful to B. Alexe, L. Chiticariu, A. Kementsietsidis, E. Rahm, and P. Shvaiko for their valuable comments and suggestions.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bellahsene, Z., Bonifati, A., Duchateau, F., Velegrakis, Y. (2011). On Evaluating Schema Matching and Mapping. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds) Schema Matching and Mapping. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16518-4_9
DOI: https://doi.org/10.1007/978-3-642-16518-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16517-7
Online ISBN: 978-3-642-16518-4
eBook Packages: Computer Science, Computer Science (R0)