Data Integration

Synonyms

Information integration

Definitions

The goal of data integration systems is to provide uniform access to a set of heterogeneous data sources. These sources can differ in their data model (relational, hierarchical, semi-structured), in their schemas, or in their query-processing capabilities. In a data integration architecture, the sources are queried through a global schema, also called a mediated schema, which provides a virtual view of the underlying sources.
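
To make the idea of a mediated schema concrete, the following is a minimal sketch in Python, not the design of any particular system. It assumes two hypothetical sources: a relational table `matches` held in an in-memory SQLite database, and a semi-structured feed of nested records. The mediated relation `Event`, the wrapper functions, and all data values are illustrative assumptions. Each wrapper expresses the mediated relation as a view over one source, and a query on the mediated schema is answered by evaluating those views; nothing is copied into a central store.

```python
import sqlite3
from typing import Iterator, List, NamedTuple, Optional


class Event(NamedTuple):
    """Hypothetical mediated (global) schema: one relation users query."""
    sport: str
    participants: str
    date: str


def make_relational_source() -> sqlite3.Connection:
    """Source 1 (assumed): a relational source with its own schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE matches (home TEXT, away TEXT, day TEXT)")
    conn.executemany(
        "INSERT INTO matches VALUES (?, ?, ?)",
        [("Team A", "Team B", "2018-03-15"), ("Team C", "Team D", "2018-03-16")],
    )
    return conn


# Source 2 (assumed): a semi-structured source, e.g. records from a JSON feed.
FEED_SOURCE = [
    {"sport": "tennis", "players": ["Player X", "Player Y"], "when": {"date": "2018-03-15"}},
    {"sport": "tennis", "players": ["Player Z", "Player W"], "when": {"date": "2018-03-17"}},
]


def wrap_relational(conn: sqlite3.Connection) -> Iterator[Event]:
    """Map rows of the relational source to tuples of the mediated schema."""
    for home, away, day in conn.execute("SELECT home, away, day FROM matches"):
        # Illustrative assumption: this source only lists football matches.
        yield Event("football", f"{home} vs {away}", day)


def wrap_feed(records: List[dict]) -> Iterator[Event]:
    """Map nested feed records to tuples of the mediated schema."""
    for rec in records:
        yield Event(rec["sport"], " vs ".join(rec["players"]), rec["when"]["date"])


def query_events(conn: sqlite3.Connection, records: List[dict],
                 date: Optional[str] = None) -> List[Event]:
    """Answer a query posed on the mediated schema by evaluating the
    source views at query time and filtering the combined result."""
    events = list(wrap_relational(conn)) + list(wrap_feed(records))
    return [e for e in events if date is None or e.date == date]


if __name__ == "__main__":
    source = make_relational_source()
    for event in query_events(source, FEED_SOURCE, date="2018-03-15"):
        print(event)
```

The caller sees only `Event` tuples and never the source schemas, which is the sense in which the mediated schema is a virtual view. A real system would also have to account for differing query capabilities, for example by pushing the date filter down to sources that can evaluate it.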

Overview

Integrating data from different sources is a crucial step in many real-life applications, and the growth of structured data sources available on the Web makes this problem even more challenging. Consider, for example, a Web application where users can query information about sport events planned on a particular day. In a traditional data management application, the information is stored in a database with a fixed schema (e.g., in a relational data management system) and retrieved by using a query....

Author information

Corresponding authors

Correspondence to Paolo Papotti or Donatello Santoro.

Copyright information

© 2018 Springer International Publishing AG

About this entry

Cite this entry

Papotti, P., Santoro, D. (2018). Data Integration. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_6-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_6-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering

Chapter history

  1. Latest

    Data Integration
    Published: 16 March 2022

    DOI: https://doi.org/10.1007/978-3-319-63962-8_6-2

  2. Original

    Data Integration
    Published: 15 March 2018

    DOI: https://doi.org/10.1007/978-3-319-63962-8_6-1