Artifacts Available / v1.1

BrewER: Entity Resolution On-Demand

Published: 01 August 2023

Abstract

The task of entity resolution (ER) is to detect multiple records in a dataset that describe the same real-world entity and to consolidate them into a single, consistent record. ER plays a fundamental role in ensuring data quality, e.g., for data that feeds data science pipelines. Yet the traditional approach to ER requires cleaning the entire dataset before any consistent query can be run on it. As a result, users struggle in common scenarios with limited time or resources, e.g., when the data changes frequently or when only a portion of the dataset is relevant to the task at hand.
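To make the traditional, clean-everything-first approach concrete, it can be sketched as follows: every pair of records is compared, matching records are clustered transitively, and each cluster is fused into one consolidated record. This is a minimal illustration, not BrewER's code; `resolve_all` and the caller-supplied `match` and `fuse` functions are hypothetical names. The all-pairs loop is quadratic, which is why practical ER systems usually apply a blocking step first.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def resolve_all(
    records: List[Record],
    match: Callable[[Record, Record], bool],
    fuse: Callable[[List[Record]], Record],
) -> List[Record]:
    """Batch ER sketch: compare all pairs, cluster matches transitively with
    union-find, then fuse each cluster into a single consolidated record."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) comparisons: the whole dataset is cleaned before any query runs.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if match(records[i], records[j]):
                parent[find(i)] = find(j)

    # Group records by cluster root and fuse each group.
    clusters: Dict[int, List[Record]] = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)
    return [fuse(cluster) for cluster in clusters.values()]
```

For example, a `match` that compares lower-cased product names and a `fuse` that keeps the highest price would consolidate "iPhone 13" and "iphone 13" into one record, but only after every pair in the dataset has been compared.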

We previously introduced BrewER, a framework that evaluates SQL selection-projection (SP) queries on dirty data while progressively returning results as if the queries had been issued on cleaned data, according to a priority defined by the user. In this demonstration, we show how BrewER can be used to ease the burden of ER, allowing data scientists to save a significant amount of resources for their tasks.
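The idea of answering a query progressively on dirty data can be illustrated with a simplified sketch. It is not BrewER's actual algorithm; it assumes that candidate duplicates are already grouped into blocks, that the user's priority is a descending ORDER BY on one numeric attribute, and that conflicts are resolved by taking the MAX of each attribute. Under those assumptions, the largest value in a block is an upper bound for every entity produced from it, so a priority queue can emit resolved entities in order while leaving low-priority blocks unresolved. All names (`resolve_block`, `query_on_demand`, `match`, `where`) are hypothetical.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Iterator, List, Optional

Record = Dict[str, object]
Matcher = Callable[[Record, Record], bool]


def resolve_block(block: List[Record], match: Matcher) -> List[Record]:
    """Hypothetical resolver: greedily put each record in the first cluster
    containing a match, then fuse each cluster by taking the MAX of every
    attribute (an assumed conflict-resolution policy)."""
    clusters: List[List[Record]] = []
    for rec in block:
        for cluster in clusters:
            if any(match(rec, other) for other in cluster):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return [{k: max(r[k] for r in c) for k in c[0]} for c in clusters]


def query_on_demand(
    blocks: List[List[Record]],
    order_by: str,
    where: Callable[[Record], bool],
    match: Matcher,
    limit: Optional[int] = None,
) -> Iterator[Record]:
    """Yield resolved entities satisfying `where`, in descending `order_by`
    order, resolving a block only when it reaches the top of the queue."""
    tie = count()  # unique tiebreaker so heapq never compares payloads
    heap = []
    for block in blocks:
        # With MAX fusion, no entity from this block can exceed this bound.
        bound = max(r[order_by] for r in block)
        heapq.heappush(heap, (-bound, next(tie), False, block))

    emitted = 0
    while heap and (limit is None or emitted < limit):
        _, _, resolved, item = heapq.heappop(heap)
        if resolved:
            # Every remaining entry has a key <= this one, so it is safe to emit.
            if where(item):
                yield item
                emitted += 1
        else:
            # Resolve the block now and re-insert its entities with exact keys.
            for entity in resolve_block(item, match):
                heapq.heappush(heap, (-entity[order_by], next(tie), True, entity))
```

With `limit=2`, a query such as "products with price > 500, ordered by price descending" over three blocks can return its two answers without ever resolving the lowest-ranked block, so the cost of ER is paid only for the portion of the data the query actually consumes.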
