DOI: 10.1145/1150402.1150463
Article

Query-time entity resolution

Published: 20 August 2006

Abstract

The goal of entity resolution is to reconcile database references corresponding to the same real-world entities. Given the abundance of publicly available databases where entities are not resolved, we motivate the problem of quickly processing queries that require resolved entities from such 'unclean' databases. We propose a two-stage collective resolution strategy for processing queries. We then show how it can be performed on-the-fly by adaptively extracting and resolving those database references that are the most helpful for resolving the query. We validate our approach on two large real-world publication databases where we show the usefulness of collective resolution and at the same time demonstrate the need for adaptive strategies for query processing. We then show how the same queries can be answered in real time using our adaptive approach while preserving the gains of collective resolution.
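The adaptive extraction step described in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' algorithm: the toy `REFERENCES` table, the `names_compatible` test (same first initial and last name), and the `depth` cut-off are all hypothetical. Starting from the query name, it alternates between pulling in references that co-occur on the same paper and references whose names are compatible with those, so that only a small neighborhood of the database is handed to a resolver.

```python
# Hypothetical toy database: (reference id, author name as written, paper id).
REFERENCES = [
    (1, "W. Wang", "p1"),
    (2, "C. Chen", "p1"),
    (3, "Wei Wang", "p2"),
    (4, "A. Ansari", "p2"),
    (5, "Chao Chen", "p3"),
    (6, "X. Xu", "p3"),
    (7, "L. Li", "p4"),
]


def parse(name):
    """Reduce a name to (first initial, last name); an assumed, crude key."""
    parts = name.lower().replace(".", "").split()
    return parts[0][0], parts[-1]


def names_compatible(a, b):
    """Assumed attribute test: same first initial and same last name."""
    return parse(a) == parse(b)


def extract_relevant(query_name, refs, depth=1):
    """Return ids of references worth resolving for this query.

    depth bounds how many rounds of expansion are performed, which keeps
    the extracted set small enough to resolve at query time.
    """
    # Seed: references whose names are compatible with the query name.
    selected = {rid for rid, name, _ in refs if names_compatible(query_name, name)}
    for _ in range(depth):
        # Relational expansion: everything on the same papers.
        papers = {pid for rid, _, pid in refs if rid in selected}
        cooccurring = {rid for rid, _, pid in refs if pid in papers}
        # Attribute expansion: names compatible with any co-occurring name.
        names = {name for rid, name, _ in refs if rid in cooccurring}
        similar = {
            rid
            for rid, name, _ in refs
            if any(names_compatible(name, other) for other in names)
        }
        selected |= cooccurring | similar
    return selected


if __name__ == "__main__":
    print(sorted(extract_relevant("W. Wang", REFERENCES, depth=1)))
```

In this toy data, a larger `depth` pulls in more of the database, which is the trade-off between resolution quality and query latency that the abstract motivates.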

Information & Contributors

Information

Published In

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN:1595933395
DOI:10.1145/1150402
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive
  2. entity resolution
  3. query
  4. relations

Qualifiers

  • Article

Conference

KDD06

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 1

Reflects downloads up to 12 Dec 2024.

Citations

Cited By

  • (2023) Heterogeneous Entity Matching with Complex Attribute Associations using BERT and Neural Networks. SSRN Electronic Journal. DOI: 10.2139/ssrn.4577447. Online publication date: 2023
  • (2023) A Randomized Blocking Structure for Streaming Record Linkage. Proceedings of the VLDB Endowment, 16:11 (2783-2791). DOI: 10.14778/3611479.3611487. Online publication date: 24-Aug-2023
  • (2021) Modern Privacy-Preserving Record Linkage Techniques: An Overview. IEEE Transactions on Information Forensics and Security, 16 (4966-4987). DOI: 10.1109/TIFS.2021.3114026. Online publication date: 2021
  • (2021) MultiBlock: A Scalable Iterative Approach for Progressive Entity Resolution. 2021 IEEE International Conference on Big Data (Big Data) (219-228). DOI: 10.1109/BigData52589.2021.9671540. Online publication date: 15-Dec-2021
  • (2020) Web scale taxonomy cleansing. Proceedings of the VLDB Endowment, 4:12 (1295-1306). DOI: 10.14778/3402755.3402763. Online publication date: 3-Jun-2020
  • (2020) Efficient Record Linkage in Data Streams. 2020 IEEE International Conference on Big Data (Big Data) (523-532). DOI: 10.1109/BigData50022.2020.9378127. Online publication date: 10-Dec-2020
  • (2019) Summarizing and linking electronic health records. Distributed and Parallel Databases. DOI: 10.1007/s10619-019-07263-0. Online publication date: 18-Mar-2019
  • (2018) Fast schemes for online record linkage. Data Mining and Knowledge Discovery, 32:5 (1229-1250). DOI: 10.1007/s10618-018-0563-0. Online publication date: 1-Sep-2018
  • (2016) Recent Advances in Object Identification. Data and Information Quality (217-277). DOI: 10.1007/978-3-319-24106-7_9. Online publication date: 24-Mar-2016
  • (2015) On-the-fly entity resolution from distributed social media sources for mobile search and exploration. Proceedings of the 14th International Conference on Mobile and Ubiquitous Multimedia (14-24). DOI: 10.1145/2836041.2836043. Online publication date: 30-Nov-2015
