DOI: 10.1109/BigData.2015.7363782
Article

Parallel meta-blocking: Realizing scalable entity resolution over large, heterogeneous data

Published: 29 October 2015

Abstract

Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. Typically, it scales to large volumes of data through blocking: similar entities are clustered into blocks so that it suffices to perform comparisons only within each block. Meta-blocking further increases efficiency by cleaning the overlapping blocks from unnecessary comparisons. However, even Meta-blocking can be time-consuming: applying it to blocks with 7.4 million entities and 2.2·10¹¹ comparisons takes almost 8 days on a modern high-end server. In this paper, we parallelize Meta-blocking based on MapReduce. We propose a simple strategy that explicitly creates the core concept of Meta-blocking, the blocking graph. We then describe an advanced strategy that creates the blocking graph implicitly, reducing the overhead of data exchange. We also introduce a load balancing algorithm that distributes the computationally intensive workload evenly among the available compute nodes. Our experimental analysis verifies the superiority of our advanced strategy and demonstrates an almost linear speedup for all meta-blocking techniques with respect to the number of available nodes.
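
To make the terms in the abstract concrete, the following is a minimal, sequential Python sketch of blocking followed by an explicitly built blocking graph. It is not the paper's MapReduce algorithm. The abstract does not say which blocking method, edge weighting, or pruning rule is used, so the sketch assumes token blocking, a weight equal to the number of blocks two entities share, and pruning of edges below the mean weight. The function names and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(entities):
    """Map each token to the set of entity ids whose text contains it."""
    blocks = defaultdict(set)
    for eid, text in entities.items():
        for token in set(text.lower().split()):
            blocks[token].add(eid)
    # A block with a single entity yields no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def blocking_graph(blocks):
    """Weight each co-occurring pair by the number of blocks it shares.

    Assumption: shared-block counting is one possible weighting,
    chosen here only for illustration.
    """
    weights = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1
    return dict(weights)


def prune_below_average(weights):
    """Keep only edges whose weight is at least the mean edge weight."""
    if not weights:
        return {}
    mean = sum(weights.values()) / len(weights)
    return {edge: w for edge, w in weights.items() if w >= mean}


# Hypothetical sample records, not taken from the paper.
entities = {
    1: "John Smith New York",
    2: "J. Smith New York",
    3: "Mary Jones Boston",
    4: "Mary Jones New Haven",
}
blocks = token_blocking(entities)
graph = blocking_graph(blocks)
kept = prune_below_average(graph)
print(graph)
print(kept)
```

On this toy input the overlapping blocks contain seven pairwise comparisons, the blocking graph collapses them into four distinct edges, and pruning keeps the two edges with the strongest co-occurrence evidence, (1, 2) and (3, 4). Building the graph explicitly, as here, materializes every edge; the abstract's advanced strategy avoids that cost by keeping the graph implicit.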

Published In

BIG DATA '15: Proceedings of the 2015 IEEE International Conference on Big Data (Big Data)
October 2015
3094 pages
ISBN:9781479999262

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Cited By

  • (2021) Deep learning for blocking in entity matching. Proceedings of the VLDB Endowment 14:11 (2459-2472). DOI: 10.14778/3476249.3476294. Online publication date: 27-Oct-2021
  • (2019) A noise tolerant and schema-agnostic blocking technique for entity resolution. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (422-430). DOI: 10.1145/3297280.3299730. Online publication date: 8-Apr-2019
  • (2019) Unsupervised blocking and probabilistic parallelisation for record matching of distributed big data. The Journal of Supercomputing 75:2 (623-645). DOI: 10.1007/s11227-017-2008-8. Online publication date: 1-Feb-2019
  • (2017) Multi-core Meta-blocking for Big Linked Data. Proceedings of the 13th International Conference on Semantic Systems (33-40). DOI: 10.1145/3132218.3132230. Online publication date: 11-Sep-2017
  • (2017) Falcon. Proceedings of the 2017 ACM International Conference on Management of Data (1431-1446). DOI: 10.1145/3035918.3035960. Online publication date: 9-May-2017
