DOI: 10.1109/ICDE.2011.5767835
Article

Efficient SPectrAl Neighborhood blocking for entity resolution

Published: 11 April 2011

Abstract

In many telecom and web applications, it is necessary to determine whether data objects in the same source or in different sources represent the same real-world entity. This problem arises for subscribers in multiple services, customers in supply chain management, and users in social networks when no unique identifier represents a real-world entity across multiple data sources. Entity resolution is the task of identifying objects in the data sets that refer to the same real-world entity. We investigate the entity resolution problem for large data sets, where efficient and scalable solutions are needed. We propose a novel unsupervised blocking algorithm, SPectrAl Neighborhood (SPAN), which constructs a fast bipartition tree for the records based on spectral clustering such that real entities can be identified accurately from neighboring records in the tree. Our approach has two major novel aspects: 1) we develop a fast algorithm that performs spectral clustering without computing pairwise similarities explicitly, which dramatically improves the scalability of the standard spectral clustering algorithm; 2) we use a stopping criterion specified by Newman-Girvan modularity in the bipartition process. Our experimental results on both synthetic and real-world data demonstrate that SPAN is robust and outperforms other blocking algorithms in accuracy while remaining efficient and scalable on large data sets.
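The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: records are rows of a nonnegative feature matrix A (for example q-gram weights) with no all-zero rows, and similarity is the inner product S = A Aᵀ. Under that assumption the normalized spectral cut only needs a singular vector of D^(-1/2) A, so the n-by-n matrix S is never formed. The function names and the min_q threshold are hypothetical. The modularity function drops self-similarities (the diagonal of S), while the cut keeps them so that the low-rank shortcut still applies.

```python
import numpy as np

def bipartition(A):
    """Two-way normalized spectral cut of the rows of A (S = A @ A.T is implicit)."""
    col_sum = A.sum(axis=0)                  # A^T 1
    deg = A @ col_sum                        # row sums of S, computed as A (A^T 1)
    B = A / np.sqrt(deg)[:, None]            # D^{-1/2} A, so B B^T = D^{-1/2} S D^{-1/2}
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    if U.shape[1] < 2:                       # a single feature gives no second vector
        return np.ones(A.shape[0], dtype=bool)
    return U[:, 1] / np.sqrt(deg) >= 0       # split on the sign of the second vector

def modularity(A, mask):
    """Newman-Girvan modularity of the split `mask`, ignoring self-similarities."""
    col_sum = A.sum(axis=0)
    sq_norm = (A * A).sum(axis=1)            # diagonal entries S_ii
    total = col_sum @ col_sum - sq_norm.sum()  # sum of all off-diagonal similarities
    if total <= 0:
        return 0.0
    q = 0.0
    for side in (mask, ~mask):
        s = A[side].sum(axis=0)
        self_sim = sq_norm[side].sum()
        within = s @ s - self_sim            # similarity between distinct records in side
        degree = s @ col_sum - self_sim      # total degree of the records in side
        q += within / total - (degree / total) ** 2
    return q

def span_blocks(A, ids=None, min_q=0.0):
    """Recursively bipartition; stop when a split does not raise modularity above min_q."""
    if ids is None:
        ids = np.arange(A.shape[0])
    if len(ids) < 2:
        return [ids.tolist()]
    mask = bipartition(A[ids])
    if mask.all() or not mask.any() or modularity(A[ids], mask) <= min_q:
        return [ids.tolist()]
    return span_blocks(A, ids[mask], min_q) + span_blocks(A, ids[~mask], min_q)

# Toy data: records 0-1 share features, records 2-3 share features,
# and a weak common feature (last column) links the two groups.
A = np.array([
    [1.0, 1.0, 0.0, 0.0, 0.1],
    [1.0, 0.9, 0.0, 0.0, 0.1],
    [0.0, 0.0, 1.0, 1.0, 0.1],
    [0.0, 0.0, 0.9, 1.0, 0.1],
])
blocks = span_blocks(A)
print(sorted(blocks))
```

The leaves returned by span_blocks play the role of blocks; candidate pairs would then be compared only within a leaf (or, following the abstract's wording, among neighboring records in the tree), which this sketch does not model.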

    Published In

    ICDE '11: Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
    April 2011
    1457 pages
    ISBN: 9781424489596

    Publisher

    IEEE Computer Society

    United States


    Qualifiers

    • Article


    Cited By

    • (2020) An Overview of End-to-End Entity Resolution for Big Data. ACM Computing Surveys 53:6 (1-42). DOI: 10.1145/3418896. Online publication date: 6-Dec-2020
    • (2020) Blocking and Filtering Techniques for Entity Resolution. ACM Computing Surveys 53:2 (1-42). DOI: 10.1145/3377455. Online publication date: 20-Mar-2020
    • (2017) Result Merging for Structured Queries on the Deep Web with Active Relevance Weight Estimation. Information Systems 64:C (93-103). DOI: 10.1016/j.is.2016.06.005. Online publication date: 1-Mar-2017
    • (2017) Entity reconciliation in big data sources. Expert Systems with Applications: An International Journal 80:C (14-27). DOI: 10.1016/j.eswa.2017.03.010. Online publication date: 1-Sep-2017
    • (2016) Unsupervised Blocking of Imbalanced Datasets for Record Matching. Proceedings of the 17th International Conference on Web Information Systems Engineering - Volume 10042 (172-186). DOI: 10.1007/978-3-319-48743-4_14. Online publication date: 7-Nov-2016
    • (2013) MFIBlocks. Information Systems 38:6 (908-926). DOI: 10.1016/j.is.2012.11.008. Online publication date: 1-Sep-2013
    • (2012) LINDA. Proceedings of the 21st ACM international conference on Information and knowledge management (2104-2108). DOI: 10.1145/2396761.2398582. Online publication date: 29-Oct-2012
