Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection
July 2012
Publisher:
  • Springer Publishing Company, Incorporated
ISBN: 978-3-642-31163-5
Published: 05 July 2012
Pages: 289
Abstract

Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching, and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domains, including applied statistics, health informatics, data mining, machine learning, artificial intelligence, database management, and digital libraries, significant advances have been achieved over the last decade in all aspects of the data matching process, especially in improving the accuracy of data matching and its scalability to large databases. Peter Christen's book is divided into three parts. Part I, Overview, introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, Steps of the Data Matching Process, then details its main steps: pre-processing, indexing, field and record comparison, classification, and quality evaluation. Part III, Further Topics, deals with specific aspects such as privacy, real-time matching, and matching unstructured data. Finally, the book briefly describes the main features of many research and open-source systems available today. By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching to familiarize themselves with recent research advances and to identify open research challenges in the area. To this end, each chapter includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems.
In particular, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaptation and customization. Such practical considerations are discussed for each of the major steps in the data matching process.
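The main steps named above (pre-processing, indexing, field and record comparison, classification, and quality evaluation) can be illustrated with a deliberately small sketch. It is not the book's implementation. The bigram Jaccard comparison, the blocking key (first two characters of a `surname` field), equal field weights, the 0.7 threshold, and every function and field name here are illustrative assumptions chosen for brevity.

```python
from collections import defaultdict
from itertools import product
import re


def preprocess(value: str) -> str:
    """Pre-processing: lower-case, replace punctuation with spaces, collapse whitespace."""
    value = re.sub(r"[^a-z0-9 ]", " ", value.lower())
    return " ".join(value.split())


def qgrams(value: str, q: int = 2) -> set[str]:
    """Set of overlapping character q-grams (bigrams by default)."""
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def jaccard(a: str, b: str) -> float:
    """Field comparison: Jaccard similarity of the two q-gram sets, in [0, 1]."""
    ga, gb = qgrams(a), qgrams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def block_key(record: dict) -> str:
    # Hypothetical blocking key: first two characters of the cleaned surname.
    return record["surname"][:2]


def match(db_a: dict, db_b: dict, fields: list[str], threshold: float = 0.7) -> set:
    """Return (id_a, id_b) pairs classified as matches; `fields` must include 'surname'."""
    # Step 1: pre-process every compared field in both databases.
    clean_a = {rid: {f: preprocess(r[f]) for f in fields} for rid, r in db_a.items()}
    clean_b = {rid: {f: preprocess(r[f]) for f in fields} for rid, r in db_b.items()}

    # Step 2: indexing by standard blocking; only records sharing a key are compared.
    blocks = defaultdict(lambda: ([], []))
    for rid, r in clean_a.items():
        blocks[block_key(r)][0].append(rid)
    for rid, r in clean_b.items():
        blocks[block_key(r)][1].append(rid)

    matches = set()
    for ids_a, ids_b in blocks.values():
        for ra, rb in product(ids_a, ids_b):
            # Step 3: compare fields, then average into one record similarity.
            sims = [jaccard(clean_a[ra][f], clean_b[rb][f]) for f in fields]
            # Step 4: classify the pair with a single similarity threshold.
            if sum(sims) / len(sims) >= threshold:
                matches.add((ra, rb))
    return matches


def evaluate(predicted: set, truth: set) -> tuple[float, float]:
    """Step 5: quality evaluation as (precision, recall) against known true matches."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall
```

On toy records such as "Peter Christen" versus "peter Christensen", the averaged bigram similarity clears the threshold, while "Mary Smith" versus "Marie Smyth" does not, so recall suffers. This is the kind of gap that the adaptation and customization mentioned above (phonetic encodings, other indexing schemes, learned classifiers) is meant to close.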

Cited By

  1. Sohail A and Qounain W (2024). Locality sensitive blocking (LSB), Journal of Information Science, 50:6, (1400-1413), Online publication date: 1-Dec-2024.
  2. Rasch A (2024). (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms, ACM Transactions on Programming Languages and Systems, 46:3, (1-74), Online publication date: 30-Sep-2024.
  3. Li H, Li S, Hao F, Zhang C, Song Y and Chen L. BoostER: Leveraging Large Language Models for Enhancing Entity Resolution, Companion Proceedings of the ACM Web Conference 2024, (1043-1046)
  4. Dixon A, Thengo L, Kitsao E, Matiya K, Barasa M, Nyirongo R, Muli J, Kamanga F, Kachimanga C, Munyaneza F, Ngari P, Makungwa H, Chimpukuso J, Amulele M, Karari E and Mbae S (2023). Community and Facility Health Information System Integration in Malawi: A Comparison of Machine Learning and Probabilistic Record Linkage Methods, ACM Journal on Computing and Sustainable Societies, 1:2, (1-16), Online publication date: 31-Dec-2024.
  5. Genossar B, Shraga R and Gal A (2023). FlexER: Flexible Entity Resolution for Multiple Intents, Proceedings of the ACM on Management of Data, 1:1, (1-27), Online publication date: 26-May-2023.
  6. Wu R, Bendeck A, Chu X and He Y (2023). Ground Truth Inference for Weakly Supervised Entity Matching, Proceedings of the ACM on Management of Data, 1:1, (1-28), Online publication date: 26-May-2023.
  7. O’Hare K, Jurek-Loughrey A and de Campos C (2021). High-Value Token-Blocking: Efficient Blocking Method for Record Linkage, ACM Transactions on Knowledge Discovery from Data, 16:2, (1-17), Online publication date: 30-Apr-2022.
  8. Schouten S, de Boer V, Petram L and van Erp M. The Wind in Our Sails: Developing a Reusable and Maintainable Dutch Maritime History Knowledge Graph, Proceedings of the 11th Knowledge Capture Conference, (97-104)
  9. Xu R, Baracaldo N, Zhou Y, Anwar A, Joshi J and Ludwig H. FedV, Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, (181-192)
  10. Thirumuruganathan S, Li H, Tang N, Ouzzani M, Govind Y, Paulsen D, Fung G and Doan A (2021). Deep learning for blocking in entity matching, Proceedings of the VLDB Endowment, 14:11, (2459-2472), Online publication date: 1-Jul-2021.
  11. Barlaug N and Gulla J (2021). Neural Networks for Entity Matching: A Survey, ACM Transactions on Knowledge Discovery from Data, 15:3, (1-37), Online publication date: 30-Jun-2021.
  12. Özcan F, Lei C, Quamar A and Efthymiou V. Semantic enrichment of data for AI applications, Proceedings of the Fifth Workshop on Data Management for End-To-End Machine Learning, (1-7)
  13. Vretinaris A, Lei C, Efthymiou V, Qin X and Özcan F. Medical Entity Disambiguation Using Graph Neural Networks, Proceedings of the 2021 International Conference on Management of Data, (2310-2318)
  14. Peeters R and Bizer C (2021). Dual-objective fine-tuning of BERT for entity matching, Proceedings of the VLDB Endowment, 14:10, (1913-1921), Online publication date: 1-Jun-2021.
  15. Loster M, Koumarelas I and Naumann F (2021). Knowledge Transfer for Entity Resolution with Siamese Neural Networks, Journal of Data and Information Quality, 13:1, (1-25), Online publication date: 31-Mar-2021.
  16. Primpeli A and Bizer C. Profiling Entity Matching Benchmark Tasks, Proceedings of the 29th ACM International Conference on Information & Knowledge Management, (3101-3108)
  17. Teong K, Soon L and Su T. Schema-Agnostic Entity Matching using Pre-trained Language Models, Proceedings of the 29th ACM International Conference on Information & Knowledge Management, (2241-2244)
  18. Koumarelas I, Jiang L and Naumann F (2020). Data Preparation for Duplicate Detection, Journal of Data and Information Quality, 12:3, (1-24), Online publication date: 30-Sep-2020.
  19. Doan A, Konda P, Suganthan G. C. P, Govind Y, Paulsen D, Chandrasekhar K, Martinkus P and Christie M (2020). Magellan, Communications of the ACM, 63:8, (83-91), Online publication date: 22-Jul-2020.
  20. Zhong Y, Matsubara M, Kobayashi M and Morishima A. Effects of Cognitive Consistency in Microtask Design with only Auditory Information, Universal Access in Human-Computer Interaction. Applications and Practice, (466-476)
  21. Saeedi A, Peukert E and Rahm E. Incremental Multi-source Entity Resolution for Knowledge Graph Completion, The Semantic Web, (393-408)
  22. Nascimento D, Santos Pires C and Nóbrega T (2020). Configurable assembly of classification rules for enhancing entity resolution results, Information Processing and Management: an International Journal, 57:3, Online publication date: 1-May-2020.
  23. Sarkhi A and Talburt J (2020). A scalable, hybrid entity resolution process for unstandardized entity references, Journal of Computing Sciences in Colleges, 35:9, (19-29), Online publication date: 1-Apr-2020.
  24. Draisbach U, Christen P and Naumann F (2019). Transforming Pairwise Duplicates to Entity Clusters for High-quality Duplicate Detection, Journal of Data and Information Quality, 12:1, (1-30), Online publication date: 23-Jan-2020.
  25. Kim M, Paini D and Jurdak R (2018). Real-world diffusion dynamics based on point process approaches: a review, Artificial Intelligence Review, 53:1, (321-350), Online publication date: 1-Jan-2020.
  26. Nascimento D, Pires C and Mestre D (2019). Exploiting block co-occurrence to control block sizes for entity resolution, Knowledge and Information Systems, 62:1, (359-400), Online publication date: 1-Jan-2020.
  27. Kouki P, Pujara J, Marcum C, Koehly L and Getoor L (2019). Collective entity resolution in multi-relational familial networks, Knowledge and Information Systems, 61:3, (1547-1581), Online publication date: 1-Dec-2019.
  28. Baihan A, Ammar R, Aseltine R, Baihan M and Rajasekaran S. Efficient Sequential and Parallel Algorithms for Incremental Record Linkage, Computational Advances in Bio and Medical Sciences, (26-38)
  29. Kimelfeld B and Martens W (2019). Technical Perspective, ACM SIGMOD Record, 48:1, (23-23), Online publication date: 5-Nov-2019.
  30. Shao J, Wang Q and Lin Y (2019). Skyblocking for entity resolution, Information Systems, 85:C, (30-43), Online publication date: 1-Nov-2019.
  31. Rasch A, Schulze R and Gorlatch S. Generating Portable High-Performance Code via Multi-Dimensional Homomorphisms, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, (353-368)
  32. Fakhraei S, Mathew J and Ambite J. NSEEN: Neural Semantic Embedding for Entity Normalization, Machine Learning and Knowledge Discovery in Databases, (665-680)
  33. Tai X, Soska K and Christin N. Adversarial Matching of Dark Net Market Vendor Accounts, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (1871-1880)
  34. Zhang H, Slawski M and Li P. Permutation Recovery from Multiple Measurement Vectors in Unlabeled Sensing, 2019 IEEE International Symposium on Information Theory (ISIT), (1857-1861)
  35. Ao J and Chirkova R. Effective and Efficient Data Cleaning for Entity Matching, Proceedings of the Workshop on Human-In-the-Loop Data Analytics, (1-7)
  36. Pervaiz F, Vashistha A and Anderson R. Examining the challenges in development data pipeline, Proceedings of the 2nd ACM SIGCAS Conference on Computing and Sustainable Societies, (13-21)
  37. Alic A, Almeida J, Aloisio G, Andrade N, Antunes N, Ardagna D, Badia R, Basso T, Blanquer I, Braz T, Brito A, Elia D, Fiore S, Guedes D, Lattuada M, Lezzi D, Maciel M, Meira W, Mestre D, Moraes R, Morais F, Pires C, Kozievitch N, Santos W, Silva P and Vieira M (2022). BIGSEA, Future Generation Computer Systems, 96:C, (243-269), Online publication date: 1-Jul-2019.
  38. O’Hare K, Jurek-Loughrey A and de Campos C (2019). An unsupervised blocking technique for more efficient record linkage, Data & Knowledge Engineering, 122:C, (181-195), Online publication date: 1-Jul-2019.
  39. Maurino A, Rula A, von B, Gomez M, Elvesæter B and Roman D. Modelling and Linking Company Data in the euBusinessGraph Platform, Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets, (1-6)
  40. Govind Y, Konda P, Suganthan G. C. P, Martinkus P, Nagarajan P, Li H, Soundararajan A, Mudgal S, Ballard J, Zhang H, Ardalan A, Das S, Paulsen D, Singh Saini A, Paulson E, Park Y, Carter M, Sun M, Fung G and Doan A. Entity Matching Meets Data Science, Proceedings of the 2019 International Conference on Management of Data, (389-403)
  41. Hou B, Chen Q, Shen J, Liu X, Zhong P, Wang Y, Chen Z and Li Z. Gradual Machine Learning for Entity Resolution, The World Wide Web Conference, (3526-3530)
  42. Nanayakkara C, Christen P and Ranbaduge T. Robust Temporal Graph Clustering for Group Record Linkage, Advances in Knowledge Discovery and Data Mining, (526-538)
  43. Araújo T, Pires C, Mestre D, Nóbrega T, Nascimento D and Stefanidis K. A noise tolerant and schema-agnostic blocking technique for entity resolution, Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, (422-430)
  44. Rasch A, Schulze R, Gorus W, Hiller J, Bartholomäus S and Gorlatch S. High-performance probabilistic record linkage via multi-dimensional homomorphisms, Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, (526-533)
  45. Vieira P, Lóscio B and Salgado A (2019). Incremental entity resolution process over query results for data integration systems, Journal of Intelligent Information Systems, 52:2, (451-471), Online publication date: 1-Apr-2019.
  46. Kong C, Gao M, Xu C, Fu Y, Qian W and Zhou A (2019). EnAli, Frontiers of Computer Science: Selected Publications from Chinese Universities, 13:1, (157-169), Online publication date: 1-Feb-2019.
  47. Gordeev D, Rey A and Shagarov D. Unsupervised Cross-lingual Matching of Product Classifications, Proceedings of the 23rd Conference of Open Innovations Association FRUCT, (459-464)
  48. C. P, Ardalan A, Doan A and Akella A (2018). Smurf, Proceedings of the VLDB Endowment, 12:3, (278-291), Online publication date: 1-Nov-2018.
  49. Zhang Y, Ng K, Churchill T and Christen P. Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases, Proceedings of the 27th ACM International Conference on Information and Knowledge Management, (2213-2221)
  50. Sun L, Zhang L and Ye X. Randomized Bit Vector, Proceedings of the 27th ACM International Conference on Information and Knowledge Management, (1263-1272)
  51. Bhamidipaty A, Gruen D, Platz J and Vergo J. Cognitive company discovery, Proceedings of the 12th ACM Conference on Recommender Systems, (508-509)
  52. Koumarelas I, Kroschk A, Mosley C and Naumann F (2018). Experience, Journal of Data and Information Quality, 10:2, (1-16), Online publication date: 13-Sep-2018.
  53. Tan X and Huang J. Levenshtein in Blocks World, Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval, (171-174)
  54. Miller R (2018). Open data integration, Proceedings of the VLDB Endowment, 11:12, (2130-2139), Online publication date: 1-Aug-2018.
  55. Govind Y, Paulson E, Nagarajan P, C. P, Doan A, Park Y, Fung G, Conathan D, Carter M and Sun M (2018). Cloudmatcher, Proceedings of the VLDB Endowment, 11:12, (2042-2045), Online publication date: 1-Aug-2018.
  56. Liu Q, Chao J, Mahoney T, Chern A, Min C, Javed F and Jijkoun V. Lessons Learned from Developing and Deploying a Large-Scale Employer Name Normalization System for Online Recruitment, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (556-565)
  57. Boussis D, Dritsas E, Kanavos A, Sioutas S, Tzimas G and Verykios V. MapReduce Implementations for Privacy Preserving Record Linkage, Proceedings of the 10th Hellenic Conference on Artificial Intelligence, (1-4)
  58. Huang J, Hu W, Li H and Qu Y. Automated Comparative Table Generation for Facilitating Human Intervention in Multi-Entity Resolution, The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, (585-594)
  59. Carreras G, Simonetti M, Cricelli C and Lapi F (2018). Deterministic and Probabilistic Record Linkage, Journal of Medical Systems, 42:5, (1-3), Online publication date: 1-May-2018.
  60. Lu W, Dai H, Zhang Z, Wu C and Zhuang Y (2018). Active instance matching with pairwise constraints and its application to Chinese knowledge base construction, Knowledge and Information Systems, 55:1, (171-214), Online publication date: 1-Apr-2018.
  61. Tuarob S, Strong R, Chandra A and Tucker C (2018). Discovering Discontinuity in Big Financial Transaction Data, ACM Transactions on Management Information Systems, 9:1, (1-26), Online publication date: 31-Mar-2018.
  62. Zhu L, Du X, Ma Q, Meng W and Liu H. Keyword Search with Real-time Entity Resolution in Relational Databases, Proceedings of the 2018 10th International Conference on Machine Learning and Computing, (134-139)
  63. Rashtchian C, Makarychev K, Rácz M, Ang S, Jevdjic D, Yekhanin S, Ceze L and Strauss K. Clustering billions of reads for DNA data storage, Proceedings of the 31st International Conference on Neural Information Processing Systems, (3362-3373)
  64. Wu J, Sefid A, Ge A and Giles C. A Supervised Learning Approach To Entity Matching Between Scholarly Big Datasets, Proceedings of the 9th Knowledge Capture Conference, (1-4)
  65. Anindya I, Roy H, Kantarcioglu M and Malin B. Building a Dossier on the Cheap, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (1549-1558)
  66. Chi Y, Hong J, Jurek A, Liu W and O’Reilly D (2017). Privacy preserving record linkage in the presence of missing values, Information Systems, 71:C, (199-210), Online publication date: 1-Nov-2017.
  67. Jurek A, Hong J, Chi Y and Liu W (2017). A novel ensemble learning approach to unsupervised record linkage, Information Systems, 71:C, (40-54), Online publication date: 1-Nov-2017.
  68. Reyes-Galaviz O, Pedrycz W, He Z and Pizzi N (2017). A supervised gradient-based learning algorithm for optimized entity resolution, Data & Knowledge Engineering, 112:C, (106-129), Online publication date: 1-Nov-2017.
  69. Fernández-Álvarez D, Gayo J, Gayo-Avello D and Ordóñez de Pablos P (2017). MERA, International Journal on Semantic Web & Information Systems, 13:4, (42-67), Online publication date: 1-Oct-2017.
  70. Ouhab A, Malki M, Berrabah D and Boufares F (2017). An Unsupervised Entity Resolution Framework for English and Arabic Datasets, International Journal of Strategic Information Technology and Applications, 8:4, (16-29), Online publication date: 1-Oct-2017.
  71. Al-janabi S, Hamid A and Janicki R. datumPIPE, Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, (589-592)
  72. Malmi E, Rasa M and Gionis A. AncestryAI, Proceedings of the 26th International Conference on World Wide Web Companion, (257-261)
  73. Shu K, Wang S, Tang J, Zafarani R and Liu H (2017). User Identity Linkage across Online Social Networks, ACM SIGKDD Explorations Newsletter, 18:2, (5-17), Online publication date: 22-Mar-2017.
  74. Mestre D, Pires C and Nascimento D (2017). Towards the efficient parallelization of multi-pass adaptive blocking for entity matching, Journal of Parallel and Distributed Computing, 101:C, (27-40), Online publication date: 1-Mar-2017.
  75. Mazumdar A and Saha B. A theoretical analysis of first heuristics of crowdsourced entity resolution, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (970-976)
  76. Zanella G, Betancourt B, Wallach H, Miller J, Zaidi A and Steorts R. Flexible models for microclustering with application to entity resolution, Proceedings of the 30th International Conference on Neural Information Processing Systems, (1425-1433)
  77. Kim T, Hwang M, Kim Y and Jeong D (2016). Entity Resolution Approach of Data Stream Management Systems, Wireless Personal Communications: An International Journal, 91:4, (1621-1634), Online publication date: 1-Dec-2016.
  78. Karapiperis D and Verykios V (2016). A fast and efficient Hamming LSH-based scheme for accurate linkage, Knowledge and Information Systems, 49:3, (861-884), Online publication date: 1-Dec-2016.
  79. Montoya D, Pellissier Tanon T, Abiteboul S and Suchanek F. Thymeflow, A Personal Knowledge Base with Spatio-temporal Data, Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, (2477-2480)
  80. Konda P, Das S, C. P, Doan A, Ardalan A, Ballard J, Li H, Panahi F, Zhang H, Naughton J, Prasad S, Krishnan G, Deep R and Raghavendra V (2016). Magellan, Proceedings of the VLDB Endowment, 9:13, (1581-1584), Online publication date: 1-Sep-2016.
  81. Al-Bakri M, Atencia M, David J, Lalande S and Rousset M. Uncertainty-sensitive reasoning for inferring same as facts in linked data, Proceedings of the Twenty-second European Conference on Artificial Intelligence, (698-706)
  82. Liu Q, Javed F and Mcnair M. CompanyDepot, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (521-530)
  83. Konda P, Das S, Suganthan G. C. P, Doan A, Ardalan A, Ballard J, Li H, Panahi F, Zhang H, Naughton J, Prasad S, Krishnan G, Deep R and Raghavendra V (2016). Magellan, Proceedings of the VLDB Endowment, 9:12, (1197-1208), Online publication date: 1-Aug-2016.
  84. Patrini G, Nock R, Hardy S and Caetano T. Fast learning from distributed datasets without entity matching, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, (1909-1917)
  85. Samiei A, Koumarelas I, Loster M and Naumann F. Combination of Rule-based and Textual Similarity Approaches to Match Financial Entities, Proceedings of the Second International Workshop on Data Science for Macro-Modeling, (1-2)
  86. Schild C and Schultz S. Linking Deutsche Bundesbank Company Data using Machine-Learning-Based Classification, Proceedings of the Second International Workshop on Data Science for Macro-Modeling, (1-3)
  87. Christen P, Gayler R, Tran K, Fisher J and Vatsalan D (2016). Automatic Discovery of Abnormal Values in Large Textual Databases, Journal of Data and Information Quality, 7:1-2, (1-31), Online publication date: 6-Jun-2016.
  88. Misra J (2016). Terminological inconsistency analysis of natural language requirements, Information and Software Technology, 74:C, (183-193), Online publication date: 1-Jun-2016.
  89. Grzebala P and Cheatham M. Private Record Linkage, Proceedings of the 13th International Conference on The Semantic Web. Latest Advances and New Domains - Volume 9678, (593-606)
  90. Ranbaduge T, Vatsalan D, Christen P and Verykios V. Hashing-Based Distributed Multi-party Blocking for Privacy-Preserving Record Linkage, Proceedings, Part II, of the 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume 9652, (415-427)
  91. Fisher J, Christen P and Wang Q. Active Learning Based Entity Resolution Using Markov Logic, Proceedings, Part II, of the 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume 9652, (338-349)
  92. Wang Q, Gao J and Christen P. A Clustering-Based Framework for Incrementally Repairing Entity Resolution, Proceedings, Part II, of the 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume 9652, (283-295)
  93. Riederer C, Kim Y, Chaintreau A, Korula N and Lattanzi S. Linking Users Across Domains with Location Data, Proceedings of the 25th International Conference on World Wide Web, (707-719)
  94. Vatsalan D and Christen P (2016). Privacy-preserving matching of similar patients, Journal of Biomedical Informatics, 59:C, (285-298), Online publication date: 1-Feb-2016.
  95. Wang Q, Cui M and Liang H (2016). Semantic-Aware Blocking for Entity Resolution, IEEE Transactions on Knowledge and Data Engineering, 28:1, (166-180), Online publication date: 1-Jan-2016.
  96. Papadakis G, Alexiou G, Papastefanatos G and Koutrika G (2015). Schema-agnostic vs schema-based configurations for blocking methods on homogeneous data, Proceedings of the VLDB Endowment, 9:4, (312-323), Online publication date: 1-Dec-2015.
  97. Kejriwal M and Miranker D (2015). An unsupervised instance matcher for schema-free RDF data, Web Semantics: Science, Services and Agents on the World Wide Web, 35:P2, (102-123), Online publication date: 1-Dec-2015.
  98. Ramadan B, Christen P, Liang H and Gayler R (2015). Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution, Journal of Data and Information Quality, 6:4, (1-29), Online publication date: 26-Oct-2015.
  99. Büch L and Andrzejak A. Approximate String Matching by End-Users using Active Learning, Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, (93-102)
  100. Kejriwal M and Miranker D. Decision-Making Bias in Instance Matching Model Selection, The Semantic Web - ISWC 2015, (392-407)
  101. Karapiperis D and Verykios V (2015). Load-Balancing the Distance Computations in Record Linkage, ACM SIGKDD Explorations Newsletter, 17:1, (1-7), Online publication date: 29-Sep-2015.
  102. Karapiperis D, Verykios V, Katsiri E and Delis A. A Tutorial on Blocking Methods for Privacy-Preserving Record Linkage, Revised Selected Papers of the First International Workshop on Algorithmic Aspects of Cloud Computing - Volume 9511, (3-15)
  103. Goga O, Loiseau P, Sommer R, Teixeira R and Gummadi K. On the Reliability of Profile Matching Across Large Online Social Networks, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (1799-1808)
  104. Fisher J, Christen P, Wang Q and Rahm E. A Clustering-Based Framework to Control Block Sizes for Entity Resolution, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (279-288)
  105. Kejriwal M and Miranker D. Semi-supervised Instance Matching Using Boosted Classifiers, Proceedings of the 12th European Semantic Web Conference on The Semantic Web. Latest Advances and New Domains - Volume 9088, (388-402)
  106. Saveta T, Daskalaki E, Flouris G, Fundulaki I, Herschel M and Ngonga Ngomo A. Pushing the Limits of Instance Matching Systems, Proceedings of the 24th International Conference on World Wide Web, (105-106)
  107. Cresci S, Gazzè D, Lo Duca A, Marchetti A and Tesconi M. Geo Data Annotator, Proceedings of the 24th International Conference on World Wide Web, (23-24)
  108. Mestre D, Pires C and Nascimento D. Adaptive sorted neighborhood blocking for entity matching with MapReduce, Proceedings of the 30th Annual ACM Symposium on Applied Computing, (981-987)
  109. Brito F and Moreira J. MovieMatcher, Proceedings of the 20th Brazilian Symposium on Multimedia and the Web, (21-24)
  110. Christen P. Privacy Aspects in Big Data Integration, Proceedings of the First International Workshop on Privacy and Security of Big Data, (1-1)
  111. Schäfers M and Lipeck U. SimMatching, Proceedings of the 1st ACM SIGSPATIAL PhD Workshop, (1-5)
  112. Ramadan B and Christen P. Forest-Based Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution, Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, (1787-1790)
  113. Kejriwal M and Miranker D. On linking heterogeneous dataset collections, Proceedings of the 2014 International Conference on Posters & Demonstrations Track - Volume 1272, (217-220)
  114. Winkler W (2014). Matching and record linkage, WIREs Computational Statistics, 6:5, (313-325), Online publication date: 18-Aug-2014.
  115. Whang S and Garcia-Molina H. Disinformation techniques for entity resolution, Proceedings of the 22nd ACM international conference on Information & Knowledge Management, (715-720)
  116. Karapiperis D and Verykios V. A distributed framework for scaling Up LSH-based computations in privacy preserving record linkage, Proceedings of the 6th Balkan Conference in Informatics, (102-109)
  117. Verykios V and Christen P (2013). Privacy-preserving record linkage, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 3:5, (321-332), Online publication date: 1-Sep-2013.
  118. Lacoste-Julien S, Palla K, Davies A, Kasneci G, Graepel T and Ghahramani Z. SIGMa, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, (572-580)
  119. Ramadan B, Christen P, Liang H, Gayler R and Hawking D. Dynamic Similarity-Aware Inverted Indexing for Real-Time Entity Resolution, Revised Selected Papers of PAKDD 2013 International Workshops on Trends and Applications in Knowledge Discovery and Data Mining - Volume 7867, (47-58)
  120. Talburt J (2013). SPECIAL ISSUE ON ENTITY RESOLUTION Overview, Journal of Data and Information Quality, 4:2, (1-2), Online publication date: 1-Mar-2013.
Contributors
  • The Australian National University
