Entity matching with similarity encoding: A supervised learning recommendation framework for linking (big) data
Pantelis Karapanagiotis and Marius Liebald
No 398, SAFE Working Paper Series from Leibniz Institute for Financial Research SAFE
Abstract:
In this study, we introduce a novel entity matching (EM) framework. It combines state-of-the-art EM approaches based on Artificial Neural Networks (ANN) with a new similarity encoding derived from matching techniques that are prevalent in finance and economics. Our framework is on par with or outperforms alternative end-to-end frameworks in standard benchmark cases. Because similarity encoding is constructed using (edit) distances instead of semantic similarities, it avoids out-of-vocabulary problems when matching dirty data. We highlight this property in an EM application to dirty financial firm-level data extracted from historical archives.
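To illustrate why an encoding built on edit distances does not depend on a vocabulary, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the encoding, so the normalized Levenshtein similarity, the function names (levenshtein, similarity, encode_pair), and the sample firm records are all assumptions made for illustration.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def encode_pair(left: Dict[str, str], right: Dict[str, str],
                fields: List[str]) -> List[float]:
    """Hypothetical encoding: one similarity score per shared field."""
    return [
        similarity(left.get(f, "").lower(), right.get(f, "").lower())
        for f in fields
    ]


# Made-up records; the second one contains OCR-style typos.
record_a = {"name": "Deutsche Bank AG", "city": "Frankfurt"}
record_b = {"name": "Deutsche Bnak A.G.", "city": "Frankfurt"}
features = encode_pair(record_a, record_b, ["name", "city"])
```

The resulting vector could serve as input to a supervised classifier such as a neural network. Because the scores are computed character by character, a misspelled or previously unseen token still yields a usable feature, whereas a lookup in a fixed embedding vocabulary would fail for it.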
Keywords: Entity matching; Entity resolution; Database linking; Machine learning; Record resolution; Similarity encoding
JEL-codes: C8
Date: 2023
New Economics Papers: this item is included in nep-ain, nep-big and nep-cmp
Downloads: (external link)
https://www.econstor.eu/bitstream/10419/274537/1/185636643X.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:zbw:safewp:398
Bibliographic data for series maintained by ZBW - Leibniz Information Centre for Economics.