iMAP: Discovering complex semantic matches between database schemas
Proceedings of the 2004 ACM SIGMOD International Conference on Management of …, 2004 (dl.acm.org)
Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important class of more complex matches, such as address = concat(city, state) and room-price = room-rate * (1 + tax-rate). We describe the iMAP system, which semi-automatically discovers both 1-1 and complex matches. iMAP reformulates schema matching as a search in an often very large or infinite match space. To search effectively, it employs a set of searchers, each discovering specific types of complex matches. To further improve matching accuracy, iMAP exploits a variety of domain knowledge, including past complex matches, domain integrity constraints, and overlap data. Finally, iMAP introduces a novel feature that generates explanations of predicted matches, to provide insights into the matching process and suggest actions to converge on correct matches quickly. We apply iMAP to several real-world domains to match relational tables, and show that it discovers both 1-1 and complex matches with high accuracy.
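To make the idea of searching a match space concrete, the following is a minimal sketch of one hypothetical "searcher" that looks only for concatenation matches such as address = concat(city, state). It is not iMAP's actual algorithm: the function names, the exhaustive enumeration, the ", " separator, and the scoring rule (fraction of target values reproduced, which assumes the two sources share overlapping rows) are all assumptions made for illustration.

```python
from itertools import permutations


def concat_candidates(columns, max_arity=2, sep=", "):
    """Enumerate candidate matches of the form concat(c1, ..., ck).

    `columns` maps a column name to its list of string values.
    The separator is an illustrative assumption.
    """
    names = sorted(columns)
    for k in range(1, max_arity + 1):
        for combo in permutations(names, k):
            rows = zip(*(columns[name] for name in combo))
            yield combo, [sep.join(row) for row in rows]


def overlap_score(candidate_values, target_values):
    """Fraction of target values that the candidate reproduces exactly."""
    produced = set(candidate_values)
    hits = sum(value in produced for value in target_values)
    return hits / len(target_values)


def best_concat_match(source_columns, target_values, max_arity=2):
    """Return the (columns, score) pair with the highest overlap score."""
    scored = (
        (combo, overlap_score(values, target_values))
        for combo, values in concat_candidates(source_columns, max_arity)
    )
    return max(scored, key=lambda pair: pair[1])


# Toy data, invented for this example only.
source = {
    "city": ["Seattle", "Madison", "Urbana"],
    "state": ["WA", "WI", "IL"],
    "zip": ["98195", "53706", "61801"],
}
address = ["Seattle, WA", "Madison, WI", "Urbana, IL"]

columns, score = best_concat_match(source, address)
print(columns, score)
```

Exhaustive enumeration like this only works for a tiny space. The abstract notes that the real match space is often very large or infinite, which is why a practical system would need a more selective search strategy and additional domain knowledge to prune candidates.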