Learning Knowledge Graph for Target-driven Schema Matching

Authors: Debayan Mukherjee, Atreya Bandyopadhyay, Rajdip Chowdhury, Indrajit Bhattacharya

CODS-COMAD '21: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)

Pages 65 - 73

https://doi.org/10.1145/3430984.3431013

Published: 02 January 2021

Abstract

Though research on schema matching goes back many years, it remains a largely unsolved problem in practice. The knowledge of concepts, terms, and their correspondences that is useful for matching is implicit in historical matching information. Based on this insight, we propose a learning-based probabilistic framework for schema matching. It updates a knowledge graph of concepts, terms, their relationships, and associated probabilities from historical ‘training’ matches, and then uses this knowledge to predict matches for a ‘test’ source schema. We present a solution based on stochastic EM, where the latent concepts and relationships in historical matches are inferred by sampling during the expectation step, and the parameters associated with the knowledge graph are estimated using these inferred variables. Using extensive experiments, we show that by making effective use of historical matches, our approach significantly outperforms state-of-the-art matching techniques on a real-world Insurance dataset as well as a benchmark dataset.
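The stochastic EM loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' model: it assumes a much simpler setting in which each historical match is a (source term, target attribute) pair generated by a single latent concept, and every name here (`stochastic_em`, `rank_targets`, the smoothing constant `alpha`, the example `history`) is hypothetical. The sampling step draws a concept for each match from its posterior, and the estimation step re-computes smoothed probabilities from the sampled assignments.

```python
import random
from collections import Counter

def normalize(counts, keys, alpha):
    """Turn additively smoothed counts into a distribution over `keys`."""
    total = sum(counts[k] for k in keys) + alpha * len(keys)
    return {k: (counts[k] + alpha) / total for k in keys}

def m_step(pairs, z, sources, targets, n_concepts, alpha):
    """Estimate parameters from the currently sampled concept assignments."""
    src_counts = [Counter() for _ in range(n_concepts)]
    tgt_counts = [Counter() for _ in range(n_concepts)]
    for (s, t), c in zip(pairs, z):
        src_counts[c][s] += 1
        tgt_counts[c][t] += 1
    prior = normalize(Counter(z), range(n_concepts), alpha)
    p_src = [normalize(src_counts[c], sources, alpha) for c in range(n_concepts)]
    p_tgt = [normalize(tgt_counts[c], targets, alpha) for c in range(n_concepts)]
    return prior, p_src, p_tgt

def s_step(pairs, prior, p_src, p_tgt, rng):
    """Stochastic E-step: sample one latent concept per match from its posterior."""
    concepts = list(prior)
    z = []
    for s, t in pairs:
        weights = [prior[c] * p_src[c][s] * p_tgt[c][t] for c in concepts]
        z.append(rng.choices(concepts, weights=weights)[0])
    return z

def stochastic_em(pairs, n_concepts=2, n_iters=100, alpha=0.01, seed=0):
    """Alternate parameter estimation and sampling for a fixed number of rounds."""
    rng = random.Random(seed)
    sources = sorted({s for s, _ in pairs})
    targets = sorted({t for _, t in pairs})
    z = [rng.randrange(n_concepts) for _ in pairs]  # random initial assignments
    for _ in range(n_iters):
        params = m_step(pairs, z, sources, targets, n_concepts, alpha)
        z = s_step(pairs, *params, rng)
    return m_step(pairs, z, sources, targets, n_concepts, alpha)

def rank_targets(source_term, prior, p_src, p_tgt):
    """Score each target attribute for a source term by summing over concepts.

    An unseen source term carries no evidence, so it falls back to the
    concept prior (the .get default of 1.0).
    """
    scores = {
        t: sum(prior[c] * p_src[c].get(source_term, 1.0) * p_tgt[c][t] for c in prior)
        for t in p_tgt[0]
    }
    total = sum(scores.values())
    return sorted(((t, v / total) for t, v in scores.items()), key=lambda x: -x[1])

# Hypothetical historical ('training') matches: (source term, target attribute).
history = [
    ("dob", "birth_date"), ("birth_dt", "birth_date"), ("dob", "birth_date"),
    ("zip", "postal_code"), ("postcode", "postal_code"), ("zip", "postal_code"),
]
prior, p_src, p_tgt = stochastic_em(history)
ranking = rank_targets("dob", prior, p_src, p_tgt)
print(ranking)
```

Because the latent assignments are sampled rather than averaged, individual runs differ with the seed. The additive smoothing keeps every sampling weight positive, which makes the estimation step a smoothed rather than a pure maximum-likelihood estimate.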

Cited By

Mitra S, Mukherjee D, Bandyopadhyay A, Chowdhury R, Medicherla R, Bhattacharya I, Naik R, Grundy J (2021). Learning-based assistant for data migration of enterprise information systems. Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, 1121-1125. 10.1109/ASE51524.2021.9678533. Online publication date: 15-Nov-2021
https://dl.acm.org/doi/10.1109/ASE51524.2021.9678533

Information

Published In

CODS-COMAD '21: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)

January 2021

453 pages

ISBN: 9781450388177

DOI: 10.1145/3430984

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CODS COMAD 2021

CODS COMAD 2021: 8th ACM IKDD CODS and 26th COMAD

January 2 - 4, 2021

Bangalore, India

Acceptance Rates

Overall Acceptance Rate 197 of 680 submissions, 29%

Bibliometrics

Total Citations: 1
Total Downloads: 130
Downloads (Last 12 months): 12
Downloads (Last 6 weeks): 3

Reflects downloads up to 20 Dec 2024
