DOI: 10.1145/3430984.3431013
research-article

Learning Knowledge Graph for Target-driven Schema Matching

Published: 02 January 2021

Abstract

Although research on schema matching goes back many years, it remains a largely unsolved problem in practice. The knowledge of concepts, terms, and their correspondences that is useful for matching is implicit in historical matching information. Based on this insight, we propose a learning-based probabilistic framework for schema matching. It updates a knowledge graph of concepts, terms, their relationships, and associated probabilities from historical ‘training’ matches, and then uses this knowledge to predict matches for a ‘test’ source schema. We present a solution based on stochastic EM, in which the latent concepts and relationships in historical matches are inferred by sampling during the expectation step, and the parameters associated with the knowledge graph are then estimated from these inferred variables. Through extensive experiments, we show that by making effective use of historical matches, our approach significantly outperforms state-of-the-art matching techniques on a real-world insurance dataset as well as a benchmark dataset.
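
The stochastic EM loop described in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's model: it assumes a toy setting in which each historical match is a (source term, target term) pair generated by a single latent concept, and the function names (`estimate`, `stochastic_em`, `predict`), the example terms, and the smoothing constant `alpha` are all hypothetical. The expectation step samples one concept per match from its posterior, and the maximization step re-estimates smoothed probabilities from those samples.

```python
import random
from collections import Counter


def estimate(pairs, z, sources, targets, num_concepts, alpha):
    """M-step: smoothed estimates from the currently sampled concepts."""
    n_k = Counter(z)
    n_src = Counter((k, s) for (s, _), k in zip(pairs, z))
    n_tgt = Counter((k, t) for (_, t), k in zip(pairs, z))
    concepts = range(num_concepts)
    prior = [(n_k[k] + alpha) / (len(pairs) + alpha * num_concepts)
             for k in concepts]
    p_src = [{s: (n_src[k, s] + alpha) / (n_k[k] + alpha * len(sources))
              for s in sources} for k in concepts]
    p_tgt = [{t: (n_tgt[k, t] + alpha) / (n_k[k] + alpha * len(targets))
              for t in targets} for k in concepts]
    return prior, p_src, p_tgt


def stochastic_em(pairs, num_concepts=2, iterations=50, alpha=0.1, seed=0):
    """Toy stochastic EM: one latent concept per historical match."""
    rng = random.Random(seed)
    sources = sorted({s for s, _ in pairs})
    targets = sorted({t for _, t in pairs})
    # Start from a random concept assignment for every match.
    z = [rng.randrange(num_concepts) for _ in pairs]
    for _ in range(iterations):
        prior, p_src, p_tgt = estimate(pairs, z, sources, targets,
                                       num_concepts, alpha)
        # Stochastic E-step: sample each concept from its posterior
        # instead of computing full expectations.
        for i, (s, t) in enumerate(pairs):
            weights = [prior[k] * p_src[k][s] * p_tgt[k][t]
                       for k in range(num_concepts)]
            z[i] = rng.choices(range(num_concepts), weights=weights)[0]
    return estimate(pairs, z, sources, targets, num_concepts, alpha)


def predict(source_term, params):
    """Distribution over target terms for a source term seen in training."""
    prior, p_src, p_tgt = params
    scores = {t: sum(prior[k] * p_src[k][source_term] * p_tgt[k][t]
                     for k in range(len(prior)))
              for t in p_tgt[0]}
    norm = sum(scores.values())
    return {t: v / norm for t, v in scores.items()}


# Hypothetical historical matches (illustrative terms only).
pairs = [("cust_nm", "Name"), ("customer_name", "Name"),
         ("dob", "BirthDate"), ("birth_dt", "BirthDate")] * 5
params = stochastic_em(pairs)
dist = predict("dob", params)
```

Here `dist` is a normalized distribution over the target terms for the source term `dob`. The sketch omits the knowledge graph structure and the relationships between concepts that the paper's framework learns, and `predict` only handles source terms that appeared in the training pairs.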


Cited By

  • (2021) Learning-based assistant for data migration of enterprise information systems. Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, 1121-1125. DOI: 10.1109/ASE51524.2021.9678533. Online publication date: 15-Nov-2021

Published In

CODS-COMAD '21: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)
January 2021
453 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Knowledge Graph
  2. Machine Learning
  3. Schema Matching

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CODS COMAD 2021: 8th ACM IKDD CODS and 26th COMAD
January 2 - 4, 2021
Bangalore, India

Acceptance Rates

Overall Acceptance Rate 197 of 680 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 3
Reflects downloads up to 20 Dec 2024
