
Why Your Data Won’t Mix: New tools and techniques can help ease the pain of reconciling schemas.

Published: 01 October 2005

Abstract

When independent parties develop database schemas for the same domain, the resulting schemas will almost always be quite different from each other. These differences are referred to as semantic heterogeneity, which also appears in the presence of multiple XML documents, Web services, and ontologies, or more broadly, whenever there is more than one way to structure a body of data. The presence of semi-structured data exacerbates semantic heterogeneity, because semi-structured schemas are much more flexible to start with. For multiple data systems to cooperate with each other, they must understand each other’s schemas. Without such understanding, the multitude of data sources amounts to a digital version of the Tower of Babel.



Published In

Queue, Volume 3, Issue 8
Semi-structured Data
October 2005
50 pages
ISSN: 1542-7730
EISSN: 1542-7749
DOI: 10.1145/1103822

Publisher

Association for Computing Machinery

New York, NY, United States

