DOI: 10.1145/2396761.2398427
research-article

Learning to discover complex mappings from web forms to ontologies

Published: 29 October 2012

Abstract

In order to realize the Semantic Web, various structures on the Web, including Web forms, need to be annotated with and mapped to domain ontologies. We present a machine learning-based automatic approach for discovering complex mappings from Web forms to ontologies. A complex mapping associates a set of semantically related elements on a form with a set of semantically related elements in an ontology. Existing schema mapping solutions mainly rely on integrity constraints to infer complex schema mappings. However, it is difficult to extract rich integrity constraints from forms. We show how machine learning techniques can be used to automatically discover complex mappings between Web forms and ontologies. The challenge is how to capture and learn the complicated knowledge encoded in existing complex mappings. We develop an initial solution that takes a naive Bayesian approach. We evaluated the performance of the solution on various domains. Our experimental results show that, for more than 80% of the test cases, the solution returns the expected mapping as the top-1 result, usually from among several hundred candidate mappings. Furthermore, the expected mapping is always returned within the top-k results with k < 4. The experiments demonstrate that the approach is effective and has the potential to save significant human effort.
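The abstract names the technique (naive Bayesian ranking of candidate complex mappings, evaluated by top-1 and top-k hits) but not its details. As a rough illustration only, the sketch below assumes each candidate mapping has already been reduced to a set of binary features and that candidates labelled correct or incorrect, taken from existing mappings, are available for training. It scores new candidates by naive Bayes log-odds and checks whether the expected mapping lands in the top k. The feature names, the Bernoulli event model, Laplace smoothing, and the class and function names are all hypothetical choices for this sketch, not the authors' design.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation: a candidate complex mapping is reduced to a
# set of binary feature names (e.g. "labels_share_token"). Real features
# would have to be engineered from the form and the ontology.
Features = FrozenSet[str]


class NaiveBayesMappingRanker:
    """Bernoulli naive Bayes with Laplace smoothing, used to rank candidates."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.vocab: Set[str] = set()
        self.class_counts: Counter = Counter()
        self.feature_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}

    def fit(self, examples: List[Tuple[Features, bool]]) -> "NaiveBayesMappingRanker":
        # Each example is (features, whether the candidate was a correct mapping).
        for feats, is_correct in examples:
            self.class_counts[is_correct] += 1
            self.feature_counts[is_correct].update(feats)
            self.vocab |= feats
        return self

    def _log_joint(self, feats: Features, label: bool) -> float:
        n = self.class_counts[label]
        total = sum(self.class_counts.values())
        # Smoothed class prior log P(label) over the two classes.
        logp = math.log((n + self.alpha) / (total + 2 * self.alpha))
        for f in self.vocab:
            # Smoothed P(feature present | label); unseen features are ignored.
            p = (self.feature_counts[label][f] + self.alpha) / (n + 2 * self.alpha)
            logp += math.log(p if f in feats else 1.0 - p)
        return logp

    def score(self, feats: Features) -> float:
        # Log-odds that the candidate is a correct mapping.
        return self._log_joint(feats, True) - self._log_joint(feats, False)

    def rank(self, candidates: List[Features]) -> List[int]:
        # Candidate indices ordered from most to least likely to be correct.
        return sorted(range(len(candidates)),
                      key=lambda i: self.score(candidates[i]), reverse=True)


def top_k_hit(ranking: List[int], expected: int, k: int) -> bool:
    # True if the expected candidate appears among the first k ranked results.
    return expected in ranking[:k]


if __name__ == "__main__":
    training = [
        (frozenset({"labels_share_token", "adjacent_on_form"}), True),
        (frozenset({"labels_share_token"}), True),
        (frozenset({"adjacent_on_form"}), False),
        (frozenset(), False),
    ]
    ranker = NaiveBayesMappingRanker().fit(training)
    candidates = [
        frozenset(),
        frozenset({"labels_share_token", "adjacent_on_form"}),
        frozenset({"adjacent_on_form"}),
    ]
    order = ranker.rank(candidates)
    # Index 1, the candidate carrying both positive-looking features, ranks first.
    print(order[0], top_k_hit(order, expected=1, k=1))
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied. Under these assumptions, a top-1 or top-3 hit rate over a set of test forms would simply be the fraction of forms for which `top_k_hit` returns `True`.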


Cited By

  • (2022) A Machine Learning Approach for Automated Filling of Categorical Fields in Data Entry Forms. ACM Transactions on Software Engineering and Methodology 32(2):1-40. DOI: 10.1145/3533021. Online publication date: 24 May 2022.
  • (2021) Automatic evaluation of complex alignments. Semantic Web 12(5):767-787. DOI: 10.3233/SW-210437. Online publication date: 1 January 2021.
  • (2020) Survey on complex ontology matching. Semantic Web 11(4):689-727. DOI: 10.3233/SW-190366. Online publication date: 1 January 2020.
  • (2020) An Ontology Based Approach for User Preference Statistics. Frontier Computing, pages 352-361. DOI: 10.1007/978-981-15-3250-4_42. Online publication date: 26 February 2020.


Published In

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN:9781450311564
DOI:10.1145/2396761
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 October 2012


Author Tags

  1. mapping discovery
  2. ontologies
  3. semantic mapping
  4. web forms

Qualifiers

  • Research-article

Conference

CIKM'12

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 1
Reflects downloads up to 13 Dec 2024
