Abstract
This paper introduces the first formal framework, based on logic and probability theory, for learning mappings between heterogeneous schemas. This task, also called “schema matching”, is a crucial step in integrating heterogeneous collections. Because schemas may differ in granularity and schema attributes do not always match precisely, a general-purpose schema mapping approach must support uncertain mappings, and these mappings must be learned automatically. The framework combines different classifiers to find suitable mapping candidates (together with their weights) and then selects the most likely set of mapping rules. Finally, different variants of the framework have been evaluated on two data sets.
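The two steps named in the abstract, combining classifier outputs into weighted mapping candidates and then choosing the most likely set of rules, can be illustrated with a small sketch. This is not the sPLMap algorithm itself: the attribute names, the scores, the averaging rule in `combine`, the independence assumption in `set_likelihood`, and the one-rule-per-target constraint in `is_consistent` are all hypothetical choices made only for illustration.

```python
from itertools import combinations
from math import prod

# Hypothetical classifier outputs: (source_attr, target_attr) -> score in [0, 1].
name_scores = {("author", "creator"): 0.9, ("title", "name"): 0.6, ("author", "name"): 0.3}
value_scores = {("author", "creator"): 0.7, ("title", "name"): 0.8, ("author", "name"): 0.2}


def combine(*classifiers):
    """Average the classifier scores per candidate (an assumed combination rule)."""
    candidates = set().union(*classifiers)
    return {
        c: sum(clf.get(c, 0.0) for clf in classifiers) / len(classifiers)
        for c in candidates
    }


def set_likelihood(selected, weights):
    """Assume independent candidates: selected ones contribute w, rejected ones 1 - w."""
    return prod(w if c in selected else 1.0 - w for c, w in weights.items())


def is_consistent(selected):
    """Assumed constraint: each target attribute is produced by at most one rule."""
    targets = [target for _, target in selected]
    return len(targets) == len(set(targets))


def most_likely_mapping(weights):
    """Exhaustively search all consistent rule sets and return the best one with its score."""
    candidates = sorted(weights)
    best, best_score = frozenset(), -1.0
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            if not is_consistent(subset):
                continue
            score = set_likelihood(set(subset), weights)
            if score > best_score:
                best, best_score = frozenset(subset), score
    return best, best_score


weights = combine(name_scores, value_scores)
best, score = most_likely_mapping(weights)
for source, target in sorted(best):
    print(f"{source} -> {target} (weight {weights[(source, target)]:.2f})")
```

Exhaustive enumeration is exponential in the number of candidates, so it is only workable for toy inputs like this one; a realistic system would need a more scalable search strategy.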
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nottelmann, H., Straccia, U. (2005). sPLMap: A Probabilistic Approach to Schema Matching. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_7
DOI: https://doi.org/10.1007/978-3-540-31865-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science, Computer Science (R0)