Abstract
In distributed environments, data sources are typically characterized by heterogeneities of three general kinds: (1) technological heterogeneities, (2) schema heterogeneities, and (3) instance-level heterogeneities. Technological heterogeneities arise when products from different vendors are employed at various layers of an information and communication infrastructure. An example is the use of two different relational database management systems, such as IBM’s DB2 and Microsoft’s SQL Server. Schema heterogeneities are principally caused by the use of (1) different data models, as when one source adopts the relational data model and another adopts the XML data model, and (2) different data representations, as when one source stores an address as a single field and another stores it in separate fields for street, civic number, and city. Instance-level heterogeneities are caused by different, conflicting data values provided by distinct sources for the same objects. This type of heterogeneity can be caused by quality errors, such as accuracy, completeness, currency, and consistency errors; such errors may result, for instance, from independent processes that feed the different data sources.
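The address example and the notion of conflicting values can be made concrete with a minimal sketch. It assumes two hypothetical sources: one stores the address as a single field, the other splits it into street, civic number, and city. All class names, function names, and sample records are illustrative assumptions and are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass
class SourceARecord:
    # Hypothetical source A: the address is one single field.
    person_id: int
    address: str


@dataclass
class SourceBRecord:
    # Hypothetical source B: the address is split into separate fields.
    person_id: int
    street: str
    civic_number: str
    city: str


def to_common(record_b: SourceBRecord) -> SourceARecord:
    """Resolve the schema heterogeneity by mapping B's fields onto A's single field."""
    address = f"{record_b.street} {record_b.civic_number}, {record_b.city}"
    return SourceARecord(record_b.person_id, address)


def find_conflicts(records_a, records_b):
    """Return the ids of objects whose address values differ between the two sources."""
    address_by_id = {r.person_id: r.address for r in records_a}
    conflicts = []
    for record_b in records_b:
        mapped = to_common(record_b)
        known = address_by_id.get(mapped.person_id)
        if known is not None and known != mapped.address:
            conflicts.append(mapped.person_id)
    return conflicts


# Illustrative data: object 2 has a different civic number in each source.
source_a = [
    SourceARecord(1, "Via Roma 10, Milano"),
    SourceARecord(2, "Via Verdi 5, Roma"),
]
source_b = [
    SourceBRecord(1, "Via Roma", "10", "Milano"),
    SourceBRecord(2, "Via Verdi", "7", "Roma"),
]

conflicting_ids = find_conflicts(source_a, source_b)
```

In this sketch the schema heterogeneity is removed by a simple mapping, while the instance-level conflict on object 2 is only detected. Deciding which value is correct would require additional information, such as quality metadata about the sources.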
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Batini, C., Scannapieco, M. (2016). Data Quality Issues in Data Integration Systems. In: Data and Information Quality. Data-Centric Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-24106-7_10
DOI: https://doi.org/10.1007/978-3-319-24106-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24104-3
Online ISBN: 978-3-319-24106-7
eBook Packages: Computer Science (R0)