Data Quality Issues in Data Integration Systems

  • Chapter
Data and Information Quality

Part of the book series: Data-Centric Systems and Applications ((DCSA))

Abstract

In distributed environments, data sources typically exhibit heterogeneities of three general kinds: (1) technological heterogeneities, (2) schema heterogeneities, and (3) instance-level heterogeneities. Technological heterogeneities arise from the use of products from different vendors at various layers of an information and communication infrastructure. An example is the use of two different relational database management systems, such as IBM's DB2 and Microsoft's SQL Server. Schema heterogeneities are principally caused by the use of (1) different data models, as when one source adopts the relational data model and another adopts the XML data model, and (2) different data representations, as when one source stores an address as a single field and another stores it in separate fields for street, civic number, and city. Instance-level heterogeneities are caused by different, conflicting data values that distinct sources provide for the same objects. This type of heterogeneity can stem from data quality errors affecting accuracy, completeness, currency, or consistency; such errors may result, for instance, from independent processes that feed the different data sources.
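The difference between a schema heterogeneity and an instance-level heterogeneity can be illustrated with a minimal Python sketch. It is not taken from the chapter: the records, the `Address` type, the helper names `parse_single_field` and `find_conflicts`, and the assumed address layout "street civic-number, city" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    civic_number: str
    city: str


def parse_single_field(raw: str) -> Address:
    """Split 'Street Name 12, City' into parts (assumed layout, not general)."""
    street_part, city = (piece.strip() for piece in raw.rsplit(",", 1))
    street, civic_number = street_part.rsplit(" ", 1)
    return Address(street, civic_number, city)


def find_conflicts(a: dict, b: dict) -> dict:
    """Return the shared attributes on which two records disagree."""
    return {
        key: (a[key], b[key])
        for key in a.keys() & b.keys()
        if a[key] != b[key]
    }


# Hypothetical data: source A stores the address as one field,
# source B stores it in separate fields (a schema heterogeneity).
source_a = {"id": 1, "name": "Ann Lee", "address": "Via Roma 12, Milano"}
source_b = {"id": 1, "name": "Ann Lee", "street": "Via Roma",
            "civic_number": "14", "city": "Milano"}

# Map both sources to one common representation.
record_a = {
    "name": source_a["name"],
    "address": parse_single_field(source_a["address"]),
}
record_b = {
    "name": source_b["name"],
    "address": Address(source_b["street"], source_b["civic_number"],
                       source_b["city"]),
}

# Once the schemas agree, the remaining disagreement is instance-level:
# the two sources report different civic numbers for the same object.
conflicts = find_conflicts(record_a, record_b)
print(conflicts)
```

Mapping both sources to a shared representation first keeps the two problems separate: the comparison then reports only value conflicts, which is where accuracy, completeness, currency, and consistency errors would surface.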

Notes

  1. The source schema in [397] is a collective name that indicates the set of source schemas, as introduced in Sect. 10.2.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Batini, C., Scannapieco, M. (2016). Data Quality Issues in Data Integration Systems. In: Data and Information Quality. Data-Centric Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-24106-7_10

  • DOI: https://doi.org/10.1007/978-3-319-24106-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24104-3

  • Online ISBN: 978-3-319-24106-7

  • eBook Packages: Computer Science, Computer Science (R0)
