A Formal Taxonomy to Improve Data Defect Description

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2016)

Abstract

Data quality assessment outcomes are essential for analytical processes, especially in big data environments. The efficiency and efficacy of such assessment depend on automated solutions, which are in turn shaped by an understanding of the problem associated with each data defect. Despite the considerable number of works that describe data defects regarding accuracy, completeness and consistency, there is significant heterogeneity in terminology, nomenclature, description depth and number of examined defects. To fill this gap, this work reports a taxonomy that organizes data defects according to a three-step methodology. The proposed taxonomy enhances the description and coverage of defects relative to related work, and also supports certain requirements of data quality assessment, including the design of semi-supervised solutions for data defect detection.

Acknowledgments

This work has been supported by CNPq (Brazilian National Research Council) grant number 141647/2011-6 and FAPESP (Sao Paulo State Research Foundation) grant number 2015/01587-0.

Author information

Corresponding author

Correspondence to João Marcelo Borovina Josko.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Josko, J.M.B., Oikawa, M.K., Ferreira, J.E. (2016). A Formal Taxonomy to Improve Data Defect Description. In: Gao, H., Kim, J., Sakurai, Y. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9645. Springer, Cham. https://doi.org/10.1007/978-3-319-32055-7_25

  • DOI: https://doi.org/10.1007/978-3-319-32055-7_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32054-0

  • Online ISBN: 978-3-319-32055-7

  • eBook Packages: Computer Science, Computer Science (R0)
