DOI: 10.5555/1948294.1948316
Article

Using semantic web resources for data quality management

Published: 11 October 2010

Abstract

The quality of data is a critical factor for all kinds of decision-making and transaction processing. While there has been considerable research on data quality in the past two decades, the topic has not yet received sufficient attention from the Semantic Web community. In this paper, we discuss (1) the data quality issues related to the growing amount of data available on the Semantic Web, (2) how data quality problems can be handled within the Semantic Web technology framework, namely using SPARQL on RDF representations, and (3) how Semantic Web reference data, e.g., from DBpedia, can be used to spot incorrect literal values and functional dependency violations. We show how this approach can be used for data quality management of public Semantic Web data and of data stored in relational databases in closed settings alike. As part of our work, we developed generic SPARQL queries to identify (1) missing datatype properties or literal values, (2) illegal values, and (3) functional dependency violations. We argue that using Semantic Web datasets substantially reduces the effort required for data quality management. As a use case, we employ GeoNames, a publicly available Semantic Web resource for geographical data, as a trusted reference for managing the quality of other data sources.
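
The three checks named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the paper expresses the checks as SPARQL queries over RDF, whereas the sketch below uses plain Python over (subject, property, value) tuples to show the logic only. The `ex:` identifiers, the sample records, and the small set of reference city names are all hypothetical stand-ins for real data and for a trusted source such as GeoNames.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, value) tuples standing in for RDF triples.
DATA = [
    ("ex:addr1", "ex:city", "Munich"),
    ("ex:addr1", "ex:country", "DE"),
    ("ex:addr2", "ex:city", "Munich"),
    ("ex:addr2", "ex:country", "AT"),      # conflicts with addr1 for the same city
    ("ex:addr3", "ex:city", "Atlantis"),   # not present in the reference data
    ("ex:addr3", "ex:country", "GR"),
    ("ex:addr4", "ex:country", "FR"),      # no city property at all
    ("ex:addr5", "ex:city", ""),           # empty literal value
    ("ex:addr5", "ex:country", "FR"),
]

# Hypothetical trusted reference values (assumed, not taken from GeoNames).
REFERENCE_CITIES = {"Munich", "Paris", "Vienna"}


def _values(triples, prop):
    """Map each subject to its non-empty values for `prop`."""
    out = defaultdict(list)
    for s, p, v in triples:
        if p == prop and v.strip():
            out[s].append(v)
    return out


def missing_values(triples, prop):
    """Subjects that lack `prop` or carry only empty literals for it."""
    subjects = {s for s, _, _ in triples}
    present = _values(triples, prop)
    return sorted(s for s in subjects if s not in present)


def illegal_values(triples, prop, legal):
    """(subject, value) pairs whose value is absent from the reference set."""
    found = _values(triples, prop)
    return sorted((s, v) for s, vs in found.items() for v in vs if v not in legal)


def fd_violations(triples, determinant, dependent):
    """Determinant values that map to more than one dependent value."""
    det = _values(triples, determinant)
    dep = _values(triples, dependent)
    seen = defaultdict(set)
    for s, det_vals in det.items():
        for d in det_vals:
            seen[d].update(dep.get(s, []))
    return {d: sorted(vs) for d, vs in seen.items() if len(vs) > 1}


if __name__ == "__main__":
    print(missing_values(DATA, "ex:city"))
    print(illegal_values(DATA, "ex:city", REFERENCE_CITIES))
    print(fd_violations(DATA, "ex:city", "ex:country"))
```

In this sketch the functional dependency is assumed to be city determines country; in a real setting the dependency, like the legal values, would be drawn from the trusted reference dataset rather than hard-coded.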

Published In

EKAW'10: Proceedings of the 17th international conference on Knowledge engineering and management by the masses
October 2010
587 pages
ISBN: 3642164374
Editors: Philipp Cimiano, H. Sofia Pinto

Sponsors

  • Fundação Calouste Gulbenkian
  • Talis
  • CITEC: The Cognitive Interaction Technology Excellence Cluster
  • ISOCO
  • IOS Press

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. SPARQL
  2. data quality management
  3. geonames
  4. linked data
  5. metadata management
  6. ontologies
  7. ontology-based data quality management
  8. semantic web
  9. trust

Cited By

  • (2018) An adaptive neuro-fuzzy inference system for improving data quality in disease registries. Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 30-33. DOI: 10.1145/3167132.3167376. Online publication date: 9-Apr-2018
  • (2017) Towards a Semantic Outlier Detection Framework in Wireless Sensor Networks. Proceedings of the 13th International Conference on Semantic Systems, pp. 152-159. DOI: 10.1145/3132218.3132226. Online publication date: 11-Sep-2017
  • (2017) Why good data analysts need to be critical synthesists. Determining the role of semantics in data analysis. Future Generation Computer Systems 72:C, pp. 11-22. DOI: 10.1016/j.future.2017.02.046. Online publication date: 1-Jul-2017
  • (2016) Semantic Web in data mining and knowledge discovery. Web Semantics: Science, Services and Agents on the World Wide Web 36:C, pp. 1-22. DOI: 10.1016/j.websem.2016.01.001. Online publication date: 1-Jan-2016
  • (2015) Quality Metrics for Linked Open Data. Proceedings, Part I, of the 26th International Conference on Database and Expert Systems Applications - Volume 9261, pp. 144-152. DOI: 10.1007/978-3-319-22849-5_11. Online publication date: 1-Sep-2015
  • (2014) Test-driven evaluation of linked data quality. Proceedings of the 23rd international conference on World wide web, pp. 747-758. DOI: 10.1145/2566486.2568002. Online publication date: 7-Apr-2014
  • (2012) Recommendations using linked data. Proceedings of the 5th Ph.D. workshop on Information and knowledge, pp. 75-82. DOI: 10.1145/2389686.2389701. Online publication date: 2-Nov-2012
  • (2012) Quality assessment, provenance, and the web of linked sensor data. Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes, pp. 220-222. DOI: 10.1007/978-3-642-34222-6_19. Online publication date: 19-Jun-2012
  • (2011) Learning to detect abnormal semantic web data. Proceedings of the sixth international conference on Knowledge capture, pp. 177-178. DOI: 10.1145/1999676.1999713. Online publication date: 26-Jun-2011
  • (2011) Towards a vocabulary for data quality management in semantic web architectures. Proceedings of the 1st International Workshop on Linked Web Data Management, pp. 1-8. DOI: 10.1145/1966901.1966903. Online publication date: 25-Mar-2011
