Data Quality and Explainable AI

Authors:

Leopoldo Bertossi,

Floris Geerts

Journal of Data and Information Quality (JDIQ), Volume 12, Issue 2

Article No.: 11, Pages 1 - 9

https://doi.org/10.1145/3386687

Published: 03 May 2020

Abstract

In this work, we offer some insights and develop some ideas, with few technical details, about the role of explanations in data quality in the context of data-based machine learning (ML) models. In this direction there are, as expected, roles for causality and for explainable artificial intelligence. The latter area sheds light not only on the models, but also on the data that support model construction. There is also room for defining, identifying, and explaining errors in data, in particular in ML, and for suggesting repair actions. More generally, explanations can be used as a basis for defining dirty data in the context of ML, and for measuring or quantifying them. We think of dirtiness as relative to the ML task at hand, e.g., classification.

Index Terms

Data Quality and Explainable AI
1. Information systems

Information & Contributors

Information

Published In

Journal of Data and Information Quality Volume 12, Issue 2

Special Issue on Quality Assessment of Knowledge Graphs and On the Horizon

June 2020

105 pages

ISSN:1936-1955

EISSN:1936-1963

DOI:10.1145/3397186

Editor:
Tiziana Catarci
Sapienza University of Rome, Rome, Italy

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 May 2020

Accepted: 01 March 2020

Received: 01 March 2020

Published in JDIQ Volume 12, Issue 2

Qualifiers

Research-article
Research
Refereed
