Abstract
The real-world impacts of social biases in artificial intelligence technologies have come increasingly to the fore in the last several years. A basic understanding of how biases are represented in data, and how that understanding translates into practice, is seen as a key step toward mitigating harms in AI products and services. This paper examines core issues around the mental models of users and developers working with AI models, fairness metrics, and interpretability in AI. Assuming that users of tools such as IBM’s AI Fairness 360 and Google’s What-if Tool work within computational notebook environments, such as those developed by Project Jupyter or Google Colab, this paper looks at the use of notebooks for visualization, collaboration, and narrative. In examining the design implications for these tools and environments, new directions are proposed for the development of more critical interactive tools that empower data science and AI teams to build more equitable AI models in the future.
References
Bansal, G., Nushi, B., Kamar, E., Lasecki, W.S., Weld, D.S., Horvitz, E.: Beyond accuracy: the role of mental models in human-AI team performance. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 7, no. 1, pp. 2–11 (2019)
Bellamy, R., et al.: AI Fairness 360: an extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv preprint arXiv:1810.01943 (2018)
Blackwell, A.F., et al.: Cognitive dimensions of notations: design tools for cognitive technology. In: Beynon, M., Nehaniv, C.L., Dautenhahn, K. (eds.) CT 2001. LNCS (LNAI), vol. 2117, pp. 325–341. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44617-6_31
Blackwell, A.F., Green, T.R.G.: Notational systems – the cognitive dimensions of notations framework. In: Carroll, J.M. (ed.) HCI Models, Theories, and Frameworks: Toward a Multidisciplinary Science, pp. 103–134. Morgan Kaufmann, San Francisco (2003)
Bos, N., Glasgow, K., Gersh, J., Harbison, I., Paul, C.L.: Mental models of AI-based systems: user predictions and explanations of image classification results. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, pp. 183–188 (2019). http://dx.doi.org/10.1177/81319631392
Chattopadhyay, S., Prasad, I., Henley, A.Z., Sarma, A., Barik, T.: What’s wrong with computational notebooks? Pain points, needs, and design opportunities. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI 20), 12 p. Association for Computing Machinery (2020). http://dx.doi.org/10.1145/3313831.3376729
Crenshaw, K.: Demarginalizing the intersection of race and sex: a black feminist critique of antidiscrimination doctrine, feminist theory, and antiracist politics. Univ. Chicago Legal Forum 1989(1), 139–167 (1989)
Dourish, P.: Algorithms and their others: algorithmic culture in context. Big Data Soc. (2016). http://dx.doi.org/10.1177/2053951716665128
Hohman, F., Head, A., Caruana, R., DeLine, R., Drucker, S.M.: Gamut: a design probe to understand how data scientists understand machine learning models. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI 19), Paper 579, 13 p. Association for Computing Machinery (2019). http://dx.doi.org/10.1145/3290605.3300809
Hong, S.R., Hullman, J., Bertini, E.: Human factors in model interpretability: industry practices, challenges, and needs. In: Proceedings of the ACM on Human-Computer Interaction 4, CSCW1, Article 68, 26 p. (2020). http://dx.doi.org/10.1145/3392878
Kahneman, D.: Thinking, Fast and Slow. Farrar, Straus and Giroux, New York (2011)
Kaur, H., Nori, H., Jenkins, S., Caruana, R., Wallach, H., Wortman Vaughan, J.: Interpreting interpretability: understanding data scientists’ use of interpretability tools for machine learning. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI 20), 14 p. Association for Computing Machinery (2020). http://dx.doi.org/10.1145/3313831.3376219
Kery, M.B., Myers, B.A.: Exploring exploratory programming. In: 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 25–29 (2017). http://dx.doi.org/10.1109/VLHCC.2017.0103446
Kocielnik, R., Amershi, S., Bennett, P.N.: Will you accept an imperfect AI? Exploring designs for adjusting end-user expectations of AI systems. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI 19). Association for Computing Machinery, Paper 411, 14 p. (2019). http://dx.doi.org/10.1145/3290605.3300641
Kulesza, T., Stumpf, S., Burnett, M., Kwan, I.: Tell me more? The effects of mental model soundness on personalizing an intelligent agent. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2012), 10 p. Association for Computing Machinery (2012). http://dx.doi.org/10.1145/2207676.2207678
Lau, S., Drosos, I., Markel, J.M., Guo, P.J.: The design space of computational notebooks: an analysis of 60 systems in academia and industry. In: 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 1–11 (2020). http://dx.doi.org/10.1109/VLHCC.2020.9127201
Lemaignan, S., Fink, J., Dillenbourg, P., Braboszcz, C.: The cognitive correlates of anthropomorphism. In: Proceedings of the 2014 Human-Robot Interaction Conference, Workshop on Neurosciences and Robotics (2014)
Nielsen, J.: Usability Engineering. Academic Press, San Diego (1993)
Raji, I.D., et al.: These are the four most popular misconceptions people have about race & gender bias in algorithms…, 27 March 2021. https://twitter.com/rajiinio/status/1375957284061376516
Rakova, B., Chowdhury, R., Yang, J.: Assessing the intersection of organizational structure and FAT* efforts within industry: implications tutorial. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (2020)
Rule, A., Tabard, A., Hollan, J.: Exploration and explanation in computational notebooks. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI 2018), Article 32, 12 p. Association for Computing Machinery (2018). http://dx.doi.org/10.1145/3173574.3173606
Schiff, D., Rakova, B., Ayesh, A., Fanti, A., Lennon, M.: Principles to practices for responsible AI: closing the gap. Presented at 2020 European Conference on AI (ECAI) Workshop on “Advancing Towards the SDGs: AI For a Fair, Just, and Equitable World (AI 4EQ)” (2020). https://arxiv.org/abs/2005.04707
Yin, M., Wortman Vaughan, J., Wallach, H.: Understanding the effect of accuracy on trust in machine learning models. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI 19), Paper 279, pp. 1–12. Association for Computing Machinery, New York (2019). http://dx.doi.org/10.1145/3290605.3300509
Yip, J.C., et al.: Laughter is scary, but farting is cute: a conceptual model of children’s perspectives of creepy technologies. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI 2019), Paper 73, pp. 1–15. Association for Computing Machinery, New York (2019). http://dx.doi.org/10.1145/3290605.3300303
Wang, A.Y., Mittal, A., Brooks, C., Oney, S.: How data scientists use computational notebooks for real-life collaboration. In: Proceedings of the ACM on Human-Computer Interaction 3, CSCW, Article 39, 30 p. (2019). http://dx.doi.org/10.1145/3359141
Wexler, J., Pushkarna, M., Bolukbasi, T., Wattenberg, M., Viégas, F., Wilson, J.: The what-if tool: interactive probing of machine learning models. IEEE Trans. Vis. Comput. Graph. 26(1), 56–65 (2020). https://doi.org/10.1109/TVCG.2019.2934619
Wood, J., Kachkaev, A., Dykes, J.: Design exposition with literate visualization. IEEE Trans. Vis. Comput. Graph. 25(1), 759–768 (2019). http://dx.doi.org/10.1109/TVCG.2018.285436
Related Websites
Project Jupyter. https://jupyter.org/
Google Colab. https://colab.research.google.com/
AI meets Design toolkit. http://aimeets.design/
Poieto. https://aidesigntool.com/
Apache UIMA. https://uima.apache.org/
Appendix
AI Fairness 360:
Originally released by IBM researchers in 2018, this tool consists of libraries for the Python and R programming languages that provide metrics for examining bias in datasets, along with mitigation methods for the pre-processing, in-processing, and post-processing stages of the machine learning pipeline. The Python library is designed to fit into a “standard” machine learning workflow built around the widely used scikit-learn library. It does not include visualization tools.
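As a hedged illustration of how such dataset metrics might be computed in a notebook, the following minimal sketch uses the AIF360 Python library on a toy dataset; the column names, values, and group definitions are invented for illustration and are not taken from the paper.

```python
# Minimal sketch (illustrative only): computing group fairness metrics with
# AI Fairness 360 on a tiny invented dataset.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy data: 'sex' is the protected attribute (1 = privileged group),
# 'label' is the favorable/unfavorable outcome.
df = pd.DataFrame({
    "sex":    [1, 1, 1, 0, 0, 0],
    "income": [50, 60, 55, 40, 42, 38],
    "label":  [1, 1, 0, 0, 1, 0],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"sex": 1}],
    unprivileged_groups=[{"sex": 0}],
)

# Disparate impact is the ratio of favorable-outcome rates (1.0 means parity);
# statistical parity difference is the corresponding difference (0.0 means parity).
print("Disparate impact:", metric.disparate_impact())
print("Statistical parity difference:", metric.statistical_parity_difference())
```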
What-if Tool:
Released by Google’s PAIR (People + AI Research) group in 2019, the What-if Tool (WIT) is also designed for use from the Python programming language and can run directly in TensorBoard (the visualization tool for Google’s TensorFlow deep learning framework) as well as in Google’s own Colab notebooks. Like AIF360, WIT provides multiple metrics, but it also includes a configuration tool that reflects its designers’ mental models in addressing issues such as intersectional bias, and it allows multiple interactive visualizations of a dataset or model.
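The sketch below shows, under stated assumptions, how WIT might be launched inside a notebook cell using the witwidget package; the make_example helper, the placeholder predict function, and the feature names are hypothetical and only illustrate the configuration step described above.

```python
# Minimal sketch (illustrative only) of launching the What-if Tool in a notebook.
# Assumes the witwidget package and TensorFlow are installed.
import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

def make_example(features):
    """Pack a dict of numeric features into a tf.train.Example proto (hypothetical helper)."""
    ex = tf.train.Example()
    for name, value in features.items():
        ex.features.feature[name].float_list.value.append(float(value))
    return ex

# Invented examples standing in for a real evaluation set.
examples = [
    make_example({"age": 34, "hours_per_week": 40, "label": 1}),
    make_example({"age": 52, "hours_per_week": 20, "label": 0}),
]

def predict_fn(examples_to_score):
    # Placeholder scoring function; a real model's class probabilities go here.
    return [[0.3, 0.7] for _ in examples_to_score]

config = (WitConfigBuilder(examples)
          .set_custom_predict_fn(predict_fn))
WitWidget(config, height=600)  # renders the interactive tool inline in the notebook cell
```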
Jupyter notebooks:
A computational notebook that grew out of IPython notebooks and became an independent project in 2015. Each “cell” in a notebook can contain either Markdown (a lightweight way to add styles and visual hierarchy to a text document) or code; code cells can be run with their output displayed as in a terminal or interpreter environment, and can also render visualizations inline. Notebooks are typically hosted locally on a user’s machine.
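For illustration, a typical code cell in such a notebook might look like the following minimal sketch (the toy data and plot are invented, not from the paper); a Markdown cell above it would carry the narrative text.

```python
# Illustrative Jupyter code cell: computation and an inline visualization together.
# Render matplotlib figures inside the notebook rather than in a separate window.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

# Toy per-subgroup scores, standing in for real model evaluation results.
scores = pd.Series([0.61, 0.72, 0.55, 0.80], index=["A", "B", "C", "D"])
scores.plot(kind="bar", title="Model accuracy by subgroup (toy data)")
plt.show()
```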
Google Colab:
A notebook environment similar to Jupyter notebooks, developed by Google. Unlike Jupyter notebooks, Colab notebooks can easily hide code for demonstration purposes and are hosted online using Google’s cloud infrastructure for running processes in the notebook.
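As a small illustration of the code-hiding feature mentioned above, a Colab cell can use the #@title and #@param form annotations so that only a titled form is shown while the code stays hidden; the cell below is a hypothetical sketch, not taken from the paper.

```python
#@title Set classification threshold { display-mode: "form" }
# Hypothetical Colab form cell: the "#@title" line renders the cell as a titled
# form and hides the code; "#@param" exposes the variable as an editable field.
threshold = 0.5  #@param {type:"number"}
print("Using classification threshold:", threshold)
```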
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Thompson, J. (2021). Mental Models and Interpretability in AI Fairness Tools and Code Environments. In: Stephanidis, C., et al. (eds.) HCI International 2021 - Late Breaking Papers: Multimodality, eXtended Reality, and Artificial Intelligence. HCII 2021. Lecture Notes in Computer Science, vol. 13095. Springer, Cham. https://doi.org/10.1007/978-3-030-90963-5_43
DOI: https://doi.org/10.1007/978-3-030-90963-5_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90962-8
Online ISBN: 978-3-030-90963-5
eBook Packages: Computer Science (R0)