Abstract
Bias evaluation methods focus either on individual bias or on group bias, where groups are defined by protected attributes such as gender or ethnicity. More generally, however, descriptively relevant combinations of feature values in the data space (profiles) may also serve as anchors for biased decisions. This paper therefore introduces a semi-hierarchical clustering method for extracting profiles from mixed datasets. It elaborates on how profiles can be used to reveal historical, representational, aggregation and evaluation biases in algorithmic decision-making models, taking the German credit dataset as an example. Our experiments show that the proposed profile-based evaluation method for bias assessment on mixed datasets (PEBAM) can reveal forms of bias towards profiles expressed by the dataset that go undetected when using individual- or group-bias metrics alone.
Notes
- 1. Available at: https://www.kaggle.com/uciml/german-credit.
- 2. Experimental setup: Intel Core i7-10510U, 16 GB RAM, Windows 10 (64-bit).
- 3.
- 4. Note, however, that the same approach applies with any choice of profile selection method or of ML method used to train the classifier.
Acknowledgments
Giovanni Sileno was partly funded by the Dutch Research Council (NWO) for the HUMAINER AI project (KIVI.2019.006).
A Profiles on the German Credit Dataset
The following table reports the profiles selected on the German credit dataset by applying the semi-hierarchical clustering proposed in the paper, as described by their medoids:
Profile | Age | Sex | Job | Housing | Saving accounts | Checking account | Credit amount | Duration | Purpose | Sample |
---|---|---|---|---|---|---|---|---|---|---|
0 | 26 | Male | 2 | Rent | Moderate | Unknown | 3577 | 9 | Car | 859 |
1 | 37 | Male | 2 | Own | Unknown | Unknown | 7409 | 36 | Business | 868 |
2 | 39 | Male | 3 | Own | Little | Unknown | 6458 | 18 | Car | 106 |
3 | 26 | Male | 2 | Own | Little | Little | 4370 | 42 | Radio/TV | 639 |
4 | 31 | Male | 2 | Own | Quite rich | Unknown | 3430 | 24 | Radio/TV | 19 |
5 | 38 | Female | 2 | Own | Unknown | Unknown | 1240 | 12 | Radio/TV | 135 |
6 | 43 | Male | 1 | Own | Little | Little | 1344 | 12 | Car | 929 |
7 | 36 | Male | 2 | Rent | Little | Little | 2799 | 9 | Car | 586 |
8 | 39 | Male | 2 | Own | Little | Little | 2522 | 30 | Radio/TV | 239 |
9 | 31 | Male | 2 | Own | Little | Moderate | 1935 | 24 | Business | 169 |
10 | 33 | Female | 2 | Own | Little | Little | 1131 | 18 | Furniture/equipment | 166 |
11 | 26 | Male | 1 | Own | Little | Moderate | 625 | 12 | Radio/TV | 220 |
12 | 23 | Male | 2 | Own | Unknown | Moderate | 1444 | 15 | Radio/TV | 632 |
13 | 42 | Male | 2 | Own | Little | Little | 4153 | 18 | Furniture/equipment | 899 |
14 | 29 | Male | 2 | Own | Unknown | Unknown | 3556 | 15 | Car | 962 |
15 | 37 | Female | 2 | Own | Little | Moderate | 3612 | 18 | Furniture/equipment | 537 |
16 | 27 | Female | 2 | Own | Little | Little | 2389 | 18 | Radio/TV | 866 |
17 | 26 | Female | 2 | Rent | Little | Unknown | 1388 | 9 | Furniture/equipment | 582 |
18 | 29 | Male | 2 | Own | Little | Unknown | 2743 | 28 | Radio/TV | 426 |
19 | 53 | Male | 2 | Free | Little | Little | 4870 | 24 | Car | 4 |
20 | 36 | Male | 2 | Own | Little | Little | 1721 | 15 | Car | 461 |
21 | 38 | Male | 2 | Own | Little | Unknown | 804 | 12 | Radio/TV | 997 |
22 | 29 | Male | 2 | Own | Little | Moderate | 1103 | 12 | Radio/TV | 696 |
23 | 43 | Male | 2 | Own | Unknown | Unknown | 2197 | 24 | Car | 406 |
24 | 27 | Male | 2 | Own | Little | Little | 3552 | 24 | Furniture/equipment | 558 |
25 | 30 | Male | 2 | Own | Little | Moderate | 1056 | 18 | Car | 580 |
26 | 24 | Female | 2 | Own | Little | Moderate | 2150 | 30 | Car | 252 |
27 | 34 | Male | 2 | Own | Little | Unknown | 2759 | 12 | Furniture/equipment | 452 |
28 | 24 | Female | 2 | Rent | Little | Little | 2124 | 18 | Furniture/equipment | 761 |
29 | 34 | Male | 2 | Own | Little | Moderate | 5800 | 36 | Car | 893 |
30 | 34 | Female | 2 | Own | Little | Unknown | 1493 | 12 | Radio/TV | 638 |
31 | 30 | Female | 2 | Own | Little | Unknown | 1055 | 18 | Car | 161 |
32 | 35 | Male | 2 | Own | Little | Unknown | 2346 | 24 | Car | 654 |
33 | 35 | Male | 2 | Own | Unknown | Unknown | 1979 | 15 | Radio/TV | 625 |
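
Since each profile is summarised by a medoid, a new record can be matched to a profile by finding the closest medoid. The sketch below illustrates this with a Gower-style dissimilarity for mixed data. It is an illustration under stated assumptions, not the paper's implementation: the choice of distance, the split into numeric and categorical columns, the use of only three medoids, the ranges computed from those three rows, and the helper names (`numeric_ranges`, `gower_distance`, `nearest_profile`) are all assumptions made here for the example.

```python
# Three medoids copied from the table above (profiles 0, 1 and 5).
# Columns: Age, Sex, Job, Housing, Saving accounts, Checking account,
# Credit amount, Duration, Purpose.
MEDOIDS = {
    0: (26, "Male", 2, "Rent", "Moderate", "Unknown", 3577, 9, "Car"),
    1: (37, "Male", 2, "Own", "Unknown", "Unknown", 7409, 36, "Business"),
    5: (38, "Female", 2, "Own", "Unknown", "Unknown", 1240, 12, "Radio/TV"),
}

# Assumption: Age, Credit amount and Duration are numeric; every other
# column (including the Job code) is treated as categorical.
NUMERIC = (0, 6, 7)


def numeric_ranges(rows, numeric=NUMERIC):
    """Return max - min of each numeric column over the given rows."""
    return {j: max(r[j] for r in rows) - min(r[j] for r in rows) for j in numeric}


def gower_distance(a, b, ranges):
    """Average per-feature dissimilarity between two mixed-type records."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in ranges:
            # Numeric feature: absolute difference scaled by the column range.
            total += abs(x - y) / ranges[j] if ranges[j] else 0.0
        else:
            # Categorical feature: simple mismatch (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
    return total / len(a)


def nearest_profile(sample, medoids, ranges):
    """Return the id of the medoid closest to the sample."""
    return min(medoids, key=lambda p: gower_distance(sample, medoids[p], ranges))


# Ranges are taken from the three medoids only to keep the example
# self-contained; in practice they would come from the full dataset.
ranges = numeric_ranges(list(MEDOIDS.values()))

# Hypothetical applicant, not a record from the dataset.
applicant = (27, "Male", 2, "Rent", "Moderate", "Unknown", 3000, 10, "Car")
print(nearest_profile(applicant, MEDOIDS, ranges))
```

With ranges computed from the whole dataset, each numeric contribution stays within [0, 1]; with the reduced ranges used here, a value outside the three medoids could contribute more than 1, which is acceptable for a sketch but not for a real assignment step.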
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wilms, M., Sileno, G., Haned, H. (2022). PEBAM: A Profile-Based Evaluation Method for Bias Assessment on Mixed Datasets. In: Bergmann, R., Malburg, L., Rodermund, S.C., Timm, I.J. (eds) KI 2022: Advances in Artificial Intelligence. KI 2022. Lecture Notes in Computer Science, vol 13404. Springer, Cham. https://doi.org/10.1007/978-3-031-15791-2_17
DOI: https://doi.org/10.1007/978-3-031-15791-2_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15790-5
Online ISBN: 978-3-031-15791-2
eBook Packages: Computer Science, Computer Science (R0)