PEBAM: A Profile-Based Evaluation Method for Bias Assessment on Mixed Datasets

Mieke Wilms, Giovanni Sileno & Hinda Haned

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13404)

Included in the following conference series:

German Conference on Artificial Intelligence (Künstliche Intelligenz)

Abstract

Bias evaluation methods focus either on individual bias or on group bias, where groups are defined based on protected attributes such as gender or ethnicity. More generally, however, descriptively relevant combinations of feature values in the data space (profiles) may also serve as anchors for biased decisions. This paper therefore introduces a semi-hierarchical clustering method for profile extraction from mixed datasets. It elaborates on how profiles can be used to reveal historical, representational, aggregation and evaluation biases in algorithmic decision-making models, taking the German credit dataset as an example. Our experiments show that the proposed profile-based evaluation method for bias assessment on mixed datasets (PEBAM) can reveal forms of bias towards profiles expressed by the dataset that remain undetected when using individual- or group-bias metrics alone.
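
To make the idea of profile-level evaluation concrete, the sketch below compares the rate of positive decisions within each profile against the overall rate. It is an illustration only: the abstract does not specify the metrics used in the paper, so the function names (`profile_positive_rates`, `rate_gaps`) and the toy inputs are hypothetical, and a simple rate gap stands in for whatever measures the authors actually apply.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def profile_positive_rates(
    profile_ids: Sequence[Hashable], predictions: Sequence[int]
) -> Dict[Hashable, float]:
    """Share of positive (1) decisions within each profile."""
    counts = defaultdict(lambda: [0, 0])  # profile -> [positives, total]
    for profile, decision in zip(profile_ids, predictions):
        counts[profile][0] += decision
        counts[profile][1] += 1
    return {p: pos / total for p, (pos, total) in counts.items()}


def rate_gaps(
    profile_ids: Sequence[Hashable], predictions: Sequence[int]
) -> Dict[Hashable, float]:
    """Per-profile positive rate minus the overall positive rate.

    A gap far from zero flags a profile that the model treats differently
    from the population as a whole (hypothetical indicator, not the
    paper's metric).
    """
    overall = sum(predictions) / len(predictions)
    rates = profile_positive_rates(profile_ids, predictions)
    return {p: rate - overall for p, rate in rates.items()}


# Toy, made-up inputs: profile membership and binary model decisions.
toy_profiles = [0, 0, 0, 1, 1, 1, 2, 2]
toy_decisions = [1, 1, 1, 0, 0, 1, 1, 0]
toy_gaps = rate_gaps(toy_profiles, toy_decisions)
```

A gap computed this way can differ strongly between profiles even when a single protected attribute, taken alone, shows little disparity, which is the kind of effect the abstract describes.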

Notes

1. Available at: https://www.kaggle.com/uciml/german-credit.
2. Experimental setup: Intel Core i7-10510u, 16 GB RAM, Windows-10 64-bit.
3. https://github.com/mcwilms/PEBAM.
4. Note however that the same approach will apply with any choice of profile selection method or of ML method used to train the classifier.

Acknowledgments

Giovanni Sileno was partly funded by the Dutch Research Council (NWO) for the HUMAINER AI project (KIVI.2019.006).

Author information

Authors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Mieke Wilms, Giovanni Sileno & Hinda Haned

Corresponding author

Correspondence to Giovanni Sileno.

Editor information

Editors and Affiliations

University of Trier, Trier, Rheinland-Pfalz, Germany
Ralph Bergmann
University of Trier, Trier, Germany
Lukas Malburg
University of Trier, Trier, Germany
Stephanie C. Rodermund
University of Trier, Trier, Germany
Ingo J. Timm

A Profiles on the German Credit Dataset

The following table reports the profiles selected on the German credit dataset by applying the semi-hierarchical clustering proposed in the paper, as described by their medoids:

Profile	Age	Sex	Job	Housing	Saving accounts	Checking account	Credit amount	Duration	Purpose	Sample
0	26	Male	2	Rent	Moderate	Unknown	3577	9	Car	859
1	37	Male	2	Own	Unknown	Unknown	7409	36	Business	868
2	39	Male	3	Own	Little	Unknown	6458	18	Car	106
3	26	Male	2	Own	Little	Little	4370	42	Radio/TV	639
4	31	Male	2	Own	Quite rich	Unknown	3430	24	Radio/TV	19
5	38	Female	2	Own	Unknown	Unknown	1240	12	Radio/TV	135
6	43	Male	1	Own	Little	Little	1344	12	Car	929
7	36	Male	2	Rent	Little	Little	2799	9	Car	586
8	39	Male	2	Own	Little	Little	2522	30	Radio/TV	239
9	31	Male	2	Own	Little	Moderate	1935	24	Business	169
10	33	Female	2	Own	Little	Little	1131	18	Furniture/equipment	166
11	26	Male	1	Own	Little	Moderate	625	12	Radio/TV	220
12	23	Male	2	Own	Unknown	Moderate	1444	15	Radio/TV	632
13	42	Male	2	Own	Little	Little	4153	18	Furniture/equipment	899
14	29	Male	2	Own	Unknown	Unknown	3556	15	Car	962
15	37	Female	2	Own	Little	Moderate	3612	18	Furniture/equipment	537
16	27	Female	2	Own	Little	Little	2389	18	Radio/TV	866
17	26	Female	2	Rent	Little	Unknown	1388	9	Furniture/equipment	582
18	29	Male	2	Own	Little	Unknown	2743	28	Radio/TV	426
19	53	Male	2	Free	Little	Little	4870	24	Car	4
20	36	Male	2	Own	Little	Little	1721	15	Car	461
21	38	Male	2	Own	Little	Unknown	804	12	Radio/TV	997
22	29	Male	2	Own	Little	Moderate	1103	12	Radio/TV	696
23	43	Male	2	Own	Unknown	Unknown	2197	24	Car	406
24	27	Male	2	Own	Little	Little	3552	24	Furniture/equipment	558
25	30	Male	2	Own	Little	Moderate	1056	18	Car	580
26	24	Female	2	Own	Little	Moderate	2150	30	Car	252
27	34	Male	2	Own	Little	Unknown	2759	12	Furniture/equipment	452
28	24	Female	2	Rent	Little	Little	2124	18	Furniture/equipment	761
29	34	Male	2	Own	Little	Moderate	5800	36	Car	893
30	34	Female	2	Own	Little	Unknown	1493	12	Radio/TV	638
31	30	Female	2	Own	Little	Unknown	1055	18	Car	161
32	35	Male	2	Own	Little	Unknown	2346	24	Car	654
33	35	Male	2	Own	Unknown	Unknown	1979	15	Radio/TV	625
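
As an illustration of how medoids such as those in the table could be used, the sketch below assigns a record to its nearest profile under a Gower-style dissimilarity (range-scaled absolute difference for numeric features, 0/1 mismatch for categorical ones). Several things here are assumptions rather than details taken from this page: the choice of a Gower-style distance, the treatment of `Job` as categorical, the feature ranges (computed from the three medoids shown instead of the full dataset), and the names `gower_distance`, `nearest_profile` and `applicant`. Only the three medoid rows (profiles 0, 5 and 19) are copied from the table.

```python
from typing import Dict, Union

Record = Dict[str, Union[int, float, str]]

NUMERIC = ("Age", "Credit amount", "Duration")
CATEGORICAL = ("Sex", "Job", "Housing", "Saving accounts",
               "Checking account", "Purpose")

# Medoids of profiles 0, 5 and 19, copied from the table above.
MEDOIDS: Dict[int, Record] = {
    0: {"Age": 26, "Sex": "Male", "Job": 2, "Housing": "Rent",
        "Saving accounts": "Moderate", "Checking account": "Unknown",
        "Credit amount": 3577, "Duration": 9, "Purpose": "Car"},
    5: {"Age": 38, "Sex": "Female", "Job": 2, "Housing": "Own",
        "Saving accounts": "Unknown", "Checking account": "Unknown",
        "Credit amount": 1240, "Duration": 12, "Purpose": "Radio/TV"},
    19: {"Age": 53, "Sex": "Male", "Job": 2, "Housing": "Free",
         "Saving accounts": "Little", "Checking account": "Little",
         "Credit amount": 4870, "Duration": 24, "Purpose": "Car"},
}

# Assumption: ranges taken over the medoids above; in practice they would
# be computed over the whole dataset.
RANGES = {
    f: max(m[f] for m in MEDOIDS.values()) - min(m[f] for m in MEDOIDS.values())
    for f in NUMERIC
}


def gower_distance(a: Record, b: Record, ranges: Dict[str, float]) -> float:
    """Mean per-feature dissimilarity between two mixed-type records."""
    total = 0.0
    for f in NUMERIC:
        # Range-scaled absolute difference; constant features contribute 0.
        total += abs(a[f] - b[f]) / ranges[f] if ranges[f] > 0 else 0.0
    for f in CATEGORICAL:
        # Simple matching: 0 if equal, 1 otherwise.
        total += 0.0 if a[f] == b[f] else 1.0
    return total / (len(NUMERIC) + len(CATEGORICAL))


def nearest_profile(x: Record, medoids: Dict[int, Record],
                    ranges: Dict[str, float]) -> int:
    """Return the id of the medoid closest to x."""
    return min(medoids, key=lambda p: gower_distance(x, medoids[p], ranges))


# Hypothetical applicant, invented for this example.
applicant: Record = {"Age": 50, "Sex": "Male", "Job": 2, "Housing": "Free",
                     "Saving accounts": "Little", "Checking account": "Little",
                     "Credit amount": 5000, "Duration": 24, "Purpose": "Car"}
assigned = nearest_profile(applicant, MEDOIDS, RANGES)
```

The hypothetical applicant matches profile 19 on every categorical feature and is close on the numeric ones, so it is assigned to that profile.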

Cite this paper

Wilms, M., Sileno, G., Haned, H. (2022). PEBAM: A Profile-Based Evaluation Method for Bias Assessment on Mixed Datasets. In: Bergmann, R., Malburg, L., Rodermund, S.C., Timm, I.J. (eds) KI 2022: Advances in Artificial Intelligence. KI 2022. Lecture Notes in Computer Science, vol 13404. Springer, Cham. https://doi.org/10.1007/978-3-031-15791-2_17

DOI: https://doi.org/10.1007/978-3-031-15791-2_17
Published: 12 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15790-5
Online ISBN: 978-3-031-15791-2
eBook Packages: Computer Science, Computer Science (R0)
