
PEBAM: A Profile-Based Evaluation Method for Bias Assessment on Mixed Datasets

Conference paper

KI 2022: Advances in Artificial Intelligence (KI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13404)

Abstract

Bias evaluation methods focus either on individual bias or on group bias, where groups are defined by protected attributes such as gender or ethnicity. More generally, however, descriptively relevant combinations of feature values in the data space (profiles) may also serve as anchors for biased decisions. This paper therefore introduces a semi-hierarchical clustering method for extracting profiles from mixed datasets. It elaborates on how profiles can be used to reveal historical, representational, aggregation and evaluation biases in algorithmic decision-making models, taking the German credit dataset as an example. Our experiments show that the proposed profile-based evaluation method for bias assessment on mixed datasets (PEBAM) can reveal forms of bias towards profiles expressed by the dataset that remain undetected when individual- or group-bias metrics are used alone.
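To make the contrast drawn in the abstract concrete, the following Python sketch compares a classifier's positive-prediction rate across a protected attribute and across profile labels. It is a minimal illustration under stated assumptions, not PEBAM itself: `positive_rate_by` and `disparity` are hypothetical helpers, the max-minus-min rate gap is just one simple group metric, and the eight records are made-up toy data chosen so that the groups look balanced while the profiles do not.

```python
from collections import defaultdict


def positive_rate_by(keys, predictions):
    """Share of positive predictions (e.g. 'good credit' = 1) for each key value."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for key, pred in zip(keys, predictions):
        totals[key] += 1
        positives[key] += int(pred)
    return {k: positives[k] / totals[k] for k in totals}


def disparity(rates):
    """Gap between the most and least favoured key; 0.0 means equal rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: eight applicants, a protected attribute, profile labels
# (e.g. produced by a clustering step) and binary model predictions.
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]
profile = [0, 0, 1, 1, 0, 0, 1, 1]
predictions = [1, 1, 0, 0, 1, 1, 0, 0]

sex_rates = positive_rate_by(sex, predictions)
profile_rates = positive_rate_by(profile, predictions)
print(sex_rates, disparity(sex_rates))
print(profile_rates, disparity(profile_rates))
```

In this toy case both sex groups receive positive predictions at the same rate (gap 0.0), while profile 0 is always accepted and profile 1 never is (gap 1.0), so a group-level check alone would report no disparity.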


Notes

  1. Available at: https://www.kaggle.com/uciml/german-credit.

  2. Experimental setup: Intel Core i7-10510U, 16 GB RAM, Windows 10 64-bit.

  3. https://github.com/mcwilms/PEBAM.

  4. Note, however, that the same approach applies to any choice of profile-selection method or of ML method used to train the classifier.


Acknowledgments

Giovanni Sileno was partly funded by the Dutch Research Council (NWO) for the HUMAINER AI project (KIVI.2019.006).

Author information

Correspondence to Giovanni Sileno.

Appendix A: Profiles on the German Credit Dataset

The following table lists the profiles selected on the German credit dataset by the semi-hierarchical clustering method proposed in the paper. Each profile is described by its medoid; the Sample column gives the medoid's row index in the dataset.

| Profile | Age | Sex | Job | Housing | Saving accounts | Checking account | Credit amount (DM) | Duration (months) | Purpose | Sample |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26 | Male | 2 | Rent | Moderate | Unknown | 3577 | 9 | Car | 859 |
| 1 | 37 | Male | 2 | Own | Unknown | Unknown | 7409 | 36 | Business | 868 |
| 2 | 39 | Male | 3 | Own | Little | Unknown | 6458 | 18 | Car | 106 |
| 3 | 26 | Male | 2 | Own | Little | Little | 4370 | 42 | Radio/TV | 639 |
| 4 | 31 | Male | 2 | Own | Quite rich | Unknown | 3430 | 24 | Radio/TV | 19 |
| 5 | 38 | Female | 2 | Own | Unknown | Unknown | 1240 | 12 | Radio/TV | 135 |
| 6 | 43 | Male | 1 | Own | Little | Little | 1344 | 12 | Car | 929 |
| 7 | 36 | Male | 2 | Rent | Little | Little | 2799 | 9 | Car | 586 |
| 8 | 39 | Male | 2 | Own | Little | Little | 2522 | 30 | Radio/TV | 239 |
| 9 | 31 | Male | 2 | Own | Little | Moderate | 1935 | 24 | Business | 169 |
| 10 | 33 | Female | 2 | Own | Little | Little | 1131 | 18 | Furniture/equipment | 166 |
| 11 | 26 | Male | 1 | Own | Little | Moderate | 625 | 12 | Radio/TV | 220 |
| 12 | 23 | Male | 2 | Own | Unknown | Moderate | 1444 | 15 | Radio/TV | 632 |
| 13 | 42 | Male | 2 | Own | Little | Little | 4153 | 18 | Furniture/equipment | 899 |
| 14 | 29 | Male | 2 | Own | Unknown | Unknown | 3556 | 15 | Car | 962 |
| 15 | 37 | Female | 2 | Own | Little | Moderate | 3612 | 18 | Furniture/equipment | 537 |
| 16 | 27 | Female | 2 | Own | Little | Little | 2389 | 18 | Radio/TV | 866 |
| 17 | 26 | Female | 2 | Rent | Little | Unknown | 1388 | 9 | Furniture/equipment | 582 |
| 18 | 29 | Male | 2 | Own | Little | Unknown | 2743 | 28 | Radio/TV | 426 |
| 19 | 53 | Male | 2 | Free | Little | Little | 4870 | 24 | Car | 4 |
| 20 | 36 | Male | 2 | Own | Little | Little | 1721 | 15 | Car | 461 |
| 21 | 38 | Male | 2 | Own | Little | Unknown | 804 | 12 | Radio/TV | 997 |
| 22 | 29 | Male | 2 | Own | Little | Moderate | 1103 | 12 | Radio/TV | 696 |
| 23 | 43 | Male | 2 | Own | Unknown | Unknown | 2197 | 24 | Car | 406 |
| 24 | 27 | Male | 2 | Own | Little | Little | 3552 | 24 | Furniture/equipment | 558 |
| 25 | 30 | Male | 2 | Own | Little | Moderate | 1056 | 18 | Car | 580 |
| 26 | 24 | Female | 2 | Own | Little | Moderate | 2150 | 30 | Car | 252 |
| 27 | 34 | Male | 2 | Own | Little | Unknown | 2759 | 12 | Furniture/equipment | 452 |
| 28 | 24 | Female | 2 | Rent | Little | Little | 2124 | 18 | Furniture/equipment | 761 |
| 29 | 34 | Male | 2 | Own | Little | Moderate | 5800 | 36 | Car | 893 |
| 30 | 34 | Female | 2 | Own | Little | Unknown | 1493 | 12 | Radio/TV | 638 |
| 31 | 30 | Female | 2 | Own | Little | Unknown | 1055 | 18 | Car | 161 |
| 32 | 35 | Male | 2 | Own | Little | Unknown | 2346 | 24 | Car | 654 |
| 33 | 35 | Male | 2 | Own | Unknown | Unknown | 1979 | 15 | Radio/TV | 625 |
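Each profile above is represented by its medoid, i.e. the member with the smallest total distance to the rest of its cluster. For records that mix numeric and categorical features, such as these, Gower's coefficient is a common choice of distance. The Python sketch below shows how a medoid can be picked under that choice. It is an illustration under assumptions, not the authors' semi-hierarchical clustering: it uses range-normalised absolute differences for Age, Credit amount and Duration, simple matching for all other columns (treating Job as categorical), computes ranges only from the rows passed in, and the helpers `feature_ranges`, `gower`, `medoid_index` and `record` are hypothetical names.

```python
NUMERIC = ("Age", "Credit amount", "Duration")
CATEGORICAL = ("Sex", "Job", "Housing", "Saving accounts",
               "Checking account", "Purpose")


def feature_ranges(rows):
    """Range (max - min) of each numeric feature over the given rows."""
    return {f: max(r[f] for r in rows) - min(r[f] for r in rows) for f in NUMERIC}


def gower(a, b, ranges):
    """Gower distance: mean of per-feature dissimilarities, each in [0, 1]."""
    parts = [abs(a[f] - b[f]) / ranges[f] if ranges[f] else 0.0 for f in NUMERIC]
    parts += [0.0 if a[f] == b[f] else 1.0 for f in CATEGORICAL]
    return sum(parts) / len(parts)


def medoid_index(rows):
    """Index of the row with the smallest total Gower distance to all rows."""
    ranges = feature_ranges(rows)
    totals = [sum(gower(x, y, ranges) for y in rows) for x in rows]
    return totals.index(min(totals))


def record(age, sex, job, housing, saving, checking, amount, duration, purpose):
    """Build one applicant record keyed by the table's column names."""
    return {"Age": age, "Sex": sex, "Job": job, "Housing": housing,
            "Saving accounts": saving, "Checking account": checking,
            "Credit amount": amount, "Duration": duration, "Purpose": purpose}


# Medoids of profiles 8, 13 and 20 from the table, reused here as toy input.
rows = [
    record(39, "Male", 2, "Own", "Little", "Little", 2522, 30, "Radio/TV"),
    record(42, "Male", 2, "Own", "Little", "Little", 4153, 18, "Furniture/equipment"),
    record(36, "Male", 2, "Own", "Little", "Little", 1721, 15, "Car"),
]
print(medoid_index(rows))
```

For these three records the first one (profile 8) has the smallest summed distance to the others, so index 0 is returned. In a full pipeline the numeric ranges would normally be taken over the whole dataset rather than over the few rows being compared.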


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wilms, M., Sileno, G., Haned, H. (2022). PEBAM: A Profile-Based Evaluation Method for Bias Assessment on Mixed Datasets. In: Bergmann, R., Malburg, L., Rodermund, S.C., Timm, I.J. (eds) KI 2022: Advances in Artificial Intelligence. KI 2022. Lecture Notes in Computer Science, vol 13404. Springer, Cham. https://doi.org/10.1007/978-3-031-15791-2_17


  • DOI: https://doi.org/10.1007/978-3-031-15791-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15790-5

  • Online ISBN: 978-3-031-15791-2

  • eBook Packages: Computer Science, Computer Science (R0)
