research-article

Data mining for discrimination discovery

Published: 28 May 2010

Abstract

In civil rights law, discrimination refers to unfair or unequal treatment of people based on their membership in a category or a minority, without regard to individual merit. Discrimination in credit, mortgage, insurance, labor market, and education has been investigated by researchers in economics and the human sciences. With the advent of automatic decision support systems, such as credit scoring systems, the ease of data collection poses several challenges to data analysts in the fight against discrimination. In this article, we introduce the problem of discovering discrimination through data mining in a dataset of historical decision records, made by humans or by automatic systems. We formalize the processes of direct and indirect discrimination discovery by modelling protected-by-law groups, and the contexts where discrimination occurs, in a classification-rule-based syntax. Classification rules extracted from the dataset make it possible to unveil contexts of unlawful discrimination, where the degree of burden on protected-by-law groups is formalized by an extension of the lift measure of a classification rule. In direct discrimination, the extracted rules can be mined directly in search of discriminatory contexts. In indirect discrimination, the mining process needs some background knowledge as a further input, for example census data, which combined with the extracted rules might allow for unveiling contexts of discriminatory decisions. A strategy for combining extracted classification rules with background knowledge is called an inference model. In this article, we propose two inference models and provide automatic procedures for their implementation. An empirical assessment of our results is provided on the German credit dataset and on the PKDD Discovery Challenge 1999 financial dataset.
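The abstract does not spell out the extension of the lift measure, so the following is only a minimal sketch under an assumed formulation: for a rule "protected group A, context B → decision C", the extended lift is taken to be the ratio conf(A, B → C) / conf(B → C), that is, how much more often the decision occurs for the protected group than for everyone in the same context. The function names, the attribute names (`sex`, `city`, `decision`), and the toy records are hypothetical and are not taken from the article or its datasets.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical toy decision records (not from the German credit or PKDD data).
RECORDS: List[Record] = (
    [{"sex": "female", "city": "X", "decision": "deny"}] * 3
    + [{"sex": "female", "city": "X", "decision": "grant"}] * 1
    + [{"sex": "male", "city": "X", "decision": "deny"}] * 1
    + [{"sex": "male", "city": "X", "decision": "grant"}] * 3
)


def matches(record: Record, itemset: Dict[str, str]) -> bool:
    """True if the record carries every attribute=value pair of the itemset."""
    return all(record.get(attr) == value for attr, value in itemset.items())


def confidence(records: List[Record], premise: Dict[str, str],
               consequent: Dict[str, str]) -> float:
    """Confidence of the rule premise -> consequent over the records."""
    covered = [r for r in records if matches(r, premise)]
    if not covered:
        raise ValueError("premise covers no records")
    return sum(matches(r, consequent) for r in covered) / len(covered)


def extended_lift(records: List[Record], protected: Dict[str, str],
                  context: Dict[str, str], decision: Dict[str, str]) -> float:
    """Assumed formulation: conf(protected, context -> decision)
    divided by conf(context -> decision)."""
    conf_with_group = confidence(records, {**protected, **context}, decision)
    conf_context_only = confidence(records, context, decision)
    if conf_context_only == 0:
        raise ValueError("decision never occurs in this context")
    return conf_with_group / conf_context_only


if __name__ == "__main__":
    value = extended_lift(RECORDS, {"sex": "female"}, {"city": "X"},
                          {"decision": "deny"})
    print(value)  # 0.75 / 0.5 -> 1.5
```

In this toy data a value above 1 indicates that, within the context `city=X`, denial is more frequent for the protected group than for the context as a whole; what threshold would count as discriminatory is a legal and analytical choice that the sketch does not address.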

Supplementary Material

Ruggieri Appendix (a9-ruggieri-apndx.pdf)
Online appendix to data mining for discrimination discovery on article 9.



    Information & Contributors

    Information

    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 4, Issue 2
    May 2010
    129 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/1754428
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 May 2010
    Accepted: 01 August 2009
    Revised: 01 June 2009
    Received: 01 July 2008
    Published in TKDD Volume 4, Issue 2


    Author Tags

    1. Discrimination
    2. Classification rules

    Qualifiers

    • Research-article
    • Research
    • Refereed
