
Trusting My Predictions: On the Value of Instance-Level Analysis

Published: 09 April 2024

Abstract

Machine Learning solutions have spread across many domains, including critical applications. The development of such models usually relies on a dataset of labeled data, which is split into training and test sets so that the accuracy of the models in replicating the test labels can be assessed. This process is often iterated in a cross-validation procedure to obtain average performance estimates. But is the average predictive performance on test sets enough to assess the trustworthiness of a Machine Learning model? This paper discusses the importance of knowing which individual observations of a dataset are more challenging than others, and how this characteristic can be measured and used to improve classification performance and trustworthiness. A set of strategies for measuring the hardness level of the instances of a dataset is surveyed, and a Python package containing their implementation is provided.
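
To make the instance-level view concrete, the sketch below estimates a simple hardness score per observation: the fraction of times each instance is misclassified when it falls in the test fold of a repeated cross-validation, across a small pool of classifiers. This is only an illustration in the spirit of the instance hardness measures the paper surveys, not the API of the accompanying Python package; the dataset and classifier pool are arbitrary choices built on scikit-learn.

```python
# Minimal sketch (not the authors' package): per-instance hardness as the
# fraction of misclassifications over repeated stratified cross-validation
# with a small, arbitrary pool of classifiers.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pool = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    RandomForestClassifier(n_estimators=100, random_state=0),
    GaussianNB(),
]

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
errors = np.zeros(len(y))  # misclassification counts per instance
trials = np.zeros(len(y))  # times each instance appeared in a test fold

for train_idx, test_idx in cv.split(X, y):
    for clf in pool:
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        errors[test_idx] += (pred != y[test_idx])
        trials[test_idx] += 1

# Hardness in [0, 1]: higher means the instance is misclassified more often,
# regardless of the model's average accuracy across the whole test set.
hardness = errors / trials
print("hardest instances:", np.argsort(hardness)[-5:])
```

Average accuracy can be high even when the same few observations are misclassified by every model in every repetition; a per-instance score such as this one exposes exactly those observations for further inspection.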


Cited By

  • Assessor Models for Explaining Instance Hardness in Classification Problems. 2024 International Joint Conference on Neural Networks (IJCNN), 1–8. DOI: 10.1109/IJCNN60899.2024.10651521. Online publication date: 30-Jun-2024.
  • Measuring Latent Traits of Instance Hardness and Classifier Ability using Boltzmann Machines. 2024 International Joint Conference on Neural Networks (IJCNN), 1–8. DOI: 10.1109/IJCNN60899.2024.10651497. Online publication date: 30-Jun-2024.


    Published In

    ACM Computing Surveys, Volume 56, Issue 7
    July 2024
    1006 pages
    EISSN: 1557-7341
    DOI: 10.1145/3613612

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 09 April 2024
    Online AM: 09 August 2023
    Accepted: 03 August 2023
    Revised: 20 May 2023
    Received: 15 October 2022
    Published in CSUR Volume 56, Issue 7


    Author Tags

    1. Instance hardness
    2. Meta-learning

    Qualifiers

    • Research-article

    Funding Sources

    • Brazilian research agencies FAPESP and CNPq


    Article Metrics

    • Downloads (last 12 months): 423
    • Downloads (last 6 weeks): 14
    Reflects downloads up to 09 Jan 2025
