A Framework for Efficient Model Evaluation Through Stratification, Sampling, and Estimation

Riccardo Fogliato¹³,
Pratik Patil¹⁴,
Mathew Monfort¹³ &
…
Pietro Perona¹³,¹⁵

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15146)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Model performance evaluation is a critical and expensive task in machine learning and computer vision. Without clear guidelines, practitioners often estimate model accuracy using a single, completely random sample of the data. However, by employing tailored sampling and estimation strategies, one can obtain more precise estimates and reduce annotation costs. In this paper, we propose a statistical framework for model evaluation that includes stratification, sampling, and estimation components. We examine the statistical properties of each component and evaluate their efficiency (precision). One key result of our work is that stratification via \(k\)-means clustering based on accurate predictions of model performance yields efficient estimators. Our experiments on computer vision datasets show that this method consistently provides more precise accuracy estimates than traditional simple random sampling, with efficiency gains as large as 10x. We also find that model-assisted estimators, which leverage predictions of model accuracy on the unlabeled portion of the dataset, are generally more efficient than traditional estimates based solely on the labeled data.
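To make the stratification idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a synthetic pool in which each example has a proxy score (a prediction of whether the model is correct), clusters those scores with a small one-dimensional \(k\)-means, and compares a stratified estimate of accuracy against simple random sampling. The helper names (`kmeans_1d`, `build_strata`, `stratified_estimate`), the proportional allocation rule, and the synthetic data are all assumptions made for illustration.

```python
import random
from statistics import mean, pvariance

def kmeans_1d(values, k, iters=20):
    """Lloyd's algorithm on scalars; returns a cluster label per value."""
    lo, hi = min(values), max(values)
    # Assumption: initialise centers evenly across the range of the scores.
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, l in zip(values, labels) if l == j]
            if members:
                centers[j] = mean(members)
    return labels

def build_strata(proxy, k):
    """Group example indices by k-means cluster, dropping empty clusters."""
    labels = kmeans_1d(proxy, k)
    strata = [[i for i, l in enumerate(labels) if l == j] for j in range(k)]
    return [s for s in strata if s]

def srs_estimate(z, n, rng):
    """Simple random sampling: mean correctness of n randomly chosen examples."""
    return mean(z[i] for i in rng.sample(range(len(z)), n))

def stratified_estimate(z, strata, n, rng):
    """Weighted mean of per-stratum sample means (proportional allocation).

    Rounding means the total number of labels may differ slightly from n.
    """
    N = len(z)
    total = 0.0
    for idx in strata:
        n_j = max(1, min(len(idx), round(n * len(idx) / N)))
        total += len(idx) / N * mean(z[i] for i in rng.sample(idx, n_j))
    return total

# Synthetic pool (assumption): proxy scores are calibrated probabilities of
# the model being correct; z holds the 0/1 correctness labels to be estimated.
rng = random.Random(0)
N, n, k = 2000, 100, 4
proxy = [rng.betavariate(0.5, 0.5) for _ in range(N)]
z = [1 if rng.random() < p else 0 for p in proxy]

strata = build_strata(proxy, k)
srs_runs = [srs_estimate(z, n, rng) for _ in range(300)]
strat_runs = [stratified_estimate(z, strata, n, rng) for _ in range(300)]
var_srs, var_strat = pvariance(srs_runs), pvariance(strat_runs)
print("variance (simple random):", var_srs)
print("variance (stratified):   ", var_strat)
```

In this toy setting the proxy scores are informative by construction, so the stratified estimates vary less across repeated draws than the simple random ones; how large the gain is on real data depends on how well the proxy predicts model correctness.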

Notes

1. In preliminary experiments we assessed other model-assisted estimators in the class of “generalized” regression estimators [8, 83, 100] and found results comparable to \(\texttt{DF}\).
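For readers unfamiliar with model-assisted estimation, here is a small sketch of the standard difference estimator from survey sampling, which is presumably what \(\texttt{DF}\) denotes here. Under simple random sampling it averages the predictions over the whole pool and corrects that average with the mean residual on the labeled sample. The function name `difference_estimate` and the synthetic data are assumptions for illustration, not the authors' code.

```python
import random
from statistics import mean, pvariance

def difference_estimate(z, f, sample):
    """Mean prediction over the full pool plus mean residual on the sample.

    z: 0/1 correctness labels (only z[i] for i in sample are used)
    f: predicted probability of correctness for every example in the pool
    """
    f_bar = sum(f) / len(f)
    residual = sum(z[i] - f[i] for i in sample) / len(sample)
    return f_bar + residual

# Synthetic pool (assumption): predictions are calibrated, so each label is
# drawn as a Bernoulli with success probability equal to its prediction.
rng = random.Random(1)
N, n = 2000, 100
f = [rng.betavariate(0.5, 0.5) for _ in range(N)]
z = [1 if rng.random() < p else 0 for p in f]

plain_runs, diff_runs = [], []
for _ in range(300):
    sample = rng.sample(range(N), n)
    plain_runs.append(mean(z[i] for i in sample))        # labels only
    diff_runs.append(difference_estimate(z, f, sample))  # labels + predictions

var_plain, var_diff = pvariance(plain_runs), pvariance(diff_runs)
print("variance (labels only):", var_plain)
print("variance (difference): ", var_diff)
```

Because the residuals `z[i] - f[i]` vary less than the raw labels when the predictions are informative, the corrected estimate is more stable in this toy example; with poor predictions the advantage can shrink or vanish.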

References

Angelopoulos, A.N., Bates, S., Fannjiang, C., Jordan, M.I., Zrnic, T.: Prediction-powered inference. Science 382(6671), 669–674 (2023)
Angelopoulos, A.N., Duchi, J.C., Zrnic, T.: Ppi++: Efficient prediction-powered inference. arXiv preprint arXiv:2311.01453 (2023)
Ash, J.T., Zhang, C., Krishnamurthy, A., Langford, J., Agarwal, A.: Deep batch active learning by diverse, uncertain gradient lower bounds. arXiv preprint arXiv:1906.03671 (2019)
Baek, C., Jiang, Y., Raghunathan, A., Kolter, J.Z.: Agreement-on-the-line: predicting the performance of neural networks under distribution shift. Adv. Neural. Inf. Process. Syst. 35, 19274–19289 (2022)
Barbu, A., et al.: Objectnet: a large-scale bias-controlled dataset for pushing the limits of object recognition models. Adv. Neural Inform. Process. Syst. 32 (2019)
Beery, S., Cole, E., Gjoka, A.: The iwildcam 2020 competition dataset. arXiv preprint arXiv:2004.10340 (2020)
Breidt, F.J., Claeskens, G., Opsomer, J.: Model-assisted estimation for complex surveys using penalised splines. Biometrika 92(4), 831–846 (2005)
Breidt, F.J., Opsomer, J.D.: Model-assisted survey estimation with modern prediction techniques. Stat. Sci. 32(2), 190–205 (2017). https://doi.org/10.1214/16-STS589
Brus, D.J.: Spatial sampling with R. CRC Press (2022)
Chen, M., Goel, K., Sohoni, N.S., Poms, F., Fatahalian, K., Ré, C.: Mandoline: Model evaluation under distribution shift. In: International Conference on Machine Learning, pp. 1617–1629. PMLR (2021)
Chen, T., Lumley, T.: Optimal multiwave sampling for regression modeling in two-phase designs. Stat. Med. 39(30), 4912–4921 (2020)
Chen, T., Lumley, T.: Optimal sampling for design-based estimators of regression models. Stat. Med. 41(8), 1482–1497 (2022)
Chen, Y., Zhang, S., Song, R.: Scoring your prediction on unseen data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 3279–3288 (June 2023)
Cheng, G., Han, J., Lu, X.: Remote sensing image scene classification: benchmark and state of the art. Proc. IEEE 105(10), 1865–1883 (2017)
Chouldechova, A., Deng, S., Wang, Y., Xia, W., Perona, P.: Unsupervised and semi-supervised bias benchmarking in face recognition. In: European Conference on Computer Vision, pp. 289–306. Springer (2022). https://doi.org/10.1007/978-3-031-19778-9_17
Chu, W., Zinkevich, M., Li, L., Thomas, A., Tseng, B.: Unbiased online active learning in data streams. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery And Data Mining, pp. 195–203 (2011)
Chuang, C.Y., Torralba, A., Jegelka, S.: Estimating generalization under distribution shifts via domain-invariant representations. arXiv preprint arXiv:2007.03511 (2020)
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A.: Describing textures in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
Clark, R.G., Steel, D.G.: Sample design for analysis using high-influence probability sampling. J. R. Stat. Soc. Ser. A Stat. Soc. 185(4), 1733–1756 (2022)
Coates, A., Ng, A., Lee, H.: An analysis of single-layer networks in unsupervised feature learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 215–223. JMLR Workshop and Conference Proceedings (2011)
Cochran, W.G.: Sampling Techniques. John Wiley & Sons (1977)
Cohn, D.A., Ghahramani, Z., Jordan, M.I.: Active learning with statistical models. J. Artif. Intell. Res. 4, 129–145 (1996)
Deng, L.: The mnist database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 29(6), 141–142 (2012)
Deng, W., Gould, S., Zheng, L.: What does rotation prediction tell us about classifier accuracy under varying testing environments? In: International Conference on Machine Learning, pp. 2579–2589. PMLR (2021)
Deng, W., Zheng, L.: Are labels always necessary for classifier accuracy evaluation? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15069–15078 (2021)
Emma, D., Jared, J., Cukierski, W.: Diabetic retinopathy detection (2015). https://kaggle.com/competitions/diabetic-retinopathy-detection
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC 2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
Farquhar, S., Gal, Y., Rainforth, T.: On statistical bias in active learning: How and when to fix it. arXiv preprint arXiv:2101.11665 (2021)
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. In: 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp. 178–178. IEEE (2004)
Fuller, W.A.: Sampling Statistics. John Wiley & Sons (2011)
Gal, Y., Islam, R., Ghahramani, Z.: Deep bayesian active learning with image data. In: International Conference on Machine Learning, pp. 1183–1192. PMLR (2017)
Ganti, R., Gray, A.: Upal: Unbiased pool based active learning. In: Artificial Intelligence and Statistics, pp. 422–431. PMLR (2012)
Garg, S., Balakrishnan, S., Lipton, Z.C., Neyshabur, B., Sedghi, H.: Leveraging unlabeled data to predict out-of-distribution performance. arXiv preprint arXiv:2201.04234 (2022)
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the kitti dataset. Inter. J. Robot. Res. (IJRR) (2013)
Graubard, B.I., Korn, E.L.: Inference for superpopulation parameters using sample surveys. Stat. Sci. 17(1), 73–96 (2002)
Guillory, D., Shankar, V., Ebrahimi, S., Darrell, T., Schmidt, L.: Predicting with confidence on unseen distributions. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1134–1144 (2021)
Hájek, J.: Optimal strategy and other problems in probability sampling. Časopis pro pěstování matematiky 84(4), 387–423 (1959)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hébert-Johnson, U., Kim, M., Reingold, O., Rothblum, G.: Multicalibration: calibration for the (computationally-identifiable) masses. In: International Conference on Machine Learning, pp. 1939–1948. PMLR (2018)
Helber, P., Bischke, B., Dengel, A., Borth, D.: Eurosat: a novel dataset and deep learning benchmark for land use and land cover classification. IEEE J. Selected Topics Appli. Earth Observations Remote Sensing (2019)
Hendrycks, D., et al.: The many faces of robustness: a critical analysis of out-of-distribution generalization. In: ICCV (2021)
Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., Song, D.: Natural adversarial examples. In: CVPR (2021)
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47(260), 663–685 (1952)
Ilharco, G., et al.: Openclip (2021). https://doi.org/10.5281/zenodo.5143773
Imberg, H., Axelson-Fisk, M., Jonasson, J.: Optimal subsampling designs. arXiv preprint arXiv:2304.03019 (2023)
Imberg, H., Jonasson, J., Axelson-Fisk, M.: Optimal sampling in unbiased active learning. In: International Conference on Artificial Intelligence and Statistics, pp. 559–569. PMLR (2020)
Imberg, H., Yang, X., Flannagan, C., Bärgman, J.: Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples. arXiv preprint arXiv:2212.10024 (2022)
Isaki, C.T., Fuller, W.A.: Survey design under the regression superpopulation model. J. Am. Stat. Assoc. 77(377), 89–96 (1982)
Jiang, Y., Nagarajan, V., Baek, C., Kolter, J.Z.: Assessing generalization of sgd via disagreement. arXiv preprint arXiv:2106.13799 (2021)
Johnson, J., Hariharan, B., Van Der Maaten, L., Fei-Fei, L., Lawrence Zitnick, C., Girshick, R.: Clevr: a diagnostic dataset for compositional language and elementary visual reasoning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2901–2910 (2017)
Kim, M.P., Kern, C., Goldwasser, S., Kreuter, F., Reingold, O.: Universal adaptability: target-independent inference that competes with propensity scoring. Proc. Nat. Acad. Sci. 119(4), e2108097119 (2022)
Kirsch, A., Van Amersfoort, J., Gal, Y.: Batchbald: efficient and diverse batch acquisition for deep bayesian active learning. Adv. Neural Inform. Process. Syst. 32 (2019)
Koh, P.W., et al.: Wilds: a benchmark of in-the-wild distribution shifts. In: International Conference on Machine Learning, pp. 5637–5664. PMLR (2021)
Kossen, J., Farquhar, S., Gal, Y., Rainforth, T.: Active surrogate estimators: an active learning approach to label-efficient model evaluation. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems, vol. 35, pp. 24557–24570. Curran Associates, Inc. (2022)
Kossen, J., Farquhar, S., Gal, Y., Rainforth, T.: Active testing: sample-efficient model evaluation. In: International Conference on Machine Learning, pp. 5753–5763. PMLR (2021)
Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained categorization. In: 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia (2013)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
Kull, M., Flach, P.: Novel decompositions of proper scoring rules for classification: score adjustment as precursor to calibration. In: Appice, A., Rodrigues, P.P., Santos Costa, V., Soares, C., Gama, J., Jorge, A. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9284, pp. 68–85. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23528-8_5
LAION AI: Clip benchmark. https://github.com/LAION-AI/CLIP_benchmark
LeCun, Y., Huang, F.J., Bottou, L.: Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004. vol. 2, pp. II–104. IEEE (2004)
Lewis, D.D.: A sequential algorithm for training text classifiers: corrigendum and additional data. In: ACM SIGIR Forum, vol. 29, pp. 13–19. ACM New York (1995)
Lewis, D.D., Catlett, J.: Heterogeneous uncertainty sampling for supervised learning. In: Machine Learning Proceedings 1994, pp. 148–156. Elsevier (1994)
Li, Z., Ma, X., Xu, C., Cao, C., Xu, J., Lü, J.: Boosting operational dnn testing efficiency through conditioning (2019). https://doi.org/10.1145/3338906.3338930
Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. In: Proceedings of the IEEE/CVF Conference On Computer Vision and Pattern Recognition, pp. 11976–11986 (2022)
Lohr, S.L.: Sampling: design and analysis. CRC press (2021)
Lumley, T., Shaw, P.A., Dai, J.Y.: Connections between survey calibration estimators and semiparametric models for incomplete data. Int. Stat. Rev. 79(2), 200–220 (2011)
Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013)
Matthey, L., Higgins, I., Hassabis, D., Lerchner, A.: dsprites: Disentanglement testing sprites dataset (2017). https://github.com/deepmind/dsprites-dataset/
McConville, K.S., Breidt, F.J., Lee, T.C., Moisen, G.G.: Model-assisted survey regression estimation with the lasso. J. Surv. Statist. Methodol. 5(2), 131–158 (2017)
Miller, B.A., Vila, J., Kirn, M., Zipkin, J.R.: Classifier performance estimation with unbalanced, partially labeled data. In: Torgo, L., Matwin, S., Weiss, G., Moniz, N., Branco, P. (eds.) Proceedings of The International Workshop on Cost-Sensitive Learning. Proceedings of Machine Learning Research, 05 May, vol. 88, pp. 4–16. PMLR (2018)
Miller, J.P., et al.: Accuracy on the line: on the strong correlation between out-of-distribution and in-distribution generalization. In: International Conference on Machine Learning, pp. 7721–7735. PMLR (2021)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
Neyman, J.: On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. In: Breakthroughs in Statistics: Methodology and Distribution, pp. 123–150. Springer (1992). https://doi.org/10.1007/978-1-4612-4380-9_12
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 722–729. IEEE (2008)
Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.: Cats and dogs. In: 2012 IEEE conference on Computer Vision and Pattern Recognition, pp. 3498–3505. IEEE (2012)
Poms, F., et al.: Low-shot validation: active importance sampling for estimating classifier performance on rare categories. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10705–10714 (October 2021)
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)
Recht, B., Roelofs, R., Schmidt, L., Shankar, V.: Do imagenet classifiers generalize to imagenet? In: International Conference on Machine Learning, pp. 5389–5400. PMLR (2019)
Ren, P., et al.: A survey of deep active learning. ACM Comput. Surv. (CSUR) 54(9), 1–40 (2021)
Roth, A.: Uncertain: Modern topics in uncertainty estimation (2022)
Russakovsky, O., et al.: ImageNet Large Scale Visual Recognition Challenge. Inter. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Särndal, C.E.: The calibration approach in survey theory and practice. Surv. Pract. 33(2), 99–119 (2007)
Särndal, C.E., Swensson, B., Wretman, J.: Model assisted survey sampling. Springer Science & Business Media (2003)
Sawade, C., Landwehr, N., Bickel, S., Scheffer, T.: Active risk estimation. In: Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 951–958. Omnipress, Madison, WI, USA (2010)
Sawade, C., Landwehr, N., Scheffer, T.: Active estimation of f-measures. In: Lafferty, J., Williams, C., Shawe-Taylor, J., Zemel, R., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 23. Curran Associates, Inc. (2010)
Scheffer, T., Decomain, C., Wrobel, S.: Active hidden markov models for information extraction. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds.) IDA 2001. LNCS, vol. 2189, pp. 309–318. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44816-0_31
Schuhmann, C., et al.: LAION-5b: an open large-scale dataset for training next generation image-text models. In: Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2022). https://openreview.net/forum?id=M3Y74vmsMcY
Sener, O., Savarese, S.: Active learning for convolutional neural networks: A core-set approach. arXiv preprint arXiv:1708.00489 (2017)
Settles, B.: Active learning literature survey (2009)
Siddhant, A., Lipton, Z.C.: Deep bayesian active learning for natural language processing: Results of a large-scale empirical study. arXiv preprint arXiv:1808.05697 (2018)
Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013)
Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C.: The german traffic sign recognition benchmark: a multi-class classification competition. In: The 2011 International Joint Conference on Neural Networks, pp. 1453–1460. IEEE (2011)
Taylor, J., Earnshaw, B., Mabey, B., Victors, M., Yosinski, J.: Rxrx1: an image set for cellular morphological variation across many experimental batches. In: International Conference on Learning Representations (ICLR) (2019)
Tillé, Y.: Sampling and estimation from finite populations. John Wiley & Sons (2020)
Veeling, B.S., Linmans, J., Winkens, J., Cohen, T., Welling, M.: Rotation equivariant CNNs for digital pathology. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11071, pp. 210–218. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00934-2_24
Wald, Y., Feder, A., Greenfeld, D., Shalit, U.: On calibration and out-of-domain generalization. Adv. Neural. Inf. Process. Syst. 34, 2215–2227 (2021)
Wang, H., Ge, S., Lipton, Z., Xing, E.P.: Learning robust global representations by penalizing local predictive power. Adv. Neural Inform. Process. Syst., 10506–10518 (2019)
Welinder, P., Welling, M., Perona, P.: A lazy man’s approach to benchmarking: Semisupervised classifier evaluation and recalibration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2013)
Wenzel, F., et al.: Assaying out-of-distribution generalization in transfer learning. Adv. Neural. Inf. Process. Syst. 35, 7181–7198 (2022)
Wu, C., Sitter, R.R.: A model-calibration approach to using complete auxiliary information from survey data. J. Am. Stat. Assoc. 96(453), 185–193 (2001)
Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: large-scale scene recognition from abbey to zoo. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3485–3492 (June 2010). https://doi.org/10.1109/CVPR.2010.5539970
Yu, Y., Bates, S., Ma, Y., Jordan, M.: Robust calibration with multi-domain temperature scaling. Adv. Neural. Inf. Process. Syst. 35, 27510–27523 (2022)
Yu, Y., Yang, Z., Wei, A., Ma, Y., Steinhardt, J.: Predicting out-of-distribution error with the projection norm. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. (eds.) Proceedings of the 39th International Conference on Machine Learning, 17–23 Jul. Proceedings of Machine Learning Research, vol. 162, pp. 25721–25746. PMLR (2022)
Zhai, X., et al.: The visual task adaptation benchmark (2020). https://openreview.net/forum?id=BJena3VtwS
Zrnic, T., Candès, E.J.: Active statistical inference. arXiv preprint arXiv:2403.03208 (2024)
Zrnic, T., Candès, E.J.: Cross-prediction-powered inference. Proc. Nat. Acad. Sci. 121(15), e2322083121 (2024)

Acknowledgments

We thank the anonymous reviewers for their encouraging comments and valuable suggestions that have improved our manuscript. We also thank Tijana Zrnic for highlighting the connections between prediction-powered inference and our work, as well as Georgy Noarov for pointing out the link between our results and decompositions for proper scoring rules.

Author information

Authors and Affiliations

Amazon Web Services, Seattle, USA
Riccardo Fogliato, Mathew Monfort & Pietro Perona
University of California Berkeley, Berkeley, USA
Pratik Patil
California Institute of Technology, Pasadena, USA
Pietro Perona

Corresponding author

Correspondence to Riccardo Fogliato.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2053 KB)

About this paper

Cite this paper

Fogliato, R., Patil, P., Monfort, M., Perona, P. (2025). A Framework for Efficient Model Evaluation Through Stratification, Sampling, and Estimation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15146. Springer, Cham. https://doi.org/10.1007/978-3-031-73223-2_9

DOI: https://doi.org/10.1007/978-3-031-73223-2_9
Published: 08 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73222-5
Online ISBN: 978-3-031-73223-2
eBook Packages: Computer Science, Computer Science (R0)
