
Learning to Quantify: Estimating Class Prevalence via Supervised Learning

Published: 18 July 2019

Abstract

Quantification (also known as "supervised prevalence estimation" [2], or "class prior estimation" [7]) is the task of estimating, given a set σ of unlabelled items and a set of classes C = {c1, …, c|C|}, the relative frequency (or "prevalence") p(ci) of each class ci ∈ C, i.e., the fraction of items in σ that belong to ci. When each item belongs to exactly one class, since 0 ≤ p(ci) ≤ 1 and Σ_{ci ∈ C} p(ci) = 1, p is a distribution of the items in σ across the classes in C (the true distribution), and quantification thus amounts to estimating p (i.e., to computing a predicted distribution p̂).
Quantification is important in many disciplines (e.g., market research, political science, the social sciences, and epidemiology) that usually deal with aggregate (as opposed to individual) data. In these contexts, classifying individual unlabelled instances is usually not a primary goal; estimating the prevalence of the classes of interest in the data is. For instance, when classifying the tweets about a certain entity (e.g., a political candidate) as displaying either a Positive or a Negative stance towards the entity, we are usually not interested in the class of any specific tweet: instead, we want to know the fraction of these tweets that belong to each class [14].
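As a toy illustration of the task definition above (not code from the tutorial itself): the simplest quantification method, the "classify and count" baseline discussed in the quantification literature (e.g., [13, 20]), produces p̂ by classifying every item in σ and reporting the resulting label fractions. All function and variable names below are illustrative.

```python
from collections import Counter

def true_prevalence(labels, classes):
    """Fraction p(c) of items in `labels` belonging to each class c."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts.get(c, 0) / n for c in classes}

def classify_and_count(classifier, items, classes):
    """Naive quantifier: classify every item, then report label fractions.
    `classifier` is any function mapping an item to a class label."""
    predicted = [classifier(x) for x in items]
    return true_prevalence(predicted, classes)

# Toy stance example: estimate the fraction of Positive tweets.
tweets = ["love this candidate", "awful speech", "love the plan", "love it"]
stance = lambda t: "Positive" if "love" in t else "Negative"
print(classify_and_count(stance, tweets, ["Positive", "Negative"]))
# → {'Positive': 0.75, 'Negative': 0.25}
```

Note that classify-and-count is biased whenever the classifier is imperfect, which is precisely why dedicated quantification methods exist.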

References

[1] Barranquero, J., Díez, J. and del Coz, J. J. (2015), 'Quantification-oriented learning based on reliable classifiers', Pattern Recognition 48(2), 591--604.
[2] Barranquero, J., González, P., Díez, J. and del Coz, J. J. (2013), 'On the study of nearest neighbor algorithms for prevalence estimation in binary problems', Pattern Recognition 46(2), 472--482.
[3] Beijbom, O., Hoffman, J., Yao, E., Darrell, T., Rodriguez-Ramirez, A., Gonzalez-Rivero, M. and Hoegh-Guldberg, O. (2015), Quantification in-the-wild: Data-sets and baselines. CoRR abs/1510.04811 (2015). Presented at the NIPS 2015 Workshop on Transfer and Multi-Task Learning, Montreal, CA.
[4] Bella, A., Ferri, C., Hernández-Orallo, J. and Ramírez-Quintana, M. J. (2010), Quantification via probability estimators, in 'Proceedings of the 11th IEEE International Conference on Data Mining (ICDM 2010)', Sydney, AU, pp. 737--742.
[5] Ceron, A., Curini, L. and Iacus, S. M. (2016), 'iSA: A fast, scalable and accurate algorithm for sentiment analysis of social media content', Information Sciences 367/368, 105--124.
[6] Da San Martino, G., Gao, W. and Sebastiani, F. (2016), Ordinal text quantification, in 'Proceedings of the 39th ACM Conference on Research and Development in Information Retrieval (SIGIR 2016)', Pisa, IT, pp. 937--940.
[7] du Plessis, M. C., Niu, G. and Sugiyama, M. (2017), 'Class-prior estimation for learning from positive and unlabeled data', Machine Learning 106(4), 463--492.
[8] Esuli, A. (2016), ISTI-CNR at SemEval-2016 Task 4: Quantification on an ordinal scale, in 'Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016)', San Diego, US.
[9] Esuli, A., Moreo, A. and Sebastiani, F. (2019), 'Cross-lingual sentiment quantification', arXiv:1904.07965.
[10] Esuli, A., Moreo, A., Sebastiani, F. and Trevisan, D. (2019), Evaluation protocols for quantification. Submitted for publication.
[11] Esuli, A. and Sebastiani, F. (2010), 'Sentiment quantification', IEEE Intelligent Systems 25(4), 72--75.
[12] Esuli, A. and Sebastiani, F. (2015), 'Optimizing text quantifiers for multivariate loss functions', ACM Transactions on Knowledge Discovery from Data 9(4), Article 27.
[13] Forman, G. (2008), 'Quantifying counts and costs via classification', Data Mining and Knowledge Discovery 17(2), 164--206.
[14] Gao, W. and Sebastiani, F. (2016), 'From classification to quantification in tweet sentiment analysis', Social Network Analysis and Mining 6(19), 1--22.
[15] González, P., Castaño, A., Chawla, N. V. and del Coz, J. J. (2017), 'A review on quantification learning', ACM Computing Surveys 50(5), 74:1--74:40.
[16] González-Castro, V., Alaiz-Rodríguez, R. and Alegre, E. (2013), 'Class distribution estimation based on the Hellinger distance', Information Sciences 218, 146--164.
[17] Hopkins, D. J. and King, G. (2010), 'A method of automated nonparametric content analysis for social science', American Journal of Political Science 54(1), 229--247.
[18] Kar, P., Li, S., Narasimhan, H., Chawla, S. and Sebastiani, F. (2016), Online optimization methods for the quantification problem, in 'Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016)', San Francisco, US, pp. 1625--1634.
[19] King, G. and Lu, Y. (2008), 'Verbal autopsy methods with multiple causes of death', Statistical Science 23(1), 78--91.
[20] Levin, R. and Roitman, H. (2017), Enhanced probabilistic classify and count methods for multi-label text quantification, in 'Proceedings of the 7th ACM International Conference on the Theory of Information Retrieval (ICTIR 2017)', Amsterdam, NL, pp. 229--232.
[21] Maletzke, A. G., Moreira dos Reis, D. and Batista, G. E. (2018), 'Combining instance selection and self-training to improve data stream quantification', Journal of the Brazilian Computer Society 24(12), 43--48.
[22] Milli, L., Monreale, A., Rossetti, G., Giannotti, F., Pedreschi, D. and Sebastiani, F. (2013), Quantification trees, in 'Proceedings of the 13th IEEE International Conference on Data Mining (ICDM 2013)', Dallas, US, pp. 528--536.
[23] Milli, L., Monreale, A., Rossetti, G., Pedreschi, D., Giannotti, F. and Sebastiani, F. (2015), Quantification in social networks, in 'Proceedings of the 2nd IEEE International Conference on Data Science and Advanced Analytics (DSAA 2015)', Paris, FR.
[24] Moreno-Torres, J. G., Raeder, T., Alaiz-Rodríguez, R., Chawla, N. V. and Herrera, F. (2012), 'A unifying view on dataset shift in classification', Pattern Recognition 45(1), 521--530.
[25] Nakov, P., Farra, N. and Rosenthal, S. (2017), SemEval-2017 Task 4: Sentiment analysis in Twitter, in 'Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval 2017)', Vancouver, CA.
[26] Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F. and Stoyanov, V. (2016), SemEval-2016 Task 4: Sentiment analysis in Twitter, in 'Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016)', San Diego, US, pp. 1--18.
[27] Pérez-Gállego, P., Quevedo, J. R. and del Coz, J. J. (2017), 'Using ensembles for problems with characterizable changes in data distribution: A case study on quantification', Information Fusion 34, 87--100.
[28] Saerens, M., Latinne, P. and Decaestecker, C. (2002), 'Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure', Neural Computation 14(1), 21--41.
[29] Sanya, A., Kumar, P., Kar, P., Chawla, S. and Sebastiani, F. (2018), 'Optimizing non-decomposable measures with deep networks', Machine Learning 107(8--10), 1597--1620.
[30] Sebastiani, F. (2018), 'Evaluation measures for quantification: An axiomatic approach', arXiv:1809.01991.
[31] Storkey, A. (2009), When training and test sets are different: Characterizing learning transfer, in J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer and N. D. Lawrence, eds, 'Dataset shift in machine learning', The MIT Press, Cambridge, US, pp. 3--28.
[32] Tang, L., Gao, H. and Liu, H. (2010), Network quantification despite biased labels, in 'Proceedings of the 8th Workshop on Mining and Learning with Graphs (MLG 2010)', Washington, US, pp. 147--154.
[33] Vapnik, V. (1998), Statistical Learning Theory, Wiley, New York, US.

Cited By

  • (2021) Stopping Criteria for Technology Assisted Reviews based on Counting Processes. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, DOI 10.1145/3404835.3463013, pp. 2293--2297. Online publication date: 11-Jul-2021.


Published In

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019
1512 pages
ISBN:9781450361729
DOI:10.1145/3331184
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. class prior estimation
  2. quantification
  3. supervised learning
  4. supervised prevalence estimation

Qualifiers

  • Tutorial

Conference

SIGIR '19

Acceptance Rates

SIGIR'19 Paper Acceptance Rate 84 of 426 submissions, 20%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%


