
An easy-to-hard learning paradigm for multiple classes and multiple labels

Published: 01 January 2017

Abstract

Many applications, such as human action recognition and object detection, can be formulated as a multiclass classification problem. One-vs-rest (OVR) is one of the most widely used approaches for multiclass classification due to its simplicity and excellent performance. However, confusing classes in such applications degrade its results. For example, hand clap and boxing are two confusing actions: hand clap is easily misclassified as boxing, and vice versa. Precisely classifying confusing classes therefore remains a challenging task. To obtain better performance on multiclass classification problems with confusing classes, we first develop a classifier chain model for multiclass classification (CCMC) to transfer class information between classifiers. Then, based on an analysis of our proposed model, we propose an easy-to-hard learning paradigm for multiclass classification that automatically identifies easy and hard classes and then uses the predictions for easier classes to help solve harder classes. Similar to CCMC, the classifier chain (CC) model was proposed by Read et al. (2009) to capture label dependency for multi-label classification. However, CC does not consider the order of difficulty of the labels and achieves degraded performance when there are many confusing labels; it is therefore non-trivial to learn an appropriate label order for CC. Motivated by our analysis for CCMC, we also propose an easy-to-hard learning paradigm for multi-label classification that automatically identifies easy and hard labels and then uses the predictions for easier labels to help solve harder labels. We further demonstrate that our proposed strategy can be successfully applied to a wide range of applications, such as ordinal classification and relationship prediction. Extensive empirical studies validate our analysis and the effectiveness of our proposed easy-to-hard learning strategies.
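The chaining idea in the abstract can be sketched in a few lines: rank labels from easy to hard (here approximated by the held-out accuracy of independent per-label classifiers), then predict in that order, appending each prediction to the feature vector seen by later, harder classifiers. The sketch below is only illustrative: the toy threshold "classifiers", the feature values, and the function names are hypothetical, not taken from the paper.

```python
def label_difficulty_order(per_label_accuracy):
    """Rank labels easiest-first: higher held-out accuracy = easier."""
    return sorted(per_label_accuracy, key=per_label_accuracy.get, reverse=True)

def predict_chain(x, classifiers, order):
    """Predict labels in easy-to-hard order. Each classifier sees the
    original features plus the predictions already made for easier labels."""
    preds = {}
    for label in order:
        augmented = list(x) + [preds[l] for l in order[:len(preds)]]
        preds[label] = classifiers[label](augmented)
    return preds

# Toy example: three labels; "sunset" is hardest and reads the chained
# predictions appended at positions 2 and 3 of its augmented input.
accuracy = {"beach": 0.95, "urban": 0.90, "sunset": 0.70}
order = label_difficulty_order(accuracy)  # easiest to hardest

classifiers = {
    "beach":  lambda z: int(z[0] > 0.5),               # raw feature 0
    "urban":  lambda z: int(z[1] > 0.5),               # raw feature 1
    "sunset": lambda z: int(z[2] == 1 and z[3] == 0),  # chained predictions
}
print(predict_chain([0.8, 0.2], classifiers, order))
```

In the paper's actual method the chain order is learned rather than fixed by a one-shot accuracy ranking; this sketch only illustrates the data flow of prediction transfer from easier to harder labels.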

References

[1]
Jimmy Ba and Rich Caruana. Do deep nets really need to be deep? In NIPS, pages 2654-2662, 2014.
[2]
Peter Bartlett and John Shawe-Taylor. Generalization performance of support vector machines and other pattern classifiers. In Advances in Kernel Methods - Support Vector Learning, pages 43-54. MIT Press, Cambridge, MA, USA, 1998.
[3]
Zafer Barutcuoglu, Robert E. Schapire, and Olga G. Troyanskaya. Hierarchical multilabel prediction of gene function. Bioinformatics, 22(7):830-836, 2006.
[4]
Samy Bengio, Jason Weston, and David Grangier. Label embedding trees for large multiclass tasks. In Advances in Neural Information Processing Systems 23, pages 163-171, 2010.
[5]
Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In ICML, pages 41-48, 2009.
[6]
Alina Beygelzimer, John Langford, Yury Lifshits, Gregory B. Sorkin, and Alexander L. Strehl. Conditional probability tree estimation analysis and algorithms. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 51-58, 2009a.
[7]
Alina Beygelzimer, John Langford, and Pradeep Ravikumar. Error-correcting tournaments. In Proceedings of the 20th Conference on Algorithmic Learning Theory, pages 247-262, 2009b.
[8]
Matthew R. Boutell, Jiebo Luo, Xipeng Shen, and Christopher M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757-1771, 2004.
[9]
Ignas Budvytis, Vijay Badrinarayanan, and Roberto Cipolla. Label propagation in complex video sequences using semi-supervised learning. In British Machine Vision Conference, pages 1-12. British Machine Vision Association, 2010.
[10]
Yao-Nan Chen and Hsuan-Tien Lin. Feature-aware label space dimension reduction for multi-label classification. In Advances in Neural Information Processing Systems 25, pages 1538-1546, 2012.
[11]
Kai-Yang Chiang, Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S. Dhillon, and Ambuj Tewari. Prediction and clustering in signed networks: a local to global perspective. Journal of Machine Learning Research, 15(1):1177-1213, 2014.
[12]
Kai-Yang Chiang, Cho-Jui Hsieh, and Inderjit S. Dhillon. Matrix completion with noisy side information. In NIPS, pages 3447-3455, 2015.
[13]
Wei Chu and Zoubin Ghahramani. Gaussian processes for ordinal regression. Journal of Machine Learning Research, 6:1019-1041, 2005.
[14]
Moustapha Cissé, Maruan Al-Shedivat, and Samy Bengio. ADIOS: architectures deep in output space. In ICML, pages 2770-2779, 2016.
[15]
Krzysztof Dembczynski, Weiwei Cheng, and Eyke Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In Johannes Fürnkranz and Thorsten Joachims, editors, Proceedings of the 27th International Conference on Machine Learning, pages 279-286, Haifa, Israel, 2010. Omnipress.
[16]
Can Demirkesen and Hocine Cherifi. An evaluation of divide-and-combine strategies for image categorization by multi-class support vector machines. In 23rd International Symposium on Computer and Information Sciences, pages 1-6, 2008.
[17]
Sébastien Destercke and Gen Yang. Cautious ordinal classification by binary decomposition. In ECML/PKDD, pages 323-337, 2014.
[18]
Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263-286, 1995.
[19]
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
[20]
Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In AISTATS, pages 315-323, 2011.
[21]
Chen Gong, Dacheng Tao, Wei Liu, Liu Liu, and Jie Yang. Label propagation via teaching-to-learn and learning-to-teach. IEEE Transactions on Neural Networks and Learning Systems, 28(6):1452-1465, 2017.
[22]
Matthieu Guillaumin, Jakob J. Verbeek, and Cordelia Schmid. Multimodal semi-supervised learning for image classification. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 902-909. IEEE Computer Society, 2010.
[23]
Yuhong Guo and Suicheng Gu. Multi-label classification using conditional dependency networks. In Toby Walsh, editor, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, pages 1300-1305, Barcelona, Catalonia, Spain, 2011. AAAI Press.
[24]
Yuhong Guo and Dale Schuurmans. Adaptive large margin training for multilabel classification. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
[25]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770-778, 2016.
[26]
Xuming He, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In CVPR, pages 695-702, 2004.
[27]
Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. CoRR, abs/1503.02531, 2015.
[28]
Cho-Jui Hsieh, Kai-Yang Chiang, and Inderjit S. Dhillon. Low rank modeling of signed networks. In KDD, pages 507-515, 2012.
[29]
Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks and Learning Systems, 13(2):415-425, 2002.
[30]
Daniel Hsu, Sham Kakade, John Langford, and Tong Zhang. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems, pages 772-780, 2009.
[31]
Sheng-Jun Huang and Zhi-Hua Zhou. Multi-label learning by exploiting label correlations locally. In Jörg Hoffmann and Bart Selman, editors, Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, Ontario, Canada, 2012. AAAI Press.
[32]
Lina Huo, Licheng Jiao, Shuang Wang, and Shuyuan Yang. Object-level saliency detection with color attributes. Pattern Recognition, 49:162-173, 2016.
[33]
Prateek Jain and Inderjit S. Dhillon. Provable inductive matrix completion. CoRR, abs/1306.0626, 2013.
[34]
Feng Kang, Rong Jin, and Rahul Sukthankar. Correlated label propagation with application to multi-label learning. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1719-1726, NY, USA, 2006. IEEE Computer Society.
[35]
Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Symposium on the Foundations of Computer Science, pages 382-391, Los Alamitos, CA, 1990. IEEE Computer Society Press.
[36]
Maksim Lapin, Matthias Hein, and Bernt Schiele. Top-k multiclass SVM. In Advances in Neural Information Processing Systems 28, pages 325-333, 2015.
[37]
Maksim Lapin, Matthias Hein, and Bernt Schiele. Loss functions for top-k error: Analysis and insights. In The IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[38]
Jure Leskovec, Daniel P. Huttenlocher, and Jon M. Kleinberg. Predicting positive and negative links in online social networks. In WWW, pages 641-650, 2010.
[39]
Weiwei Liu and Ivor W. Tsang. Large margin metric learning for multi-label prediction. In AAAI, pages 2800-2806, 2015a.
[40]
Weiwei Liu and Ivor W. Tsang. On the optimality of classifier chain for multi-label classification. In NIPS, pages 712-720, 2015b.
[41]
Weiwei Liu and Ivor W. Tsang. Sparse perceptron decision tree for millions of dimensions. In AAAI, pages 1881-1887, 2016.
[42]
Weiwei Liu and Ivor W. Tsang. Making decision trees feasible in ultrahigh feature and label dimensions. Journal of Machine Learning Research, 18:1-36, 2017.
[43]
Weiwei Liu, Xiaobo Shen, and Ivor W. Tsang. Sparse embedded k-means clustering. In NIPS, 2017.
[44]
Qi Mao, Ivor Wai-Hung Tsang, and Shenghua Gao. Objective-guided image annotation. IEEE Transactions on Image Processing, 22(4):1585-1597, 2013.
[45]
Paolo Massa and Paolo Avesani. Trust-aware bootstrapping of recommender systems. In Proceedings of ECAI 2006 Workshop on Recommender Systems, pages 29-33, 2006.
[46]
Jonathan Milgram, Mohamed Cheriet, and Robert Sabourin. "one against one" or "one against all": Which one is better for handwriting recognition with SVMs? In Tenth International Workshop on Frontiers in Handwriting Recognition, 2006.
[47]
Gang Niu, Marthinus Christoffel du Plessis, Tomoya Sakai, Yao Ma, and Masashi Sugiyama. Theoretical comparisons of positive-unlabeled learning against positive-negative learning. In NIPS, pages 1199-1207, 2016.
[48]
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake VanderPlas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Edouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[49]
Trung T. Pham, Ian Reid, Yasir Latif, and Stephen Gould. Hierarchical higher-order regression forest fields: An application to 3D indoor scene labelling. In ICCV, 2015.
[50]
Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. Classifier chains for multilabel classification. In Wray L. Buntine, Marko Grobelnik, Dunja Mladenic, and John Shawe-Taylor, editors, Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II, pages 254-269, Berlin, Heidelberg, 2009. Springer-Verlag.
[51]
Ryan M. Rifkin and Aldebaro Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101-141, 2004.
[52]
Robert E. Schapire and Yoram Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2-3):135-168, 2000.
[53]
Christian Schüldt, Ivan Laptev, and Barbara Caputo. Recognizing human actions: A local SVM approach. In 17th International Conference on Pattern Recognition, pages 32-36. IEEE Computer Society, 2004.
[54]
Chun-Wei Seah, Ivor W. Tsang, and Yew-Soon Ong. Transductive ordinal regression. IEEE Transactions on Neural Networks and Learning Systems, 23(7):1074-1086, 2012.
[55]
Amnon Shashua and Anat Levin. Ranking with large margin principle: Two approaches. In NIPS, pages 937-944, 2002.
[56]
John Shawe-Taylor, Peter L. Bartlett, Robert C. Williamson, and Martin Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926-1940, 1998.
[57]
Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from RGBD images. In 12th European Conference on Computer Vision, pages 746-760. Springer, 2012.
[58]
Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[59]
Farbound Tai and Hsuan-Tien Lin. Multilabel classification with principal label space transformation. Neural Computation, 24(9):2508-2542, 2012.
[60]
Ali Fallah Tehrani, Weiwei Cheng, and Eyke Hüllermeier. Preference learning using the Choquet integral: The case of multipartite ranking. IEEE Transactions on Fuzzy Systems, 20(6):1102-1113, 2012.
[61]
Antonio Torralba, Kevin P. Murphy, and William T. Freeman. Contextual models for object detection using boosted random fields. In Advances in Neural Information Processing Systems, pages 1401-1408, 2004.
[62]
Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, pages 667-685. Springer US, 2010.
[63]
Jason Weston and Chris Watkins. Support vector machines for multi-class pattern recognition. In 7th European Symposium on Artificial Neural Networks, pages 219-224, 1999.
[64]
Jian-Bo Yang and Ivor W. Tsang. Hierarchical maximum margin learning for multi-class classification. In UAI, pages 753-760, 2011.
[65]
Min-Ling Zhang and Kun Zhang. Multi-label learning by exploiting label dependency. In Bharat Rao, Balaji Krishnapuram, Andrew Tomkins, and Qiang Yang, editors, Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 999-1008, Washington, DC, USA, 2010. ACM.
[66]
Min-Ling Zhang and Zhi-Hua Zhou. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8):1819-1837, 2014.
[67]
Yi Zhang and Jeff Schneider. Maximum margin output coding. In John Langford and Joelle Pineau, editors, Proceedings of the 29th International Conference on Machine Learning, pages 1575-1582, New York, NY, USA, 2012. Omnipress.
[68]
Yi Zhang and Jeff G. Schneider. Multi-label output codes using canonical correlation analysis. In Geoffrey J. Gordon, David B. Dunson, and Miroslav Dudík, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 873-882, Fort Lauderdale, USA, 2011. JMLR.org.

Cited By

  • (2024) Does label smoothing help deep partial label learning? In Proceedings of the 41st International Conference on Machine Learning, pages 15823-15838. doi:10.5555/3692070.3692704
  • (2023) Delving into noisy label detection with clean data. In Proceedings of the 40th International Conference on Machine Learning, pages 40290-40305. doi:10.5555/3618408.3620093
  • (2023) Pitfalls of assessing extracted hierarchies for multi-class classification. Pattern Recognition, 136:C. doi:10.1016/j.patcog.2022.109225
  • (2022) On robust multiclass learnability. In Proceedings of the 36th International Conference on Neural Information Processing Systems, pages 32412-32423. doi:10.5555/3600270.3602618
  • (2022) Linear Ordering Problem based Classifier Chain using Genetic Algorithm for multi-label classification. Applied Soft Computing, 117:C. doi:10.1016/j.asoc.2021.108395
  • (2022) DD-GAN: pedestrian image inpainting with simultaneous tone correction. Multimedia Tools and Applications, 82(2):2503-2516. doi:10.1007/s11042-022-12342-z
  • (2021) Understanding partial multi-label learning via mutual information. In Proceedings of the 35th International Conference on Neural Information Processing Systems, pages 4147-4156. doi:10.5555/3540261.3540578
  • (2021) Handling Difficult Labels for Multi-label Image Classification via Uncertainty Distillation. In Proceedings of the 29th ACM International Conference on Multimedia, pages 2410-2419. doi:10.1145/3474085.3475406
  • (2021) Generic Multi-label Annotation via Adaptive Graph and Marginalized Augmentation. ACM Transactions on Knowledge Discovery from Data, 16(1):1-20. doi:10.1145/3451884
  • (2019) CPM-nets. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pages 559-569. doi:10.5555/3454287.3454338


Published In

The Journal of Machine Learning Research, Volume 18, Issue 1
January 2017
8830 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Author Tags

  1. classifier chain
  2. easy-to-hard learning paradigm
  3. multi-label classification
  4. multiclass classification

