Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies

Authors:

William W. Cohen

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining

Pages 193 - 202

https://doi.org/10.1145/2835776.2835810

Published: 08 February 2016

Abstract

In entity classification tasks, topic or concept hierarchies are often incomplete. Previous work by Dalvi et al. [12] has shown that in non-hierarchical semi-supervised classification tasks, the presence of such unanticipated classes can cause semantic drift for seeded classes. The Exploratory Learning method [12] was proposed to solve this problem; however, it is limited to flat classification tasks. This paper extends exploratory learning methods to hierarchical classification tasks.
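To make the idea of handling unanticipated classes concrete, the following is a minimal sketch of one exploratory step for the flat case. It assumes a simple max/min "near-uniform posterior" heuristic for deciding when a data point fits none of the existing classes; the function names, the `max_ratio` threshold, and the `new_class` naming scheme are illustrative assumptions, not the authors' implementation, and the hierarchical variants discussed in this paper are not shown.

```python
def is_nearly_uniform(posterior, max_ratio=2.0):
    """Return True when no class clearly wins.

    Assumed heuristic: the posterior is treated as nearly uniform when the
    ratio of its largest to smallest probability is below max_ratio.
    """
    lowest, highest = min(posterior), max(posterior)
    return lowest > 0 and highest / lowest < max_ratio


def assign_or_create(posterior, class_names, new_class_prefix="new_class"):
    """Assign a data point to an existing class or open a new one.

    posterior[i] is the estimated probability of class_names[i].
    A new class name is appended to class_names when the posterior is
    nearly uniform, i.e. the point does not fit any known class well.
    """
    if is_nearly_uniform(posterior):
        new_name = f"{new_class_prefix}_{len(class_names)}"
        class_names.append(new_name)
        return new_name
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return class_names[best]


classes = ["city", "country"]
print(assign_or_create([0.9, 0.1], classes))    # confident: existing class
print(assign_or_create([0.55, 0.45], classes))  # ambiguous: new class created
```

In a full EM loop, a step like this would run for each unlabeled point during the E-step, with class parameters re-estimated afterwards; seeded classes are then less likely to absorb points that belong to classes missing from the hierarchy.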

We experimented with subsets of the NELL [8] ontology, and with text and HTML table datasets derived from the ClueWeb09 corpus. Our method (OptDAC-ExploreEM) outperforms the existing Exploratory EM method and its naive extension (DAC-ExploreEM) in seed-class F1 by 10% and 7% on average, respectively.
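For readers unfamiliar with the metric, the sketch below shows one plausible way to compute an average F1 over seed classes only. It assumes a macro-average (an unweighted mean of per-class F1 scores) and ignores predictions for newly created classes except where they count as misses for a seed class; the abstract does not specify the exact averaging, so treat this as an assumption. The label values are made up for illustration.

```python
def f1_for_class(true_labels, predicted_labels, target):
    """F1 score for a single class, computed from raw counts."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def seed_class_f1(true_labels, predicted_labels, seed_classes):
    """Unweighted mean of per-class F1 over the seed classes (assumed macro-average)."""
    scores = [f1_for_class(true_labels, predicted_labels, c) for c in seed_classes]
    return sum(scores) / len(scores)


# Hypothetical labels: "river" is an unanticipated class with no seeds.
truth = ["city", "city", "country", "river"]
predicted = ["city", "country", "country", "new_class_1"]
print(seed_class_f1(truth, predicted, ["city", "country"]))
```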

References

[1] Freebase. http://freebase.com.

[2] AIMMS. The MOSEK toolkit.

[3] S. Basu, A. Banerjee, and R. Mooney. Semi-supervised clustering by seeding. In Proceedings of 19th International Conference on Machine Learning (ICML-2002). Citeseer, 2002.

[4] D. M. Blei, T. L. Griffiths, and M. I. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM (JACM), 57(2):7, 2010.

[5] D. M. Blei, T. L. Griffiths, M. I. Jordan, and J. B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. Advances in neural information processing systems, 16:17, 2004.

[6] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In Proceedings of the thirteenth ACM international conference on Information and knowledge management, pages 78--87. ACM, 2004.

[7] J. Callan. The ClueWeb09 dataset. http://boston.lti.cs.cmu.edu/Data/clueweb09/.

[8] A. Carlson, J. Betteridge, R. C. Wang, E. R. Hruschka Jr, and T. M. Mitchell. Coupled semi-supervised learning for information extraction. In Proceedings of the third ACM international conference on Web search and data mining, pages 101--110. ACM, 2010.

[9] B. Dalvi and W. W. Cohen. Multi-view hierarchical semi-supervised learning by optimal assignment of sets of labels to instances. 2014.

[10] B. Dalvi, W. W. Cohen, and J. Callan. Websets: Extracting sets of entities from the web using unsupervised information extraction. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, WSDM '12, pages 243--252, New York, NY, USA, 2012. ACM.

[11] B. Dalvi, W. W. Cohen, and J. Callan. Classifying entities into an incomplete ontology. In Proceedings of the 2013 workshop on Automated knowledge base construction, pages 31--36. ACM, 2013.

[12] B. Dalvi, W. W. Cohen, and J. Callan. Exploratory learning. In Machine Learning and Knowledge Discovery in Databases, pages 128--143. Springer, 2013.

[13] B. Dalvi, E. Minkov, P. P. Talukdar, and W. W. Cohen. Automatic gloss finding for a knowledge base using ontological constraints. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM '15, pages 369--378, New York, NY, USA, 2015. ACM.

[14] N. Friedman. The Bayesian structural EM algorithm. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pages 129--138. Morgan Kaufmann Publishers Inc., 1998.

[15] Z. Ghahramani, M. I. Jordan, and R. P. Adams. Tree-structured stick breaking for hierarchical data. In Advances in Neural Information Processing Systems, pages 19--27, 2010.

[16] S. Gopal, Y. Yang, B. Bai, and A. Niculescu-Mizil. Bayesian models for large-scale hierarchical classification. In Advances in Neural Information Processing Systems, pages 2411--2419, 2012.

[17] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to information retrieval. Cambridge University Press, 2008.

[18] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111--3119, 2013.

[19] T. P. Mohamed, E. R. Hruschka, Jr., and T. M. Mitchell. Discovering relations between noun categories. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 1447--1455, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.

[20] A. Pal, N. Dalvi, and K. Bellare. Discovering hierarchical structure for sources and entities. In Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.

[21] J. Reisinger and M. Paşca. Latent variable models of concept-attribute attachment. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, pages 620--628. Association for Computational Linguistics, 2009.

[22] R. Snow, D. Jurafsky, and A. Y. Ng. Semantic taxonomy induction from heterogenous evidence. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 801--808. Association for Computational Linguistics, 2006.

[23] P. Willett. Recent trends in hierarchic document clustering: a critical review. Information Processing & Management, 24(5):577--597, 1988.

[24] L. Xiao, D. Zhou, and M. Wu. Hierarchical classification via orthogonal transfer. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 801--808, 2011.

Index Terms

Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies
1. Computing methodologies
  1. Machine learning
    1. Learning settings

Published In

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining

February 2016

746 pages

ISBN:9781450337168

DOI:10.1145/2835776

General Chairs: Paul N. Bennett (Microsoft Research), Vanja Josifovski (Pinterest)

Program Chairs: Jennifer Neville (Purdue University), Filip Radlinski (Microsoft)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSF
Google

Conference

WSDM 2016

WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining

February 22 - 25, 2016

San Francisco, California, USA

Acceptance Rates

WSDM '16 Paper Acceptance Rate 67 of 368 submissions, 18%;

Overall Acceptance Rate 498 of 2,863 submissions, 17%


Citations

Cited By

Ghayekhloo M, Nickabadi A (2023). CLP-GCN: Confidence and label propagation applied to Graph Convolutional Networks. Applied Soft Computing, 132 (109850). Online publication date: Jan-2023. https://doi.org/10.1016/j.asoc.2022.109850

Hong X, Zhang T, Cui Z, Yang J (2021). Variational Gridded Graph Convolution Network for Node Classification. IEEE/CAA Journal of Automatica Sinica, 8:10 (1697-1708). Online publication date: Oct-2021. https://doi.org/10.1109/JAS.2021.1004201

Xiao H, Liu X, Song Y (2019). Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification. The World Wide Web Conference (3370-3376). Online publication date: 13-May-2019. https://dl.acm.org/doi/10.1145/3308558.3313658

Zhuang C, Ma Q, Champin P, Gandon F, Médini L, Lalmas M, Ipeirotis P (2018). Dual Graph Convolutional Networks for Graph-Based Semi-Supervised Classification. Proceedings of the 2018 World Wide Web Conference (499-508). Online publication date: 10-Apr-2018. https://dl.acm.org/doi/10.1145/3178876.3186116

Duan X, Zhang J, Ramachandran R, Gatlin P, Maskey M, Miller J, Bugbee K, Lee T (2018). A Neural Network-Powered Cognitive Method of Identifying Semantic Entities in Earth Science Papers. 2018 IEEE International Conference on Cognitive Computing (ICCC) (9-16). Online publication date: Jul-2018. https://doi.org/10.1109/ICCC.2018.00009

Lu W, Dai H, Zhang Z, Wu C, Zhuang Y (2018). Active instance matching with pairwise constraints and its application to Chinese knowledge base construction. Knowledge and Information Systems, 55:1 (171-214). Online publication date: 1-Apr-2018. https://dl.acm.org/doi/10.1007/s10115-017-1076-7

Clarkson K, Gentile A, Gruhl D, Ristoski P, Terdiman J, Welch S (2018). User-Centric Ontology Population. The Semantic Web (112-127). Online publication date: 3-Jun-2018. https://doi.org/10.1007/978-3-319-93417-4_8

Naik A, Rangwala H (2017). Integrated Framework for Improving Large-Scale Hierarchical Classification. 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) (281-288). Online publication date: Dec-2017. https://doi.org/10.1109/ICMLA.2017.0-146

Li Y, Wang Y, Jiang X, Dong Z (2016). Teaching-to-Learn and Learning-to-Teach for Few Labeled Classification. 2016 International Conference on Advanced Cloud and Big Data (CBD) (271-276). Online publication date: Aug-2016. https://doi.org/10.1109/CBD.2016.054
