Text Learning and Hierarchical Feature Selection in Webpage Classification

  • Conference paper
Advanced Data Mining and Applications (ADMA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Abstract

One way to retrieve information from the Internet is to classify web pages automatically. Feature selection is an important issue in almost all published classification methods. Although many feature selection methods have been proposed, most of them focus on the features within a category and ignore the fact that the hierarchy of categories also plays an important role in achieving accurate classification results. This paper proposes a new feature selection method that incorporates hierarchical information, which prevents the classification process from going through every node in the hierarchy. Our test results show that our classification algorithm using hierarchical information reduces the search complexity from n to log(n) and increases accuracy by 6.2% compared to a related algorithm.
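The abstract does not give the algorithm itself, so the following is only a minimal sketch of generic top-down classification over a category tree, not the authors' method. It illustrates why descending the hierarchy can avoid visiting every node: at each level only the children of the current node are scored. The `Category` class, the additive `score` function, the synthetic binary tree, and the token names are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in a hypothetical category hierarchy."""
    name: str
    features: dict[str, float]  # assumed per-node term weights
    children: list[Category] = field(default_factory=list)


def score(node: Category, tokens: list[str]) -> float:
    """Sum the node's weights for the tokens present in the page."""
    return sum(node.features.get(t, 0.0) for t in tokens)


def classify_top_down(root: Category, tokens: list[str]) -> tuple[str, int]:
    """Follow the best-scoring child at each level.

    Returns the chosen leaf name and the number of nodes scored.
    """
    node, scored = root, 0
    while node.children:
        scored += len(node.children)
        node = max(node.children, key=lambda c: score(c, tokens))
    return node.name, scored


def leaves(node: Category) -> list[Category]:
    """Collect all leaf categories under a node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def classify_flat(root: Category, tokens: list[str]) -> tuple[str, int]:
    """Score every leaf category, ignoring the hierarchy."""
    candidates = leaves(root)
    best = max(candidates, key=lambda c: score(c, tokens))
    return best.name, len(candidates)


def build_tree(depth: int, prefix: str = "") -> Category:
    """Build a synthetic binary tree; each node owns one term 't<path>'."""
    node = Category(name=prefix or "root", features={f"t{prefix}": 1.0})
    if depth > 0:
        node.children = [build_tree(depth - 1, prefix + bit) for bit in "01"]
    return node


root = build_tree(4)                      # 16 leaf categories
page = ["t0", "t01", "t011", "t0110"]     # synthetic page tokens
print(classify_top_down(root, page))
print(classify_flat(root, page))
```

In this toy tree with branching factor 2 and depth 4, the top-down pass scores 2 nodes per level (8 in total), while the flat pass scores all 16 leaves. The gap widens as the tree grows, which is the general intuition behind the n versus log(n) comparison; the sketch says nothing about the accuracy figure reported in the abstract.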

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peng, X., Ming, Z., Wang, H. (2008). Text Learning and Hierarchical Feature Selection in Webpage Classification. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_43

  • DOI: https://doi.org/10.1007/978-3-540-88192-6_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88191-9

  • Online ISBN: 978-3-540-88192-6

  • eBook Packages: Computer Science, Computer Science (R0)