Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Media (processing) and Interaction
Web Page Classification Based on Surrounding Page Model Representing Connection Type and Directory Hierarchy
Yuxin Wang, Keizo Oyama
JOURNAL FREE ACCESS

2009 Volume 4 Issue 4 Pages 922-936

Abstract

We propose a web page classification method suited to building web page collections and demonstrate its effectiveness experimentally. First, we describe a model that represents the structure of the group of pages surrounding a target page, taking both link relations and directory hierarchy relations into account, together with a method for extracting features based on that model. The method is evaluated in classification experiments on two data sets, using a support vector machine (SVM) as the classification algorithm, and its effectiveness is confirmed by comparison with a baseline and with the results of previous studies. The contribution of each part of the surrounding pages is also analyzed. Next, we examine the method's performance over the full recall-precision range and find that it is superior in the high-recall range. Finally, we estimate the performance of a three-grade classifier built with the method, and the amount of manual assessment required to build a web page collection.
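
The abstract does not spell out how surrounding pages are grouped, so the following is only an illustrative sketch of the general idea of combining connection type with directory hierarchy. The function names, the connection labels ("in_link", "out_link") and the relation labels ("upper", "same", "lower", "other_dir", "other_site") are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def directory_of(url):
    """Return (host, directory segments) for a URL, dropping any file name."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: a path that does not end in "/" ends with a file name.
    if segments and not parts.path.endswith("/"):
        segments = segments[:-1]
    return parts.netloc, segments


def directory_relation(target_url, other_url):
    """Hypothetical labels for where other_url sits relative to target_url."""
    t_host, t_dirs = directory_of(target_url)
    o_host, o_dirs = directory_of(other_url)
    if t_host != o_host:
        return "other_site"
    if o_dirs == t_dirs:
        return "same"
    if len(o_dirs) < len(t_dirs) and t_dirs[:len(o_dirs)] == o_dirs:
        return "upper"
    if len(o_dirs) > len(t_dirs) and o_dirs[:len(t_dirs)] == t_dirs:
        return "lower"
    return "other_dir"


def group_surrounding_pages(target_url, neighbors):
    """Group neighbor URLs by (connection type, directory relation).

    neighbors is an iterable of (url, connection) pairs, where connection
    is an assumed label such as "in_link" or "out_link".
    """
    groups = defaultdict(list)
    for url, connection in neighbors:
        key = (connection, directory_relation(target_url, url))
        groups[key].append(url)
    return dict(groups)


target = "http://example.org/lab/staff/alice.html"
neighbors = [
    ("http://example.org/lab/index.html", "in_link"),
    ("http://example.org/lab/staff/pubs/list.html", "out_link"),
    ("http://example.org/lab/staff/bob.html", "out_link"),
    ("http://other.example.com/x.html", "in_link"),
]
groups = group_surrounding_pages(target, neighbors)
```

In a sketch like this, the text of each group could then be turned into a separate block of features (for example, by prefixing each term with its group key) before being passed to an SVM, so that the classifier can weight evidence from different parts of the surrounding pages differently.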

© 2009 by Information Processing Society of Japan