Rough set-aided feature selection for automatic web-page classification

Wakaki et al., 2004

Document ID: 1448851852248161898
Author: Wakaki T; Itakura H; Tamura M
Publication year: 2004
Publication venue: IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)

Snippet

The number of Web pages on the World Wide Web is growing explosively, and portal sites with directory-style search engines, such as Yahoo!, now need to classify Web pages into many categories automatically. This paper investigates how rough …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30705—Clustering or classification
              - G06F17/3071—Clustering or classification including class or cluster creation or modification
              - G06F17/30707—Clustering or classification into predefined classes
            - G06F17/30634—Querying
              - G06F17/30657—Query processing
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6267—Classification techniques
            - G06K9/6279—Classification techniques relating to the number of classes
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
            - G06K9/6228—Selecting the most significant subset of features
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/02—Knowledge representation
          - G06N5/022—Knowledge engineering, knowledge acquisition
    - G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
      - G06Q10/00—Administration; Management
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
  - Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
    - Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      - Y10S707/00—Data processing: database and file management or data structures
        - Y10S707/99931—Database or file accessing
          - Y10S707/99933—Query processing, i.e. searching
            - Y10S707/99935—Query augmenting and refining, e.g. inexact access

Similar Documents

Publication	Publication Date	Title
Özgür et al.	2005	Text categorization with class-based and corpus-based keyword selection
US7043468B2 (en)	2006-05-09	Method and system for measuring the quality of a hierarchy
US7376635B1 (en)	2008-05-20	Theme-based system and method for classifying documents
JP3726263B2 (en)	2005-12-14	Document classification method and apparatus
Gong et al.	2011	Text stream clustering algorithm based on adaptive feature selection
US7809705B2 (en)	2010-10-05	System and method for determining web page quality using collective inference based on local and global information
US20110125747A1 (en)	2011-05-26	Data classification based on point-of-view dependency
Zheng et al.	2003	Optimally combining positive and negative features for text categorization
Wang et al.	2005	A new approach to feature selection in text classification
Hotho et al.	2002	Conceptual clustering of text clusters
Joshi et al.	2011	Categorizing the document using multi class classification in data mining
Wakaki et al.	2004	Rough set-aided feature selection for automatic web-page classification
Silva et al.	2007	On text-based mining with active learning and background knowledge using svm
Zaghloul et al.	2009	Text classification: neural networks vs support vector machines
Zelaia et al.	2011	A multiclass/multilabel document categorization system: Combining multiple classifiers in a reduced dimension
Gonçalves et al.	2005	Evaluating preprocessing techniques in a text classification problem
Wakaki et al.	2006	A study on rough set-aided feature selection for automatic web-page classification
Fresno et al.	2004	An analytical approach to concept extraction in html environments
Cardoso-Cachopo et al.	2006	Empirical evaluation of centroid-based models for single-label text categorization
Selamat et al.	2003	Neural networks for web page classification based on augmented PCA
Tasci et al.	2008	An evaluation of existing and new feature selection metrics in text categorization
Krithara et al.	2013	TL-PLSA: Transfer learning between domains with different classes
Silva et al.	2004	Margin-based active learning and background knowledge in text mining
Zhang et al.	2003	Improving the classification performance of boolean kernels by applying Occam’s razor
Addis et al.	2010	Using progressive filtering to deal with information overload