[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Wakaki et al., 2004 - Google Patents

Rough set-aided feature selection for automatic web-page classification

Wakaki et al., 2004

Document ID
1448851852248161898
Author
Wakaki T
Itakura H
Tamura M
Publication year
Publication venue
IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)

External Links

Snippet

Recently Web-pages on the World Wide Web are explosively increasing, and it is now required for portal sites such as Yahoo! service having directory-style search engines to classify Web-pages into many categories automatically. This paper investigates how rough …
Continue reading at ieeexplore.ieee.org (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30705Clustering or classification
    • G06F17/3071Clustering or classification including class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30705Clustering or classification
    • G06F17/30707Clustering or classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634Querying
    • G06F17/30657Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6279Classification techniques relating to the number of classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6228Selecting the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass
    • G06N99/005Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computer systems utilising knowledge based models
    • G06N5/02Knowledge representation
    • G06N5/022Knowledge engineering, knowledge acquisition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99933Query processing, i.e. searching
    • Y10S707/99935Query augmenting and refining, e.g. inexact access
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46Extraction of features or characteristics of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06QDATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management

Similar Documents

Publication Publication Date Title
Özgür et al. Text categorization with class-based and corpus-based keyword selection
US7043468B2 (en) Method and system for measuring the quality of a hierarchy
US7376635B1 (en) Theme-based system and method for classifying documents
JP3726263B2 (en) Document classification method and apparatus
Gong et al. Text stream clustering algorithm based on adaptive feature selection
US7809705B2 (en) System and method for determining web page quality using collective inference based on local and global information
US20110125747A1 (en) Data classification based on point-of-view dependency
Zheng et al. Optimally combining positive and negative features for text categorization
Wang et al. A new approach to feature selection in text classification
Hotho et al. Conceptual clustering of text clusters
Joshi et al. Categorizing the document using multi class classification in data mining
Wakaki et al. Rough set-aided feature selection for automatic web-page classification
Silva et al. On text-based mining with active learning and background knowledge using svm
Zaghloul et al. Text classification: neural networks vs support vector machines
Zelaia et al. A multiclass/multilabel document categorization system: Combining multiple classifiers in a reduced dimension
Gonçalves et al. Evaluating preprocessing techniques in a text classification problem
Wakaki et al. A study on rough set-aided feature selection for automatic web-page classification
Fresno et al. An analytical approach to concept extraction in html environments
Cardoso-Cachopo et al. Empirical evaluation of centroid-based models for single-label text categorization
Selamat et al. Neural networks for web page classification based on augmented PCA
Tasci et al. An evaluation of existing and new feature selection metrics in text categorization
Krithara et al. TL-PLSA: Transfer learning between domains with different classes
Silva et al. Margin-based active learning and background knowledge in text mining
Zhang et al. Improving the classification performance of boolean kernels by applying Occam’s razor
Addis et al. Using progressive filtering to deal with information overload