[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Berger et al., 2005 - Google Patents

On the impact of document representation on classifier performance in e-mail categorization

Berger et al., 2005

View PDF
Document ID
5325949267863916437
Author
Berger H
Köhle M
Merkl D
Publication year

External Links

Snippet

This paper provides an analysis of multi-class e-mail categorization performance. In order to investigate this issue, the quality of various classification algorithms based on two distinct document representation formalisms is compared. In particular, both a standard word-based …
Continue reading at dl.gi.de (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30705Clustering or classification
    • G06F17/30707Clustering or classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30705Clustering or classification
    • G06F17/3071Clustering or classification including class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2765Recognition
    • G06F17/277Lexical analysis, e.g. tokenisation, collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634Querying
    • G06F17/30657Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30716Browsing or visualization
    • G06F17/30719Summarization for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2705Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30613Indexing
    • G06F17/30619Indexing indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06QDATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation, e.g. computer aided management of electronic mail or groupware; Time management, e.g. calendars, reminders, meetings or time accounting
    • G06Q10/107Computer aided management of electronic mail

Similar Documents

Publication Publication Date Title
Dewdney et al. The form is the substance: Classification of genres in text
Méndez et al. Tokenising, stemming and stopword removal on anti-spam filtering domain
Özgür et al. Text categorization with class-based and corpus-based keyword selection
Schneider Techniques for improving the performance of naive bayes for text classification
Cohen Learning rules that classify e-mail
US7827133B2 (en) Method and arrangement for SIM algorithm automatic charset detection
US8150822B2 (en) On-line iterative multistage search engine with text categorization and supervised learning
Song et al. A comparative study on text representation schemes in text categorization
US7689531B1 (en) Automatic charset detection using support vector machines with charset grouping
Ginting et al. Hate speech detection on twitter using multinomial logistic regression classification method
US8560466B2 (en) Method and arrangement for automatic charset detection
De Vel et al. Multi-topic e-mail authorship attribution forensics
Freitag Trained named entity recognition using distributional clusters
Ferreira et al. A comparative study of feature extraction algorithms in customer reviews
CN113112239A (en) Portable post talent screening method
Demirci Emotion analysis on Turkish tweets
Fagan et al. An introduction to textual econometrics
Krishnan et al. A supervised approach for extractive text summarization using minimal robust features
US7698333B2 (en) Intelligent query system and method using phrase-code frequency-inverse phrase-code document frequency module
Ko et al. Feature selection using association word mining for classification
Berger et al. On the impact of document representation on classifier performance in e-mail categorization
Tsuboi Authorship identification for heterogeneous documents
Berger et al. A comparison of text-categorization methods applied to n-gram frequency statistics
Katakis et al. E-mail mining: Emerging techniques for e-mail management
Hong et al. Effective topic modeling for email