Berger et al., 2005
On the impact of document representation on classifier performance in e-mail categorization
- Document ID
- 5325949267863916437
- Author
- Berger H
- Köhle M
- Merkl D
- Publication year
- 2005
Snippet
This paper provides an analysis of multi-class e-mail categorization performance. In order to investigate this issue, the quality of various classification algorithms based on two distinct document representation formalisms is compared. In particular, both a standard word-based …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/30707—Clustering or classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/3071—Clustering or classification including class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30716—Browsing or visualization
- G06F17/30719—Summarization for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation, e.g. computer aided management of electronic mail or groupware; Time management, e.g. calendars, reminders, meetings or time accounting
- G06Q10/107—Computer aided management of electronic mail
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Dewdney et al. | | The form is the substance: Classification of genres in text |
Méndez et al. | | Tokenising, stemming and stopword removal on anti-spam filtering domain |
Özgür et al. | | Text categorization with class-based and corpus-based keyword selection |
Schneider | | Techniques for improving the performance of naive bayes for text classification |
Cohen | | Learning rules that classify e-mail |
US7827133B2 (en) | | Method and arrangement for SIM algorithm automatic charset detection |
US8150822B2 (en) | | On-line iterative multistage search engine with text categorization and supervised learning |
Song et al. | | A comparative study on text representation schemes in text categorization |
US7689531B1 (en) | | Automatic charset detection using support vector machines with charset grouping |
Ginting et al. | | Hate speech detection on twitter using multinomial logistic regression classification method |
US8560466B2 (en) | | Method and arrangement for automatic charset detection |
De Vel et al. | | Multi-topic e-mail authorship attribution forensics |
Freitag | | Trained named entity recognition using distributional clusters |
Ferreira et al. | | A comparative study of feature extraction algorithms in customer reviews |
CN113112239A (en) | | Portable post talent screening method |
Demirci | | Emotion analysis on Turkish tweets |
Fagan et al. | | An introduction to textual econometrics |
Krishnan et al. | | A supervised approach for extractive text summarization using minimal robust features |
US7698333B2 (en) | | Intelligent query system and method using phrase-code frequency-inverse phrase-code document frequency module |
Ko et al. | | Feature selection using association word mining for classification |
Berger et al. | | On the impact of document representation on classifier performance in e-mail categorization |
Tsuboi | | Authorship identification for heterogeneous documents |
Berger et al. | | A comparison of text-categorization methods applied to n-gram frequency statistics |
Katakis et al. | | E-mail mining: Emerging techniques for e-mail management |
Hong et al. | | Effective topic modeling for email |