Enhancements to the data mining process

January 1997

Author:
George Harrison John

Publisher:
Stanford University, 408 Panama Mall, Suite 217, Stanford, CA, United States

Order Number: UMI Order No. GAX97-23376

Abstract

Data mining is the emerging science and industry of applying modern statistical and computational technologies to the problem of finding useful patterns hidden within large databases. This thesis describes the data mining process and presents advances and novel methods for the six steps in the data mining process: extracting data from a database or data warehouse, cleaning the data, data engineering, algorithm engineering, data mining, and analyzing the results.

We show how the standard data extraction process can be improved by building a direct interface between a data-mining algorithm and a relational database management system. Next, in data cleaning, we show how automatically iterating through the data mining process can identify records that can be profitably ignored during data mining. For data engineering, we develop an automated way to iterate through the data mining process to choose the subset of attributes that yields the best estimated results. In algorithm engineering, a similar process is used to automatically set the parameters of a mining algorithm.
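The data-engineering step above picks an attribute subset by repeatedly running the mining algorithm and estimating how well it does. A minimal sketch of that wrapper idea follows. It assumes greedy forward selection around a 1-nearest-neighbour classifier, with accuracy estimated by k-fold cross-validation. The classifier, the search strategy, the fold count, the toy data, and the function names (`one_nn_predict`, `cv_accuracy`, `forward_select`) are all hypothetical choices for illustration, not necessarily the procedure used in the thesis.

```python
import random


def one_nn_predict(train_X, train_y, x, features):
    """Label of the nearest training row, measuring distance on `features` only."""
    best_label, best_dist = None, float("inf")
    for row, label in zip(train_X, train_y):
        d = sum((row[f] - x[f]) ** 2 for f in features)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def cv_accuracy(X, y, features, k=5):
    """k-fold cross-validated accuracy of 1-NN restricted to `features`."""
    if not features:
        # No attributes: predict the majority class for every record.
        majority = max(set(y), key=y.count)
        return y.count(majority) / len(y)
    n, correct = len(X), 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th record forms one fold
        train = [i for i in range(n) if i not in test_idx]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test_idx:
            if one_nn_predict(train_X, train_y, X[i], features) == y[i]:
                correct += 1
    return correct / n


def forward_select(X, y, k=5):
    """Greedy wrapper search: repeatedly add the attribute that most improves
    the estimated accuracy, and stop when no single addition helps."""
    selected = []
    best = cv_accuracy(X, y, selected, k)
    remaining = set(range(len(X[0])))
    while remaining:
        scores = {f: cv_accuracy(X, y, selected + [f], k) for f in remaining}
        f_best = max(sorted(scores), key=lambda f: scores[f])
        if scores[f_best] <= best:
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected, best


# Hypothetical data: attribute 0 determines the class, attributes 1 and 2 are noise.
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(60)]
y = [1 if row[0] > 0.5 else 0 for row in X]

selected, acc = forward_select(X, y)
print("selected attributes:", selected, "estimated accuracy:", round(acc, 3))
```

If you loop over candidate parameter values (for example, the number of neighbours) instead of candidate attributes, you get an analogous search for the algorithm-engineering step. The search still scores each candidate by running the algorithm and estimating its accuracy.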

For the data mining algorithms, we study enhancements to classification tree induction methods and Bayesian methods. Our new flexible Bayes data-mining algorithm is fast, understandable, and more accurate than the standard Bayesian classifier in most situations. In classification tree induction, we study various univariate splitting criteria and multivariate partitions.
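The following sketch contrasts two ways of modelling a continuous attribute. The standard Bayesian classifier fits one Gaussian to each attribute within each class. A kernel-density variant in the spirit of flexible Bayes instead averages a Gaussian centred on every training value. The bandwidth rule (1/√n_c for a class with n_c training records), the toy data, and the names `gauss` and `NaiveBayes` are assumptions for illustration. Treat this as a sketch, not the thesis's implementation.

```python
import math
from collections import defaultdict


def gauss(x, mu, sigma):
    """Normal density N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


class NaiveBayes:
    """Naive Bayes for continuous attributes.

    kernel=False: one Gaussian per (class, attribute), the standard approach.
    kernel=True:  average of Gaussian kernels centred on each training value
                  (a kernel density estimate in the spirit of flexible Bayes).
    """

    def __init__(self, kernel=False):
        self.kernel = kernel

    def fit(self, X, y):
        self.by_class = defaultdict(list)
        for row, label in zip(X, y):
            self.by_class[label].append(row)
        self.n = len(y)
        return self

    def _density(self, rows, j, x):
        values = [r[j] for r in rows]
        n_c = len(values)
        if self.kernel:
            sigma = 1.0 / math.sqrt(n_c)  # assumed bandwidth rule
            return sum(gauss(x, v, sigma) for v in values) / n_c
        mu = sum(values) / n_c
        var = sum((v - mu) ** 2 for v in values) / n_c
        return gauss(x, mu, math.sqrt(var) + 1e-9)  # guard against zero variance

    def predict(self, x):
        best, best_score = None, -math.inf
        for label, rows in self.by_class.items():
            # log prior + sum of per-attribute log densities (independence assumption)
            score = math.log(len(rows) / self.n)
            for j, xj in enumerate(x):
                score += math.log(self._density(rows, j, xj) + 1e-300)
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical toy data with a single continuous attribute.
values_a = [-3.1, -3.0, -2.9, 2.9, 3.0, 3.1]  # bimodal class
values_b = [-6.0, -1.0, 0.0, 0.0, 1.0, 6.0]   # unimodal, wider spread
X = [[v] for v in values_a + values_b]
y = ["a"] * 6 + ["b"] * 6

gnb = NaiveBayes(kernel=False).fit(X, y)
fb = NaiveBayes(kernel=True).fit(X, y)
print(gnb.predict([0.0]), fb.predict([0.0]), fb.predict([3.0]))
```

In this toy case, class `a` has modes near -3 and +3, and class `b` is centred on 0. A single Gaussian per class gives `a` a narrower bump at 0 than `b`, so the standard classifier assigns x = 0 to `a` even though no `a` record lies near 0. The kernel estimate keeps the two modes of `a` separate and assigns x = 0 to `b`.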

The analysis of results is necessarily domain-dependent. In an example applying data mining to stock selection, we discuss a key requirement in real-world applications: using appropriate domain-dependent methods to evaluate the proposed solution.

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications