Data mining is the emerging science and industry of applying modern statistical and computational technologies to the problem of finding useful patterns hidden within large databases. This thesis describes the data mining process and presents advances and novel methods for the six steps in the data mining process: extracting data from a database or data warehouse, cleaning the data, data engineering, algorithm engineering, data mining, and analyzing the results.
We show how the standard data extraction process can be improved by building a direct interface between a data-mining algorithm and a relational database management system. Next, in data cleaning, we show how automatically iterating through the data mining process can identify records that can be profitably ignored during data mining. For data engineering, we develop an automated way to iterate through the data mining process to choose the subset of attributes that yields the best estimated results. In algorithm engineering, a similar process is used to automatically set the parameters of a mining algorithm.
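The attribute-selection loop described above can be sketched as a greedy forward search that reruns the mining step for each candidate subset and keeps the subset with the best estimated accuracy. This is only an illustration under stated assumptions: the abstract does not name the search strategy, the learner, or the accuracy estimator, so the 1-nearest-neighbour learner, the leave-one-out estimate, and the names `loo_accuracy` and `forward_select` are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def nearest_neighbor_predict(train_X: List[Row], train_y: List[int],
                             x: Row, attrs: List[int]) -> int:
    """Predict the label of x with 1-nearest-neighbour, using only attrs."""
    best_label, best_dist = train_y[0], float("inf")
    for row, label in zip(train_X, train_y):
        dist = sum((row[a] - x[a]) ** 2 for a in attrs)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def loo_accuracy(X: List[Row], y: List[int], attrs: List[int]) -> float:
    """Leave-one-out accuracy estimate for the given attribute subset."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_neighbor_predict(train_X, train_y, X[i], attrs) == y[i]:
            correct += 1
    return correct / len(X)


def forward_select(X: List[Row], y: List[int]) -> Tuple[List[int], float]:
    """Greedily add the attribute that most improves the estimate.

    Stops as soon as no remaining attribute improves the estimated accuracy.
    """
    selected: List[int] = []
    best_score = 0.0
    while True:
        candidates = [a for a in range(len(X[0])) if a not in selected]
        if not candidates:
            break
        scored = [(loo_accuracy(X, y, selected + [a]), a) for a in candidates]
        # Highest score wins; ties go to the lowest attribute index.
        score, attr = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(attr)
        best_score = score
    return selected, best_score


# Toy data: attribute 0 separates the classes, attribute 1 is misleading.
rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
        [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
labels = [0, 0, 0, 1, 1, 1]
print(forward_select(rows, labels))  # -> ([0], 1.0)
```

The same loop structure would apply to the parameter-setting step: replace the candidate attribute subsets with candidate parameter values and keep the value with the best estimate.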
For the data mining algorithms, we study enhancements to classification tree induction methods and Bayesian methods. Our new flexible Bayes data-mining algorithm is fast, understandable, and more accurate than the standard Bayesian classifier in most situations. In classification tree induction we study various univariate splitting criteria and multivariate partitions.
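The abstract does not define the flexible Bayes algorithm. The sketch below assumes it means a naive Bayes classifier in which each continuous attribute's class-conditional density is estimated by averaging Gaussian kernels centred on the training values, rather than by fitting a single Gaussian per class. The class name `KernelNaiveBayes` and the bandwidth rule `1 / sqrt(n_c)` are assumptions made for illustration, not details taken from the thesis.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


class KernelNaiveBayes:
    """Naive Bayes with a kernel density estimate per class and attribute."""

    def fit(self, X: List[Sequence[float]], y: List[Hashable]) -> "KernelNaiveBayes":
        self.rows_by_class: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            self.rows_by_class[label].append(row)
        self.n = len(X)
        return self

    def _log_score(self, x: Sequence[float], label: Hashable) -> float:
        rows = self.rows_by_class[label]
        bandwidth = 1.0 / math.sqrt(len(rows))   # assumed bandwidth rule
        score = math.log(len(rows) / self.n)     # log prior of the class
        for j, xj in enumerate(x):
            # Average of one Gaussian kernel per stored training value.
            density = sum(gaussian_pdf(xj, r[j], bandwidth) for r in rows) / len(rows)
            score += math.log(density + 1e-300)  # guard against log(0)
        return score

    def predict(self, x: Sequence[float]) -> Hashable:
        return max(self.rows_by_class, key=lambda c: self._log_score(x, c))


# Class "a" is bimodal (around -2 and 2); class "b" sits around 0.
X = [[-2.1], [-1.9], [2.0], [2.2], [-0.2], [0.0], [0.1], [0.3]]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
model = KernelNaiveBayes().fit(X, y)
print(model.predict([2.1]), model.predict([0.05]))  # -> a b
```

The toy data shows why a kernel estimate can help: a single Gaussian fitted to class "a" would be centred near 0, exactly where class "b" lies, whereas the kernel estimate keeps the two modes separate.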
The analysis of results is necessarily domain-dependent. In an example applying data mining to stock selection, we discuss a key requirement in real-world applications: using appropriate domain-dependent methods to evaluate the proposed solution.
Cited By
- Taylor P, Griffiths N, Bhalerao A, Xu Z, Gelencser A and Popham T (2017). Investigating the Feasibility of Vehicle Telemetry Data as a Means of Predicting Driver Workload, International Journal of Mobile Human Computer Interaction, 9:3, (54-72), Online publication date: 1-Jul-2017.
- Vidulin V, Bohanec M and Gams M (2014). Combining human analysis and machine data mining to obtain credible data relations, Information Sciences: an International Journal, 288:C, (254-278), Online publication date: 20-Dec-2014.
- Bergholz A, De Beer J, Glahn S, Moens M, Paaß G and Strobel S (2010). New filtering approaches for phishing email, Journal of Computer Security, 18:1, (7-35), Online publication date: 1-Jan-2010.
- El-Mouadib F, Zubi Z and Alhouni A. New implementation of unsupervised ID3 algorithm (NIU-ID3) using Visual Basic.net, Proceedings of the 8th WSEAS international conference on Data networks, communications, computers, (95-108)
- Hilas C and Sahalos J. An Application of Decision Trees for Rule Extraction Towards Telecommunications Fraud Detection, Proceedings of the 11th International Conference on Knowledge-Based Intelligent Information and Engineering Systems and the XVII Italian Workshop on Neural Networks, (1112-1121)
- Abe H and Yamaguchi T. Constructive meta-level feature selection method based on method repositories, Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining, (70-80)
- Abdullah A and Hussain A. Using biclustering for automatic attribute selection to enhance global visualization, Proceedings of the 1st visual information expert conference on Pixelization paradigm, (35-47)
- Liu H, Yin X and Han J. An efficient multi-relational Naïve Bayesian classifier based on semantic relationship graph, Proceedings of the 4th international workshop on Multi-relational mining, (39-48)
- Li X (2005). A scalable decision tree system and its application in pattern recognition and intrusion detection, Decision Support Systems, 41:1, (112-130), Online publication date: 1-Nov-2005.
- Amor N, Benferhat S and Elouedi Z. Naive Bayes vs decision trees in intrusion detection systems, Proceedings of the 2004 ACM symposium on Applied computing, (420-424)
- Kwak N and Choi C (2003). Feature Extraction Based on ICA for Binary Classification Problems, IEEE Transactions on Knowledge and Data Engineering, 15:6, (1374-1388), Online publication date: 1-Nov-2003.
- Graefe G, Fayyad U and Chaudhuri S. On the efficient gathering of sufficient statistics for classification from large SQL databases, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, (204-208)
Index Terms
- Enhancements to the data mining process