Data mining and genetic algorithm based gene/SNP selection

Published: 01 July 2004

Abstract

Objective: Genomic studies produce large volumes of data, with the number of single nucleotide polymorphisms (SNPs) ranging into the thousands. Analyzing SNPs makes it possible to determine relationships between genotypic and phenotypic information and to identify SNPs related to a disease. The growing wealth of information and advances in biology call for new approaches to knowledge discovery. One such area is the identification of gene/SNP patterns that affect cure and drug development for various diseases.

Methods: A new approach for predicting drug effectiveness is presented. The approach is based on data mining and genetic algorithms. A global search mechanism, a weighted decision tree, a decision-tree-based wrapper, a correlation-based heuristic, and the identification of intersecting feature sets are employed to select significant genes.

Results: The feature selection approach resulted in an 85% reduction in the number of features. The relative increases in cross-validation accuracy and specificity for the significant gene/SNP set were 10% and 3.2%, respectively.

Conclusion: The feature selection approach was successfully applied to data sets for drug and placebo subjects. The number of features was significantly reduced while the quality of knowledge was enhanced. The feature set intersection approach provided the most significant genes/SNPs. The results reported in the paper discuss associations among SNPs resulting in patient-specific treatment protocols.
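The abstract names a genetic-algorithm search, a wrapper, and an intersection of feature sets, but gives no implementation details. The following is only a minimal sketch of how those pieces could fit together, not the authors' method. Everything in it is an assumption made for illustration: the function names (`ga_select`, `intersect_feature_sets`, `reduction_percent`), the GA parameters, and especially `toy_fitness`, which stands in for the cross-validated decision-tree accuracy that a real wrapper would compute.

```python
import random
from typing import Callable, Iterable, List, Set, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped

# Hypothetical indices of "truly relevant" features, used only by the toy fitness.
INFORMATIVE = {2, 5, 7}


def toy_fitness(mask: Mask) -> float:
    """Stand-in for a wrapper score: reward relevant features, penalize the rest."""
    kept = {i for i, bit in enumerate(mask) if bit}
    return len(kept & INFORMATIVE) - 0.1 * len(kept - INFORMATIVE)


def ga_select(n_features: int,
              fitness: Callable[[Mask], float],
              pop_size: int = 30,
              generations: int = 40,
              mutation_rate: float = 0.05,
              seed: int = 0) -> Tuple[Mask, float]:
    """Toy genetic algorithm over bit masks (needs n_features >= 2, pop_size >= 2).

    Returns the best mask in the final population and its fitness.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best mask over unchanged.
        next_pop = [max(pop, key=fitness)[:]]
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of two random masks becomes a parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)


def intersect_feature_sets(feature_sets: Iterable[Iterable[int]]) -> Set[int]:
    """Keep only the features chosen by every selection method."""
    sets = [set(s) for s in feature_sets]
    return set.intersection(*sets) if sets else set()


def reduction_percent(n_original: int, n_selected: int) -> float:
    """Percentage of features removed by selection."""
    return 100.0 * (n_original - n_selected) / n_original


if __name__ == "__main__":
    mask, score = ga_select(20, toy_fitness)
    ga_set = {i for i, bit in enumerate(mask) if bit}
    filter_set = {2, 5, 7, 11}  # hypothetical output of a correlation-based filter
    common = intersect_feature_sets([ga_set, filter_set])
    print("GA set:", sorted(ga_set), "score:", score)
    print("Intersection:", sorted(common))
    print("Reduction: %.1f%%" % reduction_percent(20, len(common)))
```

In a real study the fitness callable would train and cross-validate a classifier on the masked data, and each selection method would contribute its own feature set to the intersection. As a point of scale, keeping 15 of 100 features corresponds to the 85% reduction quoted in the Results.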

Published In

Artificial Intelligence in Medicine, Volume 31, Issue 3
July, 2004
76 pages

Publisher

Elsevier Science Publishers Ltd.

United Kingdom

Author Tags

  1. Data mining
  2. Drug effectiveness
  3. Feature selection
  4. Genes
  5. Genetic algorithm
  6. Intersection approach
  7. Single nucleotide polymorphisms (SNPs)
