Data mining and genetic algorithm based gene/SNP selection

Authors:

Shital C. Shah,

Andrew Kusiak

Artificial Intelligence in Medicine, Volume 31, Issue 3

Pages 183 - 196

https://doi.org/10.1016/j.artmed.2004.04.002

Published: 01 July 2004

Abstract

Objective: Genomic studies produce large volumes of data, with the number of single nucleotide polymorphisms (SNPs) ranging into the thousands. Analyzing SNPs makes it possible to determine relationships between genotypic and phenotypic information and to identify SNPs related to a disease. The growing wealth of information and advances in biology call for new approaches to knowledge discovery. One such area is the identification of gene/SNP patterns that affect cure and drug development for various diseases.

Methods: A new approach for predicting drug effectiveness is presented. The approach is based on data mining and genetic algorithms. A global search mechanism, a weighted decision tree, a decision-tree-based wrapper, a correlation-based heuristic, and the identification of intersecting feature sets are employed to select significant genes.

Results: The feature selection approach resulted in an 85% reduction in the number of features. The relative increase in cross-validation accuracy and specificity for the significant gene/SNP set was 10% and 3.2%, respectively.

Conclusion: The feature selection approach was successfully applied to data sets for drug and placebo subjects. The number of features was significantly reduced while the quality of knowledge was enhanced. The feature set intersection approach provided the most significant genes/SNPs. The results reported in the paper discuss associations among SNPs that lead to patient-specific treatment protocols.
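The abstract names three ingredients that a short sketch can make concrete: a genetic-algorithm search over feature subsets, the intersection of the feature sets returned by several selectors, and the relative-change figures reported in the results. The Python below is a minimal illustration only, not the authors' implementation. The function names (`ga_select`, `intersect_feature_sets`, `relative_change`), the GA parameters, and the toy fitness function are all assumptions; in a wrapper setting the fitness would instead be something like the cross-validation accuracy of a decision tree trained on the candidate subset.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=40,
              crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Search for a high-fitness feature subset with a simple genetic algorithm.

    `fitness` is any callable that scores a frozenset of feature indices.
    Requires n_features >= 2 (single-point crossover needs a cut point).
    """
    rng = random.Random(seed)
    # Each individual is a bit mask: mask[i] == 1 means feature i is selected.
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]

    def subset(mask):
        return frozenset(i for i, bit in enumerate(mask) if bit)

    def tournament(scored):
        # Binary tournament: the fitter of two randomly drawn individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    best_score, best_mask = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(subset(m)), m) for m in pop]
        for score, mask in scored:
            if score > best_score:
                best_score, best_mask = score, mask[:]
        next_pop = [best_mask[:]]  # elitism: carry the best individual forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = p1[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to every position.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
    return subset(best_mask)


def intersect_feature_sets(feature_sets):
    """Keep only the features that every selector agreed on."""
    sets = [set(s) for s in feature_sets]
    return set.intersection(*sets) if sets else set()


def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) * 100.0 / before


# Toy stand-in for a wrapper fitness (hypothetical): reward three
# "informative" features and lightly penalise every other selected feature.
INFORMATIVE = frozenset({2, 5, 11})


def toy_fitness(selected):
    return len(selected & INFORMATIVE) - 0.1 * len(selected - INFORMATIVE)


if __name__ == "__main__":
    runs = [ga_select(20, toy_fitness, seed=s) for s in range(3)]
    print("features common to all runs:", sorted(intersect_feature_sets(runs)))
    # Made-up counts chosen only to illustrate the arithmetic of a reduction.
    print("relative change in feature count:", relative_change(200, 30), "%")
```

Passing the fitness in as a callable keeps the search independent of the classifier, so a decision-tree wrapper or a correlation-based heuristic could be swapped in without changing `ga_select`. Intersecting the subsets from several selectors or runs is one simple way to retain only the features that are chosen consistently.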

Index Terms

Data mining and genetic algorithm based gene/SNP selection
1. Applied computing
  1. Life and medical sciences
2. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Heuristic function construction
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning algorithms
      1. Feature selection

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

Artificial Intelligence in Medicine Volume 31, Issue 3

July, 2004

76 pages

ISSN:0933-3657

Copyright © 2004 Elsevier B.V.

Publisher

Elsevier Science Publishers Ltd.

United Kingdom

Bibliometrics & Citations

Article Metrics

Total Citations: 13
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 31 Dec 2024

Citations

Cited By

Krishna G, Ravi V, Dey L, Chaudhury S, Krishnapuram R, Singla P, Roy R (2019). Feature Subset Selection using Adaptive Differential Evolution. Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, pp. 157-163. DOI: 10.1145/3297001.3297021. Online publication date: 3-Jan-2019.
https://dl.acm.org/doi/10.1145/3297001.3297021
Thamwiwatthana E, Pasupa K, Tongsima S (2018). Selection of SNP Subsets for Severity of Beta-thalassaemia Classification Problem. Proceedings of the 9th International Conference on Computational Systems-Biology and Bioinformatics, pp. 1-7. DOI: 10.1145/3291757.3291770. Online publication date: 10-Dec-2018.
https://dl.acm.org/doi/10.1145/3291757.3291770
Gonzaga A, Cordeiro R, Shin S, Shin D, Lencastre M (2017). The similarity-aware relational division database operator. Proceedings of the Symposium on Applied Computing, pp. 913-914. DOI: 10.1145/3019612.3019869. Online publication date: 3-Apr-2017.
https://dl.acm.org/doi/10.1145/3019612.3019869
Salem H, Attiya G, El-Fishawy N (2017). Early diagnosis of breast cancer by gene expression profiles. Pattern Analysis & Applications 20:2, pp. 567-578. DOI: 10.1007/s10044-016-0574-7. Online publication date: 1-May-2017.
https://dl.acm.org/doi/10.1007/s10044-016-0574-7
(2016). Tag SNP selection using clonal selection and majority voting algorithms. International Journal of Data Mining and Bioinformatics 16:4, pp. 290-311. DOI: 10.1504/IJDMB.2016.082208. Online publication date: 1-Jan-2016.
https://dl.acm.org/doi/10.1504/IJDMB.2016.082208
Toor R, Chana I (2016). Application of IT in healthcare. ACM SIGBioinformatics Record 6:2, pp. 1-8. DOI: 10.1145/2983313.2983315. Online publication date: 3-Aug-2016.
https://dl.acm.org/doi/10.1145/2983313.2983315
Esfandiari N, Babavalian M, Moghadam A, Tabar V (2014). Review. Expert Systems with Applications: An International Journal 41:9, pp. 4434-4463. DOI: 10.1016/j.eswa.2014.01.011. Online publication date: 1-Jul-2014.
https://dl.acm.org/doi/10.1016/j.eswa.2014.01.011
Mao K, Tang W (2011). Recursive Mahalanobis Separability Measure for Gene Subset Selection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 8:1, pp. 266-272. DOI: 10.1109/TCBB.2010.43. Online publication date: 1-Jan-2011.
https://dl.acm.org/doi/10.1109/TCBB.2010.43
Kelemen A, Vasilakos A, Liang Y (2009). Computational intelligence for genetic association study in complex diseases: review of theory and applications. International Journal of Computational Intelligence in Bioinformatics and Systems Biology 1:1, pp. 15-31. DOI: 10.1504/IJCIBSB.2009.024041. Online publication date: 1-Mar-2009.
https://dl.acm.org/doi/10.1504/IJCIBSB.2009.024041
Kelemen A, Vasilakos A, Liang Y (2009). Computational intelligence in bioinformatics. IEEE Transactions on Information Technology in Biomedicine 13:5, pp. 841-847. DOI: 10.1109/TITB.2009.2024144. Online publication date: 1-Sep-2009.
https://dl.acm.org/doi/10.1109/TITB.2009.2024144