DOI: 10.5555/646420.693671
Article

Evaluation of Techniques for Classifying Biological Sequences

Published: 06 May 2002

Abstract

In recent years we have witnessed an exponential increase in the amount of biological information, in the form of DNA or protein sequences, that has become available in public databases. This has been followed by increased interest in developing computational techniques that automatically classify these large volumes of sequence data into categories corresponding to their role in the chromosomes, their structure, and/or their function. In this paper we evaluate some of the widely used sequence classification algorithms and develop a framework for modeling sequences so that traditional machine learning algorithms, such as support vector machines (SVMs), can be applied easily. Our detailed experimental evaluation shows that the SVM-based approaches achieve higher classification accuracy than more traditional sequence classification algorithms such as Markov-model-based techniques and K-nearest-neighbor-based approaches.
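The abstract does not spell out how sequences are modeled, so the following is only a minimal sketch of the two general ideas it names, not the authors' method. It assumes a DNA alphabet, uses k-mer counts as one common way to turn a sequence into a fixed-length vector that a vector-based learner such as an SVM could consume, and uses a first-order Markov chain with add-one smoothing as a simple Markov-model classifier. All function names (`kmer_features`, `train_markov`, `classify_markov`) and the toy training data are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA; protein sequences would need a 20-letter alphabet


def kmer_features(seq, k=2, alphabet=ALPHABET):
    """Map a sequence to a fixed-length vector of k-mer counts (illustrative only)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # One dimension per possible k-mer, in lexicographic order of the alphabet.
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


def train_markov(seqs, alphabet=ALPHABET):
    """Estimate first-order transition log-probabilities with add-one smoothing."""
    pair_counts = Counter()
    for s in seqs:
        pair_counts.update(zip(s, s[1:]))
    model = {}
    for a in alphabet:
        total = sum(pair_counts[(a, b)] for b in alphabet) + len(alphabet)
        for b in alphabet:
            model[(a, b)] = math.log((pair_counts[(a, b)] + 1) / total)
    return model


def log_likelihood(seq, model):
    """Sum of transition log-probabilities along the sequence."""
    return sum(model[pair] for pair in zip(seq, seq[1:]))


def classify_markov(seq, models):
    """Return the class label whose Markov model scores the sequence highest."""
    return max(models, key=lambda label: log_likelihood(seq, models[label]))


# Toy data, invented for illustration; not taken from the paper.
models = {
    "at_rich": train_markov(["ATATATTA", "TTATAATA"]),
    "gc_rich": train_markov(["GCGCGGCC", "CCGGCGCG"]),
}
predicted = classify_markov("ATTATA", models)
features = kmer_features("ACGT", k=2)
```

The vector returned by `kmer_features` has one entry per possible k-mer, so sequences of different lengths map to the same feature space, which is what lets a standard classifier be trained on them. The Markov classifier needs no such vector: it scores the raw sequence under each class model and picks the highest score.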

Published In

PAKDD '02: Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
May 2002
566 pages
ISBN: 3540437045

Publisher

Springer-Verlag

Berlin, Heidelberg

Cited By

  • (2018) PreFix. Proceedings of the ACM on Measurement and Analysis of Computing Systems 2:1 (1-29). DOI: 10.1145/3179405. Online publication date: 3-Apr-2018
  • (2018) Improved expert selection model for forex trading. Frontiers of Computer Science: Selected Publications from Chinese Universities 12:3 (518-527). DOI: 10.1007/s11704-017-6472-3. Online publication date: 1-Jun-2018
  • (2017) A user parameter-free approach for mining robust sequential classification rules. Knowledge and Information Systems 52:1 (53-81). DOI: 10.1007/s10115-016-1002-4. Online publication date: 1-Jul-2017
  • (2015) A PSO-AB classifier for solving sequence classification problems. Applied Soft Computing 27:C (11-27). DOI: 10.1016/j.asoc.2014.10.029. Online publication date: 1-Feb-2015
  • (2012) KIS. International Journal of Applied Mathematics and Computer Science 22:3 (711-721). DOI: 10.5555/3063108.3063115. Online publication date: 1-Sep-2012
  • (2012) Efficient Mining of Gap-Constrained Subsequences and Its Various Applications. ACM Transactions on Knowledge Discovery from Data 6:1 (1-39). DOI: 10.1145/2133360.2133362. Online publication date: 1-Mar-2012
  • (2010) A brief survey on sequence classification. ACM SIGKDD Explorations Newsletter 12:1 (40-48). DOI: 10.1145/1882471.1882478. Online publication date: 9-Nov-2010
  • (2009) CONTOUR. Data Mining and Knowledge Discovery 18:1 (1-29). DOI: 10.1007/s10618-008-0100-7. Online publication date: 1-Feb-2009
  • (2007) Multi-represented classification based on confidence estimation. Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining (23-34). DOI: 10.5555/1764441.1764449. Online publication date: 22-May-2007
  • (2007) Frequent Closed Sequence Mining without Candidate Maintenance. IEEE Transactions on Knowledge and Data Engineering 19:8 (1042-1056). DOI: 10.1109/TKDE.2007.1043. Online publication date: 1-Aug-2007
