Article

Evaluation of Techniques for Classifying Biological Sequences

Authors:

Mukund Deshpande,

George Karypis

PAKDD '02: Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

Pages 417-431

Published: 06 May 2002

Abstract

In recent years we have witnessed an exponential increase in the amount of biological information, either DNA or protein sequences, that has become available in public databases. This has been followed by an increased interest in developing computational techniques to automatically classify these large volumes of sequence data into various categories corresponding to their role in the chromosomes, their structure, and/or their function. In this paper we evaluate some of the widely used sequence classification algorithms and develop a framework for modeling sequences so that traditional machine learning algorithms, such as support vector machines (SVMs), can be applied easily. Our detailed experimental evaluation shows that the SVM-based approaches achieve higher classification accuracy than the more traditional sequence classification algorithms, such as Markov model based techniques and K-nearest neighbor based approaches.
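The abstract contrasts Markov model based classifiers, K-nearest neighbor classifiers, and learners such as SVMs that need each sequence mapped to a fixed-length vector. The Python sketch below illustrates those three ideas on a toy example; it is not the authors' implementation or framework. Assumptions: a DNA alphabet, a first-order Markov chain per class with Laplace (add-one) smoothing, k-mer count vectors as the fixed-length representation, and Euclidean distance in that k-mer space for the neighbor search (the distance measure used in the paper is not given here and may differ). The names `kmer_vector`, `MarkovClassifier`, and `knn_predict` and the training data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

DNA = "ACGT"  # assumed alphabet; swap in the 20 amino acids for protein data


def kmer_vector(seq, k=2, alphabet=DNA):
    """Map a sequence to a fixed-length vector of k-mer counts (hypothetical helper).

    One slot per possible k-mer over `alphabet`, in lexicographic order, so
    sequences of different lengths become vectors of the same dimension.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


class MarkovClassifier:
    """One order-`order` Markov chain per class; predicts the most likely class."""

    def __init__(self, order=1, alphabet=DNA, pseudocount=1.0):
        self.order = order
        self.alphabet = alphabet
        self.pseudocount = pseudocount
        self.counts = {}  # label -> {context: Counter of next symbols}

    def fit(self, seqs, labels):
        # Count context -> next-symbol transitions separately for each class.
        for seq, label in zip(seqs, labels):
            table = self.counts.setdefault(label, defaultdict(Counter))
            for i in range(self.order, len(seq)):
                table[seq[i - self.order:i]][seq[i]] += 1
        return self

    def log_likelihood(self, seq, label):
        # Simplification: the first `order` symbols are not scored.
        table, a = self.counts[label], self.pseudocount
        total = 0.0
        for i in range(self.order, len(seq)):
            ctx = table.get(seq[i - self.order:i], Counter())
            # Laplace-smoothed estimate of P(seq[i] | preceding context)
            p = (ctx[seq[i]] + a) / (sum(ctx.values()) + a * len(self.alphabet))
            total += math.log(p)
        return total

    def predict(self, seq):
        # Pick the class whose chain assigns the sequence the highest log-likelihood.
        return max(self.counts, key=lambda label: self.log_likelihood(seq, label))


def knn_predict(query, train_seqs, train_labels, k_neighbors=3, k=2):
    """Majority vote among the nearest training sequences (Euclidean, k-mer space)."""
    q = kmer_vector(query, k)
    scored = sorted(
        (math.dist(q, kmer_vector(s, k)), label)
        for s, label in zip(train_seqs, train_labels)
    )
    votes = Counter(label for _, label in scored[:k_neighbors])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: AT-rich versus GC-rich fragments.
    train = ["ATATATTA", "TTATAATA", "AATTATAT", "GCGCGGCC", "CCGCGCGG", "GGCCGCGC"]
    labels = ["AT", "AT", "AT", "GC", "GC", "GC"]
    clf = MarkovClassifier(order=1).fit(train, labels)
    print(clf.predict("TATAAT"), knn_predict("TATAAT", train, labels))  # prints: AT AT
```

The fixed-length vectors returned by `kmer_vector` are what would let an off-the-shelf learner be applied, for example scikit-learn's `sklearn.svm.LinearSVC`; the sketch stays within the standard library and does not train an SVM itself.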

Information & Contributors

Information

Published In

PAKDD '02: Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

May 2002

566 pages

ISBN:3540437045

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Bibliometrics & Citations

Total Citations: 13
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Jan 2025.

Cited By

Zhang S, Liu Y, Meng W, Luo Z, Bu J, Yang S, Liang P, Pei D, Xu J, Zhang Y, Chen Y, Dong H, Qu X, Song L (2018). PreFix. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2:1 (1-29). DOI: 10.1145/3179405. Online publication date: 3-Apr-2018.
https://dl.acm.org/doi/10.1145/3179405

Zhu J, Wu X, Xiao J, Huang C, Tang Y, Deng K (2018). Improved expert selection model for forex trading. Frontiers of Computer Science: Selected Publications from Chinese Universities, 12:3 (518-527). DOI: 10.1007/s11704-017-6472-3. Online publication date: 1-Jun-2018.
https://dl.acm.org/doi/10.1007/s11704-017-6472-3

Egho E, Gay D, Boullé M, Voisine N, Clérot F (2017). A user parameter-free approach for mining robust sequential classification rules. Knowledge and Information Systems, 52:1 (53-81). DOI: 10.1007/s10115-016-1002-4. Online publication date: 1-Jul-2017.
https://dl.acm.org/doi/10.1007/s10115-016-1002-4

Tsai C, Chen C (2015). A PSO-AB classifier for solving sequence classification problems. Applied Soft Computing, 27:C (11-27). DOI: 10.1016/j.asoc.2014.10.029. Online publication date: 1-Feb-2015.
https://dl.acm.org/doi/10.1016/j.asoc.2014.10.029

Biedrzycki R, Arabas J (2012). KIS. International Journal of Applied Mathematics and Computer Science, 22:3 (711-721). DOI: 10.5555/3063108.3063115. Online publication date: 1-Sep-2012.
https://dl.acm.org/doi/10.5555/3063108.3063115

Li C, Yang Q, Wang J, Li M (2012). Efficient Mining of Gap-Constrained Subsequences and Its Various Applications. ACM Transactions on Knowledge Discovery from Data, 6:1 (1-39). DOI: 10.1145/2133360.2133362. Online publication date: 1-Mar-2012.
https://dl.acm.org/doi/10.1145/2133360.2133362

Xing Z, Pei J, Keogh E (2010). A brief survey on sequence classification. ACM SIGKDD Explorations Newsletter, 12:1 (40-48). DOI: 10.1145/1882471.1882478. Online publication date: 9-Nov-2010.
https://dl.acm.org/doi/10.1145/1882471.1882478

Wang J, Zhang Y, Zhou L, Karypis G, Aggarwal C (2009). CONTOUR. Data Mining and Knowledge Discovery, 18:1 (1-29). DOI: 10.1007/s10618-008-0100-7. Online publication date: 1-Feb-2009.
https://dl.acm.org/doi/10.1007/s10618-008-0100-7

Aßfalg J, Kriegel H, Pryakhin A, Schubert M (2007). Multi-represented classification based on confidence estimation. Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (23-34). DOI: 10.5555/1764441.1764449. Online publication date: 22-May-2007.
https://dl.acm.org/doi/10.5555/1764441.1764449

Wang J, Han J, Li C (2007). Frequent Closed Sequence Mining without Candidate Maintenance. IEEE Transactions on Knowledge and Data Engineering, 19:8 (1042-1056). DOI: 10.1109/TKDE.2007.1043. Online publication date: 1-Aug-2007.
https://dl.acm.org/doi/10.1109/TKDE.2007.1043
