Named entity recognition and classification in biomedical text using classifier ensemble

Published: 01 March 2015

Abstract

Named entity recognition and classification (NERC) is an important task in information extraction for the biomedical domain. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc., which generally have complex structures and are difficult to recognise. In this paper, we propose a classifier ensemble technique based on single-objective optimisation that uses the search capability of a genetic algorithm (GA) for NERC in biomedical texts. Here, the GA is used to determine the voting weight of each class in each classifier. We use diverse classification methods, such as conditional random fields (CRF) and support vector machines (SVM), to build a number of models that differ in the representations of the feature set and/or feature templates they use. The proposed technique is evaluated on two benchmark datasets, JNLPBA 2004 and GENETAG, yielding overall F-measure values of 75.97% and 95.90%, respectively. Comparisons with existing systems show that the proposed system achieves state-of-the-art performance.
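
The idea of letting a GA set a voting weight for each class in each classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the chromosome encoding (one real-valued weight per classifier and class), the operators (elitism, one-point crossover, uniform mutation), the use of token-level accuracy as a stand-in for F-measure, and all function names and parameter values are assumptions made purely for illustration.

```python
import random


def weighted_vote(predictions, weights, classes):
    """Combine one token's labels; predictions[m] is the label from classifier m."""
    scores = {c: 0.0 for c in classes}
    for m, label in enumerate(predictions):
        # Classifier m contributes its own weight for the class it predicted.
        scores[label] += weights[m][label]
    # Ties are resolved in favour of the class listed first in `classes`.
    return max(classes, key=lambda c: scores[c])


def fitness(weights, dev_predictions, gold, classes):
    """Token-level accuracy of the ensemble (assumed stand-in for F-measure)."""
    hits = sum(
        weighted_vote(p, weights, classes) == g
        for p, g in zip(dev_predictions, gold)
    )
    return hits / len(gold)


def decode(chromosome, n_classifiers, classes):
    """Turn a flat list of genes into one {class: weight} dict per classifier."""
    k = len(classes)
    return [
        dict(zip(classes, chromosome[m * k:(m + 1) * k]))
        for m in range(n_classifiers)
    ]


def evolve(dev_predictions, gold, classes, n_classifiers,
           pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search for voting weights with a small single-objective GA."""
    rng = random.Random(seed)
    length = n_classifiers * len(classes)

    def score(chromosome):
        weights = decode(chromosome, n_classifiers, classes)
        return fitness(weights, dev_predictions, gold, classes)

    population = [[rng.random() for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best over unchanged
        while len(next_gen) < pop_size:
            # Parents are drawn from the better half of the population.
            a, b = rng.sample(ranked[:pop_size // 2], 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Uniform mutation: redraw a gene with probability mutation_rate.
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen

    best = max(population, key=score)
    return decode(best, n_classifiers, classes), score(best)


if __name__ == "__main__":
    classes = ["O", "protein", "DNA"]
    # Hypothetical toy data: each tuple holds three classifiers' labels for one token.
    dev_predictions = [
        ("protein", "O", "O"),
        ("O", "DNA", "O"),
        ("protein", "protein", "O"),
        ("O", "O", "DNA"),
    ]
    gold = ["protein", "DNA", "protein", "O"]
    weights, best_fitness = evolve(dev_predictions, gold, classes, n_classifiers=3)
    print(best_fitness)
```

In a realistic setting the fitness would be computed on held-out development data, and the per-token labels would come from trained CRF and SVM models rather than hand-written tuples.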


Published In

International Journal of Data Mining and Bioinformatics, Volume 11, Issue 4
March 2015
109 pages
ISSN: 1748-5673
EISSN: 1748-5681

Publisher

Inderscience Publishers

Geneva 15, Switzerland
