research-article

Weighted Vote-Based Classifier Ensemble for Named Entity Recognition: A Genetic Algorithm-Based Approach

Authors:

Sriparna Saha

ACM Transactions on Asian Language Information Processing (TALIP), Volume 10, Issue 2

Article No.: 9, Pages 1 - 37

https://doi.org/10.1145/1967293.1967296

Published: 01 June 2011

Abstract

In this article, we use the search capability of a genetic algorithm (GA) to construct a weighted vote-based classifier ensemble for Named Entity Recognition (NER). Our underlying assumption is that the reliability of each classifier's predictions differs among the various named entity (NE) classes. It is therefore necessary to quantify how much voting weight a particular classifier carries for a particular output class. We use the GA to determine appropriate voting weights for each class in each classifier. The proposed technique is evaluated on four leading Indian languages, namely Bengali, Hindi, Telugu, and Oriya, all of which are resource-poor. Evaluation yields recall, precision, and F-measure values of 92.08%, 92.22%, and 92.15%, respectively, for Bengali; 96.07%, 88.63%, and 92.20%, respectively, for Hindi; 78.82%, 91.26%, and 84.59%, respectively, for Telugu; and 88.56%, 89.98%, and 89.26%, respectively, for Oriya. Finally, we evaluate the proposed approach on the benchmark dataset of the CoNLL-2003 shared task, which yields overall recall, precision, and F-measure values of 88.72%, 88.64%, and 88.68%, respectively. Results also show that the vote-based classifier ensemble identified by the GA-based approach outperforms all the individual classifiers, three conventional baseline ensembles, and some other existing ensemble techniques. In one part of the article, we formulate the problem of feature selection in any classifier under the single-objective optimization framework and show that our proposed classifier ensemble attains superior performance to it.
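The idea of per-class, per-classifier voting weights searched by a GA can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the tag set `CLASSES`, the function names, token accuracy as the fitness (the article reports recall, precision, and F-measure), and the choice of elitism, single-point crossover, and random-reset mutation as GA operators.

```python
import random

CLASSES = ["PER", "LOC", "ORG", "O"]  # illustrative tag set (assumption)

def weighted_vote(predictions, weights):
    """predictions[m] is classifier m's label for one token;
    weights[m][c] is the vote weight of classifier m for class c."""
    scores = {c: 0.0 for c in CLASSES}
    for m, label in enumerate(predictions):
        scores[label] += weights[m][label]
    return max(CLASSES, key=lambda c: scores[c])  # first class wins ties

def decode(chromosome, n_classifiers):
    """Turn a flat list of real-valued genes into one weight dict per classifier."""
    k = len(CLASSES)
    return [dict(zip(CLASSES, chromosome[m * k:(m + 1) * k]))
            for m in range(n_classifiers)]

def fitness(chromosome, dev_preds, dev_gold):
    """Token accuracy of the weighted vote on held-out data (a simplification)."""
    weights = decode(chromosome, len(dev_preds[0]))
    hits = sum(weighted_vote(p, weights) == g
               for p, g in zip(dev_preds, dev_gold))
    return hits / len(dev_gold)

def evolve(dev_preds, dev_gold, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(dev_preds[0]) * len(CLASSES)
    # Seed with all-ones (plain majority voting); with elitism below,
    # the returned weights are never worse than that baseline on dev data.
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)]
                         for _ in range(pop_size - 1)]
    score = lambda ch: fitness(ch, dev_preds, dev_gold)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]  # elitism: carry the two best chromosomes over unchanged
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from the better half
            cut = rng.randrange(1, n)                  # single-point crossover
            child = a[:cut] + b[cut:]
            # Random-reset mutation: each gene is redrawn with probability 0.05.
            child = [rng.random() if rng.random() < 0.05 else w for w in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)
```

Calling `evolve(dev_preds, dev_gold)` with one tuple of classifier outputs per token returns a flat weight vector that `decode` turns back into per-classifier, per-class weights. Seeding the population with uniform weights is a design choice of this sketch, so that the search starts from the majority-voting baseline.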

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published In

ACM Transactions on Asian Language Information Processing Volume 10, Issue 2

June 2011

111 pages

ISSN:1530-0226

EISSN:1558-3430

DOI:10.1145/1967293

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2011

Accepted: 01 January 2011

Revised: 01 January 2011

Received: 01 May 2010

Published in TALIP Volume 10, Issue 2

Qualifiers

Research-article
Research
Refereed
