DOI:10.1145/2739480.2754706
research-article

Evolutionary Learning of Syntax Patterns for Genic Interaction Extraction

Published: 11 July 2015

Abstract

There is increasing interest in techniques for automatic relation extraction from unstructured text. The biomedical domain, in particular, may greatly benefit from those techniques because of the huge and ever-increasing number of scientific publications describing observed phenomena of potential clinical interest. In this paper, we consider the problem of automatically identifying sentences that contain interactions between genes and proteins, based solely on a dictionary of genes and proteins and a small set of sample sentences in natural language. We propose an evolutionary technique for learning a classifier that is capable of detecting the desired sentences within scientific publications with high accuracy. The key feature of our proposal, which is internally based on Genetic Programming, is the construction of a model of the relevant syntax patterns in terms of standard part-of-speech annotations. The model consists of a set of regular expressions that are learned automatically despite the large alphabet size involved. We assess our approach on two realistic datasets and obtain 74% accuracy, a value that is sufficiently high to be of practical interest and in line with significant baseline methods.
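
To make the idea of a classifier built from regular expressions over part-of-speech annotations more concrete, the following is a minimal sketch, not the method of the paper. It assumes each sentence arrives already tagged (Penn Treebank style tags are used here as an assumption), that dictionary hits are replaced by a placeholder token `GENE`, and that a sentence is flagged when any pattern matches. The dictionary entries, the single hand-written pattern, and the function names are all hypothetical; in the paper the patterns are learned by Genetic Programming rather than written by hand.

```python
import re

# Hypothetical gene/protein dictionary (illustrative entries only).
GENE_DICTIONARY = {"GerE", "sigK"}

# Hypothetical hand-written pattern over space-separated annotation tokens:
# a GENE token, later a verb tag (VB, VBD, VBG, VBN, VBP, VBZ), later another GENE.
# This stands in for the learned set of regular expressions described in the abstract.
PATTERNS = [
    re.compile(r"\bGENE (?:\w+ )*?VB[DGNPZ]? (?:\w+ )*?GENE\b"),
]


def annotate(tagged_sentence):
    """Turn (word, pos_tag) pairs into a tag string, replacing dictionary hits by GENE."""
    return " ".join(
        "GENE" if word in GENE_DICTIONARY else tag
        for word, tag in tagged_sentence
    )


def contains_interaction(tagged_sentence):
    """Flag the sentence if any pattern matches its annotation string."""
    annotation = annotate(tagged_sentence)
    return any(pattern.search(annotation) for pattern in PATTERNS)


# Two hand-tagged example sentences (tags assigned manually for illustration).
positive = [("GerE", "NNP"), ("stimulates", "VBZ"), ("the", "DT"),
            ("transcription", "NN"), ("of", "IN"), ("sigK", "NNP")]
negative = [("The", "DT"), ("sigK", "NNP"), ("gene", "NN"),
            ("was", "VBD"), ("sequenced", "VBN")]

print(annotate(positive), contains_interaction(positive))
print(annotate(negative), contains_interaction(negative))
```

Working on tag sequences rather than raw words is what makes the alphabet large: every distinct annotation is a symbol, which is the difficulty the abstract says the learning procedure has to cope with.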

Information & Contributors

Information

Published In

GECCO '15: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation
July 2015
1496 pages
ISBN:9781450334723
DOI:10.1145/2739480
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. genetic programming
  2. machine learning
  3. programming by example
  4. regular expressions

Qualifiers

  • Research-article

Conference

GECCO '15
Acceptance Rates

GECCO '15 Paper Acceptance Rate 182 of 505 submissions, 36%;
Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 03 Jan 2025

Citations

Cited By

  • (2023) ARTE: Automated Generation of Realistic Test Inputs for Web APIs. IEEE Transactions on Software Engineering, 49:1 (348-363). DOI:10.1109/TSE.2022.3150618. Online publication date: 1-Jan-2023
  • (2021) Automated generation of realistic test inputs for web APIs. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (1666-1668). DOI:10.1145/3468264.3473491. Online publication date: 20-Aug-2021
  • (2020) Towards an evolutionary-based approach for natural language processing. Proceedings of the 2020 Genetic and Evolutionary Computation Conference (985-993). DOI:10.1145/3377930.3390248. Online publication date: 25-Jun-2020
  • (2019) Genetic programming for natural language processing. Genetic Programming and Evolvable Machines. DOI:10.1007/s10710-019-09361-5. Online publication date: 23-Jul-2019
