Combining evidence, specificity, and proximity towards the normalization of gene ontology terms in text

Published: 01 January 2008

Abstract

Structured information provided by manual annotation of proteins with Gene Ontology (GO) concepts represents a high-quality, reliable data source for the research community. However, only a limited set of proteins is annotated, because fully annotating each individual gene product from the literature requires substantial human resources. We introduce a novel method for the automatic identification of GO terms in natural language text. The method takes several features into consideration: (1) the evidence for a GO term given by the words occurring in the text, (2) the proximity between those words, and (3) the specificity of the GO terms based on their information content. The method was evaluated on the BioCreAtIvE corpus and compared to current state-of-the-art methods. The precision reached 0.34 at a recall of 0.34 for the identified terms at rank 1. In our analysis, we observe that the identification of GO terms in the "cellular component" subbranch of GO is more accurate than for terms from the other two subbranches. This observation is explained by differences in the average number of words per term across the subbranches.
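To make the three features concrete, the following is a minimal Python sketch of how evidence, proximity, and specificity could be computed and combined for one candidate term. It is not the scoring function of the paper, which the abstract does not give. The function names, the tokenizer, the normalization of information content, the multiplicative combination, and the annotation counts in the example are all illustrative assumptions.

```python
import math
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def evidence(term_words, text_words):
    """Fraction of the term's distinct words that occur in the text."""
    distinct = set(term_words)
    if not distinct:
        return 0.0
    return len(distinct & set(text_words)) / len(distinct)


def proximity(term_words, text_words):
    """Matched-word count divided by the length of the smallest text
    window that contains every matched word at least once."""
    wanted = set(term_words) & set(text_words)
    if not wanted:
        return 0.0
    best = len(text_words)
    counts = {}
    left = 0
    for right, word in enumerate(text_words):
        if word in wanted:
            counts[word] = counts.get(word, 0) + 1
        # Shrink the window from the left while it still covers all words.
        while len(counts) == len(wanted):
            best = min(best, right - left + 1)
            left_word = text_words[left]
            if left_word in counts:
                counts[left_word] -= 1
                if counts[left_word] == 0:
                    del counts[left_word]
            left += 1
    return len(wanted) / best


def specificity(term_count, total_count):
    """Information content -log(p), scaled to [0, 1] by log(total_count).
    Both counts are hypothetical annotation frequencies."""
    if term_count < 1 or total_count < 2:
        return 0.0
    return -math.log(term_count / total_count) / math.log(total_count)


def score(term, text, term_count, total_count):
    """Combine the three features by multiplication (an assumption)."""
    term_words = tokenize(term)
    text_words = tokenize(text)
    return (evidence(term_words, text_words)
            * proximity(term_words, text_words)
            * specificity(term_count, total_count))


if __name__ == "__main__":
    # Hypothetical counts: the term annotates 10 of 1000 gene products.
    s = score("nuclear membrane", "Found at the nuclear inner membrane.", 10, 1000)
    print(round(s, 3))
```

In this sketch a term whose words all appear close together in the text, and which is rarely used in annotations, receives a higher score than a frequent term whose words are scattered. A weighted sum would be an equally plausible way to combine the features.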

Published In

EURASIP Journal on Bioinformatics and Systems Biology, Volume 2008, Issue January 2008
46 pages
ISSN: 1687-4145
EISSN: 1687-4153

Publisher

Hindawi Limited

London, United Kingdom

Publication History

Accepted: 15 February 2008
Published: 01 January 2008
Received: 21 November 2007

Qualifiers

  • Article
