
An IR-Aided Machine Learning Framework for the BioCreative II.5 Challenge

Published: 01 July 2010

Abstract

The team at the University of Wisconsin-Milwaukee developed an information retrieval (IR) and machine learning framework. The framework requires only the standardized training data, and it depends on minimal external knowledge resources and minimal parsing. Using this framework, we built our text mining systems and participated, for the first time, in all three BioCreative II.5 Challenge tasks. Our systems placed among the top five teams by raw F1 score in all three tasks and placed third by homonym ortholog F1 score in the INT task. These results demonstrate that our IR-based framework is efficient, robust, and potentially scalable.
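The raw F1 scores mentioned above are the harmonic mean of precision and recall. The following is a minimal sketch of that formula, assuming simple true-positive, false-positive, and false-negative counts; it is not the official BioCreative II.5 scorer, which applies task-specific matching (including the homonym ortholog scoring for INT) that is not modeled here. The function name `f1_score` and the example counts are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    Hypothetical helper for illustration only; it is not the
    official BioCreative II.5 evaluation script.
    """
    # Precision: fraction of predicted items that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of gold-standard items that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not results reported in the paper.
print(round(f1_score(tp=30, fp=10, fn=20), 3))  # prints 0.667
```

With 30 correct predictions, 10 spurious ones, and 20 misses, precision is 0.75 and recall is 0.6, so F1 is about 0.667. Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high.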

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 7, Issue 3
July 2010
192 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Bioinformatics (genome or protein) databases
  2. information search and retrieval
  3. systems and software
  4. text mining
