DOI: 10.1145/3107411.3107442

Knowledge Rich Natural Language Queries over Structured Biological Databases

Published: 20 August 2017

Abstract

Keyword, natural language, and NoSQL queries are increasingly used for information retrieval from both traditional and non-traditional databases, such as web, document, GIS, legal, and health databases. While their popularity is undeniable, engineering them is far from simple. For the most part, a semantics- and intent-preserving mapping from a well-understood natural language query, expressed over a structured database schema, to a structured query language remains a difficult task, and research to tame this complexity is intense. In this paper, we propose a multi-level knowledge-based middleware to facilitate such mappings, separating the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language queries to well-defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog-based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made.
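To make the separation of conceptual and physical levels concrete, the following is a minimal Python sketch, not BioSmart's actual design. It assumes a syntactic classification step has already reduced a question to a target concept plus one filter. The concept map, the table and column names, and the `ClassifiedQuery` and `to_sql` helpers are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Conceptual level: terms a user might mention, independent of storage.
# Physical level: the (table, column) that stores each concept.
# All names here are hypothetical and are not taken from BioSmart.
CONCEPT_MAP = {
    "gene": ("genes", "symbol"),
    "organism": ("genes", "organism"),
    "chromosome": ("genes", "chromosome"),
}


@dataclass(frozen=True)
class ClassifiedQuery:
    """Result of an assumed syntactic classification step."""
    target: str          # concept to return
    filter_concept: str  # concept used to restrict the results
    filter_value: str    # literal value supplied by the user


def to_sql(query: ClassifiedQuery) -> tuple[str, tuple[str, ...]]:
    """Translate a classified query into parameterized SQL."""
    target_table, target_col = CONCEPT_MAP[query.target]
    filter_table, filter_col = CONCEPT_MAP[query.filter_concept]
    if target_table != filter_table:
        # A fuller system would plan a join here; this sketch does not.
        raise ValueError("concepts live in different tables")
    sql = f"SELECT {target_col} FROM {target_table} WHERE {filter_col} = ?"
    return sql, (query.filter_value,)


# "Which genes belong to Homo sapiens?" after classification:
sql, params = to_sql(ClassifiedQuery("gene", "organism", "Homo sapiens"))
print(sql, params)
```

Because the user-facing concepts are resolved through `CONCEPT_MAP`, the physical schema could change without touching the classification step. The paper's prototype is Datalog-based, so SQL is used here only as a familiar target language for illustration.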


Published In

ACM-BCB '17: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2017
800 pages
ISBN: 9781450347228
DOI: 10.1145/3107411
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cooperative query answering
  2. implied query response
  3. logical implication
  4. query mapping
  5. query semantics

Qualifiers

  • Research-article

Conference

BCB '17
Acceptance Rates

ACM-BCB '17 Paper Acceptance Rate 42 of 132 submissions, 32%;
Overall Acceptance Rate 254 of 885 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 68
  • Downloads (last 6 weeks): 8
Reflects downloads up to 31 Dec 2024

Cited By

  • (2024) Knowledge Synthesis using Large Language Models for a Computational Biology Workflow Ecosystem. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, pages 523-530. DOI: 10.1145/3605098.3636026. Online publication date: 8-Apr-2024
  • (2024) Combining computational linguistics with sentence embedding to create a zero-shot NLIDB. Array, 24:100368. DOI: 10.1016/j.array.2024.100368. Online publication date: Dec-2024
  • (2023) Mapping Strategies for Declarative Queries over Online Heterogeneous Biological Databases for Intelligent Responses. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, pages 567-574. DOI: 10.1145/3555776.3577652. Online publication date: 27-Mar-2023
  • (2023) Automatic Hypotheses Testing Over Heterogeneous Biological Databases Using Open Knowledge Networks. Information Integration and Web Intelligence, pages 358-364. DOI: 10.1007/978-3-031-48316-5_34. Online publication date: 22-Nov-2023
  • (2022) Mapping Declarative Queries to Heterogeneous Biological Databases using Schema Graphs for Intelligent Responses. 2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pages 147-154. DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00026. Online publication date: Dec-2022
  • (2019) Semantic Understanding of Natural Language Stories for Near Human Question Answering. Flexible Query Answering Systems, pages 215-227. DOI: 10.1007/978-3-030-27629-4_21. Online publication date: 12-Sep-2019
  • (2018) Parsing Natural Language Queries for Extracting Data from Large-Scale Geospatial Transportation Asset Repositories. Construction Research Congress 2018, pages 70-79. DOI: 10.1061/9780784481295.008. Online publication date: 29-Mar-2018
