Automatic Discovery and Inferencing of Complex Bioinformatics Web Interfaces

Anne H. H. Ngu¹,
Daniel Rocco²,
Terence Critchlow³ &
David Buttler³


Abstract

The World Wide Web provides a vast resource to genomics researchers, with Web-based access to distributed data sources such as BLAST sequence homology search interfaces. However, finding the desired scientific information can still be very tedious and frustrating. Although several well-known genomic data servers (e.g., GenBank, EMBL, NCBI) are shared and accessed frequently, new data sources are created each day in laboratories all over the world. Sharing these new genomics results is hindered by the lack of a common interface or data exchange mechanism. Moreover, the number of autonomous genomics sources and their rate of change outpace the speed at which they can be identified manually, so the available data is not being used to its full potential. An automated system that can find, classify, describe, and wrap new sources without tedious, low-level coding of source-specific wrappers is needed to help scientists access hundreds of dynamically changing bioinformatics Web data sources through a single interface. A correct classification of any kind of Web data source must address both the capability of the source and the conversation/interaction semantics inherent in its design. We propose the service class description (SCD), a metadata approach for classifying Web data sources that takes into account both the capability and the conversational semantics of the source. The ability to discover the interaction pattern of a Web source increases the accuracy of the classification process. Our results show that an SCD-based approach successfully classifies two thirds of BLAST sites with 100% accuracy and two thirds of bioinformatics keyword search sites with around 80% precision.
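
To make the distinction between capability and conversational semantics concrete, here is a minimal sketch in Python. It is not the authors' SCD format or classifier: the class ServiceClassDescription, the functions follows_pattern and classify, the form field names, and the regular expressions are all hypothetical illustrations. The sketch assumes a source has already been probed, yielding its form input names and the sequence of pages seen while submitting a query. A source is assigned to a service class only if its form offers the required inputs (capability) and the observed pages match the class's expected interaction pattern in order (conversational semantics).

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class ServiceClassDescription:
    """Hypothetical, simplified stand-in for an SCD (NOT the paper's actual schema)."""
    name: str
    # Capability: form input names the source must expose.
    required_inputs: frozenset[str]
    # Conversational semantics: regexes that pages seen during the interaction
    # must match in this order; extra pages (e.g. "please wait") may appear in between.
    page_patterns: tuple[str, ...] = ()


def follows_pattern(patterns, pages) -> bool:
    """True if each pattern matches some page, in order (subsequence match)."""
    remaining = iter(pages)
    # any() consumes pages until one matches, so later patterns only see later pages.
    return all(
        any(re.search(p, page, re.IGNORECASE) for page in remaining)
        for p in patterns
    )


def classify(scds, form_fields, pages) -> list[str]:
    """Return the names of all service classes whose capability and interaction match."""
    fields = set(form_fields)
    return [
        scd.name
        for scd in scds
        if scd.required_inputs <= fields and follows_pattern(scd.page_patterns, pages)
    ]


# Illustrative (hypothetical) service classes and one probed interaction trace.
BLAST = ServiceClassDescription(
    name="BLAST",
    required_inputs=frozenset({"sequence", "database"}),
    page_patterns=(r"blast", r"sequences producing significant alignments"),
)
KEYWORD = ServiceClassDescription(
    name="keyword search",
    required_inputs=frozenset({"query"}),
    page_patterns=(r"search", r"results"),
)

trace = [
    "<h1>Nucleotide BLAST</h1><form>...</form>",
    "Your request has been queued; this page will refresh.",
    "Sequences producing significant alignments: ...",
]
print(classify([BLAST, KEYWORD], ["sequence", "database", "program"], trace))
# prints ['BLAST']
```

Matching the page patterns as an ordered subsequence, rather than requiring an exact page-by-page match, is one simple way to tolerate sources that insert intermediate pages such as queue or refresh notices before returning results.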

Author information

Authors and Affiliations

Department of Computer Science, Texas State University, San Marcos, TX, 78666, USA
Anne H. H. Ngu
Department of Computer Science, University of West Georgia, Carrollton, GA, 30118, USA
Daniel Rocco
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, 94551, USA
Terence Critchlow & David Buttler

Corresponding author

Correspondence to Anne H. H. Ngu.

Additional information

This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract W-7405-ENG-48. UCRL-JC

This work was performed while the author was a summer faculty scholar at LLNL.

About this article

Cite this article

Ngu, A.H.H., Rocco, D., Critchlow, T. et al. Automatic Discovery and Inferencing of Complex Bioinformatics Web Interfaces. World Wide Web 8, 463–493 (2005). https://doi.org/10.1007/s11280-005-0509-5

Published: 02 August 2005
Issue Date: December 2005
DOI: https://doi.org/10.1007/s11280-005-0509-5