
Intelligent Automated Navigation through the Deep Web

  • Conference paper
Advances in Web Intelligence (AWIC 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3034)

Abstract

The Deep Web, understood as the set of Web pages built on demand rather than pre-built, has become a very important part of the Web, not only because of its enormous size (it may be significantly bigger than the Superficial, pre-built Web [12]), but also because these pages usually contain customized information extracted from databases in response to specific user requests. These pages are commonly unreachable by robots, since they usually require a login identification process or the filling in of some forms. Since pages within the Deep Web must be obtained through a navigation process, a single URL is often not enough to reach them, so full navigation paths need to be established, usually by starting at a well-known URL, following some links and filling in some forms.

On the other hand, Web Intelligence in a Web client might be considered as the ability to properly combine several distributed pieces of data to solve a specific problem. Automated navigation through the Deep Web needs intelligence in order to reach relevant data which can then be further processed. Automated Web navigation involves both inter-document and intra-document navigation. Without intelligence in either of these two, Web clients cannot establish proper Web navigation paths to the relevant data.
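
As an illustration only, the idea of a navigation path that combines inter-document steps (moving from one page to the next) with intra-document steps (locating a link or a form inside a page) might be sketched as follows. This is not the pair of languages the paper proposes: the in-memory SITE dictionary, the Finder class, the follow function and the ("link", ...) / ("form", ...) step format are all hypothetical names chosen for this sketch, and the dictionary stands in for real HTTP requests so that the example runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical in-memory "site" (URL -> HTML); stands in for real HTTP fetches.
SITE = {
    "/home": '<a href="/login">Sign in</a>',
    "/login": '<form action="/account"><input name="user"><input name="pw"></form>',
    "/account?user=ann&pw=s3cret": '<span id="balance">42</span>',
}


class Finder(HTMLParser):
    """Intra-document navigation: index links by their text and find the first form."""

    def __init__(self):
        super().__init__()
        self.links = {}          # link text -> href
        self.form_action = None  # action URL of the first form seen
        self.fields = []         # names of the input fields, in document order
        self._href = None        # href of the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            self._href = attributes.get("href")
        elif tag == "form" and self.form_action is None:
            self.form_action = attributes.get("action")
        elif tag == "input" and "name" in attributes:
            self.fields.append(attributes["name"])

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.links[data.strip()] = self._href

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def follow(steps, start):
    """Inter-document navigation: apply each step to the current page, return the final URL."""
    url = start
    for kind, arg in steps:
        page = Finder()
        page.feed(SITE[url])
        if kind == "link":      # follow the link whose text is `arg`
            url = page.links[arg]
        elif kind == "form":    # fill the form with the values in `arg` and submit it
            query = urlencode([(name, arg[name]) for name in page.fields])
            url = f"{page.form_action}?{query}"
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return url


# A navigation path: start at a well-known URL, follow a link, fill in a form.
PATH = [("link", "Sign in"), ("form", {"user": "ann", "pw": "s3cret"})]
final_url = follow(PATH, "/home")
```

The point of the sketch is that the target page has no usable URL of its own until the link has been followed and the form filled in, which is why a full path rather than a single address has to be specified.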

This article presents an approach to formalizing specifications of automated navigation on the Deep Web. This formalization has been expressed both graphically and textually in a combination of two languages for defining intelligent Web navigation behaviours at Web clients. Running examples of programs written in these languages have been successfully developed and tested on well-known legacy Web sites with low cost and relatively high robustness.

References

  1. Curl tool, curl.haxx.se/docs/httpscripting.shtml

  2. Wget tool, sunsite.auc.dk/pub/infosystems/wget/

  3. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American (May 2001)

  4. Centeno, V.L., Breuer, P.T., Fernandez, L.S., Kloos, C.D., Perez, J.A.H.: Msc-based language for specifying automated web clients. In: Eighth IEEE Symposium on Computers and Communications, Kemer, Antalya, Turkey, June 30 - July 3, pp. 407–412 (2003)

  5. Centeno, V.L., Fernandez, L.S., Kloos, C.D., Breuer, P.T., Martin, F.P.: Building wrapper agents for the deep web. In: Cueva Lovelle, J.M., Rodríguez, B.M.G., Gayo, J.E.L., Ruiz, M.d.P.P., Aguilar, L.J. (eds.) ICWE 2003. LNCS, vol. 2722, pp. 58–67. Springer, Heidelberg (2003)

  6. Centeno, V.L., Kloos, C.D., Breuer, P.T., Fernandez, L.S., Cabellos, M.E.G., Perez, J.A.H.: Automation of the deep web with user defined behaviours. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds.) AWIC 2003. LNCS (LNAI), vol. 2663, pp. 339–348. Springer, Heidelberg (2003)

  7. ITU-T. Recommendation z.120: Message sequence chart (msc). In Formal description techniques (FDT), Geneva, Switzerland (1997)

  8. H. J. and M. B. The jdom project www.jdom.org

  9. Kistler, T., Marais, H.: Webl - a programming language for the web. In: Proceedings of the 7th International World Wide Web Conference. Computer Networks and ISDN Systems, vol. 30, pp. 259–270 (1998)

  10. M. T. Ltd. Sax: The simple api for xml www.megginson.com/sax

  11. Raggett, D.: Clean up your web pages with html tidy. In: Poster 7th International World Wide Web Conference, www.w3.org/People/Raggett/tidy/

  12. Singh, M.P.: Deep web structure. Internet Computing 6(5), 4–5 (2002)

  13. Sun. Package java.net. In: Java™ 2 Platform, Standard Edition, www.sun.com/java

  14. W3C. Hypertext markup language (html and xhtml), www.w3.org/MarkUp/

  15. W3C. Libwww - the w3c protocol library, www.w3.org/Library/

  16. W3C. Resource description framework (rdf), www.w3.org/RDF

  17. W3C. Xsl transformations (xslt) version 1.0. W3C Recommendation, November 16 (1999)

  18. W3C. Document object model (dom) level 2. W3C Recommendation, November 13 (2000)

  19. W3C. Xml pointer language (xpointer). W3C Working Draft, August 16 (2002)

  20. W3C. Web ontology language (owl) reference version 1.0. In: W3C Working Draft, February 21 (2003), http://www.w3.org/2001/sw/

  21. W3C. Xml path language (xpath) 2.0. W3C Working Draft May 2 (2003)

  22. W3C. Xquery 1.0: An xml query language. W3C Working Draft May 2 (2003)

  23. Wall, L.: Perl language, v5.004. In Freely available software package (June 1997), ftp://ftp.perl.com/pub/perl/src/CPAN/5.0/perl5.004.tar.gz

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luque Centeno, V., Delgado Kloos, C., Sánchez Fernández, L., Fernández García, N. (2004). Intelligent Automated Navigation through the Deep Web. In: Favela, J., Menasalvas, E., Chávez, E. (eds) Advances in Web Intelligence. AWIC 2004. Lecture Notes in Computer Science, vol 3034. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24681-7_15

  • DOI: https://doi.org/10.1007/978-3-540-24681-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22009-1

  • Online ISBN: 978-3-540-24681-7

  • eBook Packages: Springer Book Archive
