Abstract
The Deep Web, understood as the set of Web pages built on demand rather than pre-built, has become a very important part of the Web, not only because of its enormous size (it may be considered significantly bigger than the superficial, pre-built Web [12]), but also because these pages usually contain customized information extracted from databases in response to specific user requests. These pages are commonly unreachable by robots, since they usually require a login identification process or the filling in of some forms. Since pages within the Deep Web must be obtained through a navigation process, a single URL is often not enough to reach them, so full navigation paths need to be established, usually by starting at a well-known URL, following some links, and filling in some forms.
On the other hand, Web Intelligence in a Web client might be considered as the ability to properly combine data from several distributed sources in order to solve a specific problem. Automated navigation through the Deep Web needs intelligence in order to reach relevant data that can be further processed. Automated Web navigation involves both inter-document and intra-document navigation. Without intelligence in either of these, Web clients cannot establish proper Web navigation paths to the relevant data.
This article presents an approach to formalizing specifications of automated navigation on the Deep Web. This formalization has been expressed both graphically and textually in a combination of two languages for defining intelligent Web navigation behaviours in Web clients. Running examples of programs written in these languages have been successfully developed and tested on well-known legacy Web sites, at low cost and with relatively high robustness.
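The idea of a navigation path (start at a well-known URL, follow some links, fill in some forms) can be illustrated with a minimal Python sketch. It is not the pair of languages presented in the paper. The `SITE` dictionary, the `fetch` stand-in, the `example.test` URLs, and the `FollowLink` and `FillForm` step types are all hypothetical names introduced only for illustration, and the "site" lives in memory so that the sketch runs without network access.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Hypothetical in-memory site: request (URL plus encoded form data) -> HTML.
SITE = {
    "http://example.test/": '<a href="/login">Sign in</a>',
    "http://example.test/login":
        '<form action="/session"><input name="user"><input name="pin"></form>',
    "http://example.test/session?pin=1234&user=alice":
        '<p id="balance">42.00</p>',
}


def fetch(url, form=None):
    """Stand-in for an HTTP request (assumption: a real client would use the network)."""
    key = url if not form else url + "?" + urlencode(sorted(form.items()))
    return SITE[key]


class PageScanner(HTMLParser):
    """Intra-document navigation: locate links (by their text) and form actions."""

    def __init__(self):
        super().__init__()
        self.links = {}          # link text -> href
        self.form_actions = []   # action attribute of each form, in document order
        self._href = None        # href of the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
        elif tag == "form":
            self.form_actions.append(attrs.get("action", ""))

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.links[data.strip()] = self._href

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


@dataclass
class FollowLink:
    text: str     # visible text of the link to follow


@dataclass
class FillForm:
    fields: dict  # values for the first form found on the page


def navigate(start_url, steps):
    """Inter-document navigation: execute the steps in order, return the last page."""
    url, page = start_url, fetch(start_url)
    for step in steps:
        scanner = PageScanner()
        scanner.feed(page)
        scanner.close()
        if isinstance(step, FollowLink):
            url = urljoin(url, scanner.links[step.text])
            page = fetch(url)
        elif isinstance(step, FillForm):
            url = urljoin(url, scanner.form_actions[0])
            page = fetch(url, step.fields)
        else:
            raise TypeError(f"unknown navigation step: {step!r}")
    return page


path = [FollowLink("Sign in"), FillForm({"user": "alice", "pin": "1234"})]
result = navigate("http://example.test/", path)
```

In this sketch the final page is reachable only by executing the whole path: no entry point other than the start URL is assumed, and each step is resolved against the page that the previous step returned.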
References
1. Curl tool, curl.haxx.se/docs/httpscripting.shtml
2. Wget tool, sunsite.auc.dk/pub/infosystems/wget/
3. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American (May 2001)
4. Centeno, V.L., Breuer, P.T., Fernandez, L.S., Kloos, C.D., Perez, J.A.H.: Msc-based language for specifying automated web clients. In: Eighth IEEE Symposium on Computers and Communications, Kemer, Antalya, Turkey, June 30 - July 3, pp. 407–412 (2003)
5. Centeno, V.L., Fernandez, L.S., Kloos, C.D., Breuer, P.T., Martin, F.P.: Building wrapper agents for the deep web. In: Cueva Lovelle, J.M., Rodríguez, B.M.G., Gayo, J.E.L., Ruiz, M.d.P.P., Aguilar, L.J. (eds.) ICWE 2003. LNCS, vol. 2722, pp. 58–67. Springer, Heidelberg (2003)
6. Centeno, V.L., Kloos, C.D., Breuer, P.T., Fernandez, L.S., Cabellos, M.E.G., Perez, J.A.H.: Automation of the deep web with user defined behaviours. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds.) AWIC 2003. LNCS (LNAI), vol. 2663, pp. 339–348. Springer, Heidelberg (2003)
7. ITU-T. Recommendation z.120: Message sequence chart (msc). In Formal description techniques (FDT), Geneva, Switzerland (1997)
8. H. J. and M. B. The jdom project www.jdom.org
9. Kistler, T., Marais, H.: Webl - a programming language for the web. In: Proceedings of the 7th International World Wide Web Conference. Computer Networks and ISDN Systems, vol. 30, pp. 259–270 (1998)
10. M. T. Ltd. Sax: The simple api for xml www.megginson.com/sax
11. Raggett, D.: Clean up your web pages with html tidy. In: Poster 7th International World Wide Web Conference, www.w3.org/People/Raggett/tidy/
12. Singh, M.P.: Deep web structure. Internet Computing 6(5), 4–5 (2002)
13. Sun. Package java.net. In Java™ 2 Platform Standard Edition, www.sun.com/java
14. W3C. Hypertext markup language (html and xhtml), www.w3.org/MarkUp/
15. W3C. Libwww - the w3c protocol library, www.w3.org/Library/
16. W3C. Resource description framework (rdf), www.w3.org/RDF
17. W3C. Xsl transformations (xslt) version 1.0. W3C Recommendation, November 16 (1999)
18. W3C. Document object model (dom) level 2. W3C Recommendation, November 13 (2000)
19. W3C. Xml pointer language (xpointer). W3C Working Draft, August 16 (2002)
20. W3C. Web ontology language (owl) reference version 1.0. In: W3C Working Draft, February 21 (2003), http://www.w3.org/2001/sw/
21. W3C. Xml path language (xpath) 2.0. W3C Working Draft May 2 (2003)
22. W3C. Xquery 1.0: An xml query language. W3C Working Draft May 2 (2003)
23. Wall, L.: Perl language, v5.004. In Freely available software package (June 1997), ftp://ftp.perl.com/pub/perl/src/CPAN/5.0/perl5.004.tar.gz
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Luque Centeno, V., Delgado Kloos, C., Sánchez Fernández, L., Fernández García, N. (2004). Intelligent Automated Navigation through the Deep Web. In: Favela, J., Menasalvas, E., Chávez, E. (eds) Advances in Web Intelligence. AWIC 2004. Lecture Notes in Computer Science, vol 3034. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24681-7_15
DOI: https://doi.org/10.1007/978-3-540-24681-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22009-1
Online ISBN: 978-3-540-24681-7
eBook Packages: Springer Book Archive