Abstract
The Web has rapidly expanded not only as the largest repository of human knowledge, but also as a platform for online applications that serve many different tasks. Wrapper agents can automate these tasks for the user. However, developing such agents is expensive, and they usually require considerable maintenance effort because Web pages are not strongly structured.
This paper introduces two programming languages for automating tasks on the legacy Web by developing low-cost, robust wrapper agents. These agents can navigate the Web by emulating browsers and can process data from the Web, automating some user behaviours. The languages are based on formal methods and W3C standards and are suited to the legacy deep Web, where they automate tasks such as data mining or information integration from different sources. A platform providing execution support for agents developed in these languages is also provided.
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Centeno, V.L., Fernández, L.S., Kloos, C.D., Breuer, P.T., Martín, F.P. (2003). Building Wrapper Agents for the Deep Web. In: Lovelle, J.M.C., Rodríguez, B.M.G., Gayo, J.E.L., del Puerto Paule Ruiz, M., Aguilar, L.J. (eds) Web Engineering. ICWE 2003. Lecture Notes in Computer Science, vol 2722. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45068-8_9
DOI: https://doi.org/10.1007/3-540-45068-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40522-1
Online ISBN: 978-3-540-45068-9
eBook Packages: Springer Book Archive