Abstract
In this paper, we propose a set of simple and efficient methods based on standard, free and widely available tools, to store and manipulate large sets of URLs and large parts of the Web graph. Our aim is both to store efficiently the URLs list and the graph in order to manage all the computations in a computer central memory. We also want to make the conversion between URLs and their identifiers as fast as possible, and to obtain all the successors of an URL in the Web graph efficiently. The methods we propose make it possible to obtain a good compromise between these two challenges, and make it possible to manipulate large parts of the Web graph.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the world wide web. Nature, 401:130–131, 1999.
Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar, and Suresh Venkatasubramanian. The Connectivity Server: fast access to linkage information on the Web. Computer Networks and ISDN Systems, 30(1–7):469–477, 1998.
A. Z. Broder, S. R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L. Wiener. Graph structure in the web. WWW9 / Computer Networks, 33(1–6):309–320, 2000.
M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, 1994.
P. Deutsch. Deflate compressed data format specification version 1.3. Aladdin Enterprises, May 1996. RFC 1951.
P. Deutsch. Gzip file format specification version 4.3. Aladdin Enterprises, May 1996. RFC 1952.
Jirí Dvorsky. Text compression with random access.
Larbin home page. http://larbin.sourceforge.net/index-eng.html.
J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins. The Web as a graph: Measurements, models, and methods. In T. Asano, H. Imai, D. T. Lee, S. Nakano, and T. Tokuyama, editors, Proc. 5th Annual Int. Conf. Computing and Combinatorics, COCOON, number 1627. Springer-Verlag, 1999.
S. Ravi Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the web for emerging cyber-communities. WWW8 / Computer Networks, 31(11–16):1481–1493, 1999.
Haris Lekatsas and Wayne Wolf. Random access decompression using binary arithmetic coding. In Data Compression Conference, pages 306–315, 1999.
Rajiv Wickremesinghe, Raymie Stata, and Janet Wiener. Link compression in the connectivity server. Technical report, Compaq systems research center, 2000.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guillaume, JL., Latapy, M., Viennot, L. (2002). Efficient and Simple Encodings for the Web Graph. In: Meng, X., Su, J., Wang, Y. (eds) Advances in Web-Age Information Management. WAIM 2002. Lecture Notes in Computer Science, vol 2419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45703-8_30
Download citation
DOI: https://doi.org/10.1007/3-540-45703-8_30
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44045-1
Online ISBN: 978-3-540-45703-9
eBook Packages: Springer Book Archive