Abstract
The K²-tree represents Web and social graphs compactly, achieving compression ratios of about 5 bits per link for Web graphs and about 20 bits per link for social graphs. It also enables fast processing of relevant queries, such as direct and reverse neighbours, on the compressed graph. These two properties make the K²-tree suitable for inclusion in Web search engines, where it is necessary to maintain very large graphs and to process on-line queries on them. Typically, these search engines are deployed on dedicated clusters of distributed-memory processors, wherein the data set is partitioned and replicated to enable low query response time and high query throughput. In this context, a practical strategy is simply to distribute the data across the processors and build local data structures for efficient retrieval in each processor. However, the way the data set is distributed across the processors can have a significant impact on performance. In this paper, we evaluate a number of data distribution strategies that are suitable for the K²-tree and identify the alternative with the best general performance. In our study, we consider different data sets and focus on metrics such as overall compression ratio and parallel response time for retrieving direct and reverse neighbours.
SAG and NB were funded by MICIN (PGE and FEDER) grants TIN2009-14560-C03-02, TIN2010-21246-C02-01, and CDTI CEN-20091048, and by Xunta de Galicia (co-funded with FEDER) ref. 2010/17. MM was partially funded by research grant FONDEF IDeA CA12I10314.
The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-319-02432-5_33
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Álvarez-García, S., Brisaboa, N.R., Gómez-Pantoja, C., Marin, M. (2013). Distributed Query Processing on Compressed Graphs Using K2-Trees. In: Kurland, O., Lewenstein, M., Porat, E. (eds) String Processing and Information Retrieval. SPIRE 2013. Lecture Notes in Computer Science, vol 8214. Springer, Cham. https://doi.org/10.1007/978-3-319-02432-5_32
DOI: https://doi.org/10.1007/978-3-319-02432-5_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02431-8
Online ISBN: 978-3-319-02432-5
eBook Packages: Computer Science, Computer Science (R0)