[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

I/O-efficient algorithms for top-k nearest keyword search in massive graphs

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Networks emerging nowadays usually have labels or textual content on the nodes. We model such commonly seen network as an undirected graph G, in which each node is attached with zero or more keywords, and each edge is assigned with a length. On such networks, a novel and useful query is called top-k nearest keyword (\(\mathsf {k\text {-}NK}\)) search. Given a query node q in G and a keyword \(\lambda \), a \(\mathsf {k\text {-}NK}\) query searches k nodes which contain \(\lambda \) and are nearest to q. The \(\mathsf {k\text {-}NK}\) problem has been studied recently in the literature. But most existing solutions assume that the graph as well as the constructed index can fit entirely in memory. As a result, they cannot be applied directly to very large-scale networks which are commonly found in practice, but cannot fit in memory. In this work, we design an I/O-efficient solution, which uses a compact disk index to answer a \(\mathsf {k\text {-}NK}\) query with constant I/Os. The key to an accurate \(\mathsf {k\text {-}NK}\) result is a precise shortest distance estimation in a graph. In our solution, we follow our previous work Qiao et al. (PVLDB 6:901–912, 2013) which uses the shortest path tree as an approximate representation of a graph and uses the tree distance between two nodes as an accurate estimation of the shortest distance between them on a graph. With such representation, the original \(\mathsf {k\text {-}NK}\) query on a graph can be reduced to answering the query on a set of trees and then assembling the results obtained from the trees. We exploit a compact tree-based index and study how to lay out the index to disk. We design a novel technique which decomposes the index tree into paths and subtrees and stores them in disk. Our theoretical analysis shows that the disk-based index is small in size and supports constant query I/Os. Extensive experimental study on massive trees and graphs with billions of edges and keywords verifies our theoretical findings and demonstrates the superiority of our method over the state-of-the-art methods in the literature.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. https://en.wikipedia.org/wiki/Facebook_Graph_Search.

  2. https://maps.google.com/.

  3. https://snap.stanford.edu/data/twitter7.html.

  4. http://www.informatik.uni-trier.de/~ley/db.

References

  1. Kolahdouzan, M.R., Shahabi, C.: Voronoi-based k nearest neighbor search for spatial network databases. In: VLDB, pp. 840–851 (2004)

  2. Samet, H., Sankaranarayanan, J., Alborzi, H.: Scalable network distance browsing in spatial databases. In: SIGMOD, pp. 43–54 (2008)

  3. Sankaranarayanan, J., Samet, H.: Query processing using distance oracles for spatial networks. IEEE Trans. Knowl. Data Eng. 22(8), 1158–1175 (2010)

    Article  Google Scholar 

  4. Bahmani, B., Goel, A.: Partitioned multi-indexing: bringing order to social search. In: WWW, pp. 399–408 (2012)

  5. Qiao, M., Qin, L., Cheng, H., Yu, J.X., Tian, W.: Top-k nearest keyword search on large graphs. PVLDB 6(10), 901–912 (2013)

    Google Scholar 

  6. Jiang, M., Fu, A., Wong, R.: Exact top-k nearest keyword search in large networks. In: SIGMOD, pp. 393–404 (2015)

  7. Sibeyn, J.F., Abello, J., Meyer, U.: Heuristics for semi-external depth first search on directed graphs. In: SPAA, pp. 282–292 (2002)

  8. Zhang, Z., Qin, L., Yu, J.X.: Contract & expand: I/O efficient sccs computing. In: ICDE, pp. 208–219 (2014)

  9. Kyrola, A., Blelloch, G., Guestrin, C.: GraphChi: large-scale graph computation on just a PC. OSDI 12, 31–46 (2012)

    Google Scholar 

  10. Zhu, X., Han, W., Chen, W.: GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: USENIX Annual Technical Conference, pp. 375–386 (2015)

  11. Roy, A., Bindschaedler, L., Malicevic, J., Zwaenepoel, W.: Chaos: Scale-out graph processing from secondary storage. In: SOSP, pp. 410–424 (2015)

  12. Maheshwari, A., Zeh, N.: A survey of techniques for designing I/O-efficient algorithms. In: Algorithms for Memory Hierarchies, pp. 36–61. Springer, Berlin (2003)

  13. Hutchinson, D.A., Maheshwari, A., Zeh, N.: An external memory data structure for shortest path queries. In: COCOON, pp. 51–60 (1999)

  14. Hu, X., Tao, Y., Chung, C.: Massive graph triangulation. In: SIGMOD, pp. 325–336 (2013)

  15. Chu, S., Cheng, J.: Triangle listing in massive networks and its applications. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011, pp. 672–680 (2011)

  16. Tao, Y., Papadopoulos, S., Sheng, C., Stefanidis, K.: Nearest keyword search in xml documents. In: SIGMOD, pp. 589–600 (2011)

  17. Bender, M.A., Farach-colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) In Latin American Theoretical Informatics, pp. 88–94. Springer, Berlin (2000)

  18. Michalis, P., Bonchi, F., Castillo, C., Gionis, A.: Fast shortest path distance estimation in large networks. In: CIKM, pp. 867–876 (2009)

  19. Jon, K., Slivkins, A., Wexler, T.: Triangulation and embedding using small sets of beacons. In: IEEE Symposium on Foundations of Computer Science, pp. 444–453 (2004)

  20. Vieira, M.V., Fonseca, B.M., Damazio, R., Golgher, P.B., Reis, D.D.C., Ribeiro-Neto, B.: Efficient search ranking in social networks. In: CIKM, pp. 563–572 (2007)

  21. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math. 1(1), 269–271 (1959)

    Article  MathSciNet  MATH  Google Scholar 

  22. Sarma, A.D., Gollapudi, S., Najork, M., Panigrahy, R.: A sketch-based distance oracle for web-scale graphs. In: WSDM, pp. 401–410 (2010)

  23. Spearman, C.: The proof and measurement of association between two things. Am. J. Psychol. 15(1), 72–101 (1904)

    Article  Google Scholar 

  24. Hristidis, V., Papakonstantinou, Y.: Discover: keyword search in relational databases. In: VLDB, pp. 670–681 (2002)

  25. Bhalotia, G., Hulgeri, A., Nakhe, C., Chakrabarti, S., Sudarshan, S.: Keyword searching and browsing in databases using banks. In: ICDE, pp. 431–440 (2002)

  26. Golenberg, K., Kimelfeld, B., Sagiv, Y.: Keyword proximity search in complex data graphs. In: SIGMOD, pp. 927–940 (2008)

  27. Li, G., Ooi, B.C., Feng, J., Wang, J., Zhou, L.: Ease: efficient and adaptive keyword search on unstructured, semi-structured and structured data. In: SIGMOD, pp. 903–914 (2008)

  28. Qin, L., Yu, J.X., Chang, L., Tao, Y.: Querying communities in relational databases. In: ICDE, pp. 724–735 (2009)

  29. Kargar, M., An, A.: Keyword search in graphs: finding r-cliques. PVLDB 4(10), 681–692 (2011)

    Google Scholar 

  30. Yu, J.X., Qin, L., Chang, L.: Keyword search in relational databases: a survey. IEEE Data Eng. Bull. 33(1), 67–78 (2010)

    Google Scholar 

  31. Yao, B., Tang, M., Li, F.: Multi-approximate-keyword routing in gis data. In: GIS, pp. 201–210 (2011)

  32. Cao, X., Chen, L., Cong, G., Xiao, X.: Keyword-aware optimal route search. PVLDB 5(11), 1136–1147 (2012)

    Google Scholar 

  33. Tao, Y.: A dynamic I/O-efficient structure for one-dimensional top-k range reporting. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS’14, Snowbird, UT, USA, 22–27 June 2014, pp. 256–265 (2014)

Download references

Acknowledgements

The work was supported by The Chinese University of Hong Kong Direct Grant No. 4055048.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xin Huang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhu, Q., Cheng, H. & Huang, X. I/O-efficient algorithms for top-k nearest keyword search in massive graphs. The VLDB Journal 26, 563–583 (2017). https://doi.org/10.1007/s00778-017-0464-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-017-0464-7

Keywords