[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content

Advertisement

Log in

TDS: fast answering reachability queries with hierarchical traversal trees

  • Published:
Cluster Computing Aims and scope Submit manuscript

Abstract

Reachability query is a fundamental operation in graph applications, including social network analysis, knowledge graph based NLP. Current label+G solutions tend to directly answer reachability queries between near vertices. When two vertices are far apart, they have to traverse the graph, which is time consuming. In this work, we propose a multi-dimensional labeling method called TDS to support fast reachability queries. Firstly, we order the vertices with zero-indegree and traverse the graph from them in forward and reverse order to generate two traversal trees. Furthermore, we assign each vertex in a tree two labels, which indicate the level and subgraph it belongs to. With the four labels, many reachability queries for far apart and near pairs can be answered quickly. Experiments on 16 real graphs demonstrate TDS achieves better query performance in a majority of used graphs versus the state-of-the-art, while it consumes near-linear time and space for indexing.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Algorithm 1
Algorithm 2
Algorithm 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request. All the figures are original.

References

  1. Ai, F.: Research on a large-scale community detection algorithm based on non-weighted graph. Clust. Comput. 22(Supplement), 2555–2562 (2019)

    MATH  Google Scholar 

  2. Song, S., Huang, W., Sun, Y.: Semantic query graph based SPARQL generation from natural language questions. Clust. Comput. 22(Suppl 1), 847–858 (2019)

    Article  MATH  Google Scholar 

  3. Yildirim, H., Chaoji, V., Zaki, M.J.: GRAIL: a scalable index for reachability queries in very large graphs. VLDB J. 21(4), 509–534 (2012)

    Article  MATH  Google Scholar 

  4. Reps, T.W.: Program analysis via graph reachability. In: Logic Programming, Proceedings of the 1997 International Symposium, pp. 5–19. MIT Press (1997)

  5. Jin, R., Hong, H., Wang, H., Ruan, N., Xiang, Y.: Computing label-constraint reachability in graph databases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 123–134. ACM (2010)

  6. Zhu, J., Nie, Z., Liu, X., Zhang, B., Wen, J.: StatSnowball: a statistical approach to extracting entity relationships. In: Proceedings of the 18th International Conference on World Wide Web, WWW, pp. 101–110. ACM (2009)

  7. Srivastava, S., Singh, A.K.: Fraud detection in the distributed graph database. Clust. Comput. 26(1), 515–537 (2023)

    Article  MATH  Google Scholar 

  8. Jin, R., Xiang, Y., Ruan, N., Wang, H.: Efficiently answering reachability queries on very large directed graphs. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 595–608. ACM (2008)

  9. Schaik, S.J., Moor, O.: A memory efficient reachability data structure through bit vector compression. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 913–924. ACM (2011)

  10. Jin, R., Xiang, Y., Ruan, N., Fuhry, D.: 3-HOP: a high-compression indexing scheme for reachability query. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 813–826. ACM (2009)

  11. Yildirim, H., Chaoji, V., Zaki, M.J.: GRAIL: scalable reachability index for large graphs. Proc. VLDB Endow. 3(1), 276–284 (2010)

    Article  MATH  Google Scholar 

  12. Trißl, S., Leser, U.: Fast and practical indexing and querying of very large graphs. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 845–856. ACM (2007)

  13. Su, J., Zhu, Q., Wei, H., Yu, J.X.: Reachability querying: can it be even faster? IEEE Trans. Knowl. Data Eng. 29(3), 683–697 (2017)

    Article  Google Scholar 

  14. Wei, H., Yu, J.X., Lu, C., Jin, R.: Reachability querying: an independent permutation labeling approach. Proc. VLDB Endow. 7(12), 1191–1202 (2014)

    Article  MATH  Google Scholar 

  15. Yuan, P., You, Y., Zhou, S., Jin, H., Liu, L.: Providing fast reachability query services with MGTag: a multi-dimensional graph labeling method. IEEE Trans. Serv. Comput. 15(2), 1000–1011 (2022)

    Article  MATH  Google Scholar 

  16. Li, L., Hua, W., Zhou, X.: HD-GDD: high dimensional graph dominance drawing approach for reachability query. World Wide Web 20(4), 677–696 (2017)

    Article  MATH  Google Scholar 

  17. Yu, J.X., Cheng, J.: Graph reachability queries: a survey. In: Managing and Mining Graph Data, vol. 40, pp. 181–215. Springer (2010)

  18. Chen, Y., Chen, Y.: An efficient algorithm for answering graph reachability queries. In: Proceedings of the 24th International Conference on Data Engineering, ICDE, pp. 893–902. IEEE Computer Society (2008)

  19. Cheng, J., Huang, S., Wu, H., Fu, A.W.: TF-label: a topological-folding labeling scheme for reachability querying in a large graph. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 193–204. ACM (2013)

  20. Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. SIAM J. Comput. 32(5), 1338–1355 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  21. Jin, R., Ruan, N., Xiang, Y., Wang, H.: Path-tree: an efficient reachability indexing scheme for large directed graphs. ACM Trans. Database Syst. 36(1), 7–1744 (2011)

    Article  MATH  Google Scholar 

  22. Jagadish, H.V.: A compression technique to materialize transitive closure. ACM Trans. Database Syst. 15(4), 558–598 (1990)

    Article  MathSciNet  MATH  Google Scholar 

  23. Agrawal, R., Borgida, A., Jagadish, H.V.: Efficient management of transitive relationships in large data and knowledge bases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 253–262. ACM (1989)

  24. Chen, Y.: General spanning trees and reachability query evaluation. In: Canadian Conference on Computer Science & Software Engineering, C3S2E, pp. 243–252. ACM (2009)

  25. Wang, H., He, H., Yang, J., Yu, P.S., Yu, J.X.: Dual labeling: answering graph reachability queries in constant time. In: Proceedings of the 22nd International Conference on Data Engineering, ICDE, pp. 75–86. IEEE Computer Society (2006)

  26. Schenkel, R., Theobald, A., Weikum, G.: HOPI: an efficient connection index for complex XML document collections. In: Proceedings of the 9th International Conference on Extending Database Technology, EDBT, vol. 2992, pp. 237–255. Springer (2004)

  27. Cheng, J., Yu, J.X., Lin, X., Wang, H., Yu, P.S.: Fast computation of reachability labeling for large graphs. In: Proceedings of the 10th International Conference on Extending Database Technology, EDBT, vol. 3896, pp. 961–979. Springer (2006)

  28. Cai, J., Poon, C.K.: Path-hop: efficiently indexing large graphs for reachability queries. In: Proceedings of the 19th Conference on Information and Knowledge Management, CIKM, pp. 119–128. ACM (2010)

  29. Jin, R., Wang, G.: Simple, fast, and scalable reachability oracle. Proc. VLDB Endow. 6(14), 1978–1989 (2013)

    Article  MATH  Google Scholar 

  30. Zhu, A.D., Lin, W., Wang, S., Xiao, X.: Reachability queries on large dynamic graphs: a total order approach. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 1323–1334. ACM (2014)

  31. Chen, L., Gupta, A., Kurul, M.E.: Stack-based algorithms for pattern matching on DAGs. In: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB, pp. 493–504. ACM (2005)

  32. Seufert, S., Anand, A., Bedathur, S.J., Weikum, G.: FERRARI: flexible and efficient reachability range assignment for graph indexing. In: 29th IEEE International Conference on Data Engineering, ICDE, pp. 1009–1020. IEEE Computer Society (2013)

  33. Battista, G.D., Eades, P., Tamassia, R., Tollis, I.G.: Graph Drawing: Algorithms for the Visualization of Graphs. Prentice-Hall, Upper Saddle River (1999)

    MATH  Google Scholar 

  34. Veloso, R.R., Cerf, L., Jr., W.M., Zaki, M.J.: Reachability queries in very large graphs: a fast refined online search approach. In: Proceedings of the 17th International Conference on Extending Database Technology, EDBT, pp. 511–522. Springer (2014)

  35. Sengupta, N., Bagchi, A., Ramanath, M., Bedathur, S.: ARROW: approximating reachability using random walks over web-scale graphs. In: Proceedings of the 35th IEEE International Conference on Data Engineering, ICDE, pp. 470–481. IEEE Computer Society (2019)

  36. Su, Z., Wang, D., Zhang, X., Cui, L., Miao, C.: Efficient reachability query with extreme labeling filter. In: Proceedings of the 5th International Conference on Web Search and Data Mining, WSDM, pp. 966–975. ACM (2022)

  37. Anirban, S., Wang, J., Islam, M.S.: Modular decomposition-based graph compression for fast reachability detection. Data Sci. Eng. 4(3), 193–207 (2019)

    Article  MATH  Google Scholar 

  38. Zhou, J., Zhou, S., Yu, J.X., Wei, H., Chen, Z., Tang, X.: DAG reduction: fast answering reachability queries. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 375–390. ACM (2017)

  39. Zhou, J., Yu, J.X., Qiu, Y., Tang, X., Chen, Z., Du, M.: Fast reachability queries answering based on RCN reduction. IEEE Trans. Knowl. Data Eng. 35(3), 2590–2609 (2023)

    MATH  Google Scholar 

  40. Dietrich, J., Chang, L., Qian, L., Henry, L.M., McCartin, C., Scholz, B.: Efficient sink-reachability analysis via graph reduction. IEEE Trans. Knowl. Data Eng. 34(11), 5321–5335 (2022)

    Article  Google Scholar 

  41. Valstar, L.D.J., Fletcher, G.H.L., Yoshida, Y.: Landmark indexing for evaluation of label-constrained reachability queries. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 345–358. ACM (2017)

  42. Peng, Y., Zhang, Y., Lin, X., Qin, L., Zhang, W.: Answering billion-scale label-constrained reachability queries within microsecond. Proc. VLDB Endow. 13(6), 812–825 (2020)

    Article  MATH  Google Scholar 

  43. Peng, Y., Lin, X., Zhang, Y., Zhang, W., Qin, L.: Answering reachability and k-reach queries on large graphs with label constraints. VLDB J. 31(1), 101–127 (2022)

    Article  MATH  Google Scholar 

  44. Chen, X., Peng, Y., Wang, S., Yu, J.X.: DLCR: efficient indexing for label-constrained reachability queries on large dynamic graphs. Proc. VLDB Endow. 15(8), 1645–1657 (2022)

    Article  MATH  Google Scholar 

  45. Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014)

  46. Leskovec, J., Sosič, R.: SNAP: a general purpose network analysis and graph mining library in C++. http://snap.stanford.edu/snap (2014)

  47. Barabsi, A.L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61932004 and 62072205).

Funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61932004 and 62072205).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Daoliang He: conceptualization, methodology, software, validation, writing—original draft. Pingpeng Yuan: conceptualization, methodology, writing—review and editing, supervision, project administration, funding acquisition.

Corresponding authors

Correspondence to Daoliang He or Pingpeng Yuan.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Research involving human and/or animal rights

The study does not involve any human or animal study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

He, D., Yuan, P. TDS: fast answering reachability queries with hierarchical traversal trees. Cluster Comput 28, 178 (2025). https://doi.org/10.1007/s10586-024-04814-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10586-024-04814-8

Keywords