Abstract
Reachability query is a fundamental operation in graph applications, including social network analysis, knowledge graph based NLP. Current label+G solutions tend to directly answer reachability queries between near vertices. When two vertices are far apart, they have to traverse the graph, which is time consuming. In this work, we propose a multi-dimensional labeling method called TDS to support fast reachability queries. Firstly, we order the vertices with zero-indegree and traverse the graph from them in forward and reverse order to generate two traversal trees. Furthermore, we assign each vertex in a tree two labels, which indicate the level and subgraph it belongs to. With the four labels, many reachability queries for far apart and near pairs can be answered quickly. Experiments on 16 real graphs demonstrate TDS achieves better query performance in a majority of used graphs versus the state-of-the-art, while it consumes near-linear time and space for indexing.
Similar content being viewed by others
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request. All the figures are original.
References
Ai, F.: Research on a large-scale community detection algorithm based on non-weighted graph. Clust. Comput. 22(Supplement), 2555–2562 (2019)
Song, S., Huang, W., Sun, Y.: Semantic query graph based SPARQL generation from natural language questions. Clust. Comput. 22(Suppl 1), 847–858 (2019)
Yildirim, H., Chaoji, V., Zaki, M.J.: GRAIL: a scalable index for reachability queries in very large graphs. VLDB J. 21(4), 509–534 (2012)
Reps, T.W.: Program analysis via graph reachability. In: Logic Programming, Proceedings of the 1997 International Symposium, pp. 5–19. MIT Press (1997)
Jin, R., Hong, H., Wang, H., Ruan, N., Xiang, Y.: Computing label-constraint reachability in graph databases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 123–134. ACM (2010)
Zhu, J., Nie, Z., Liu, X., Zhang, B., Wen, J.: StatSnowball: a statistical approach to extracting entity relationships. In: Proceedings of the 18th International Conference on World Wide Web, WWW, pp. 101–110. ACM (2009)
Srivastava, S., Singh, A.K.: Fraud detection in the distributed graph database. Clust. Comput. 26(1), 515–537 (2023)
Jin, R., Xiang, Y., Ruan, N., Wang, H.: Efficiently answering reachability queries on very large directed graphs. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 595–608. ACM (2008)
Schaik, S.J., Moor, O.: A memory efficient reachability data structure through bit vector compression. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 913–924. ACM (2011)
Jin, R., Xiang, Y., Ruan, N., Fuhry, D.: 3-HOP: a high-compression indexing scheme for reachability query. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 813–826. ACM (2009)
Yildirim, H., Chaoji, V., Zaki, M.J.: GRAIL: scalable reachability index for large graphs. Proc. VLDB Endow. 3(1), 276–284 (2010)
Trißl, S., Leser, U.: Fast and practical indexing and querying of very large graphs. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 845–856. ACM (2007)
Su, J., Zhu, Q., Wei, H., Yu, J.X.: Reachability querying: can it be even faster? IEEE Trans. Knowl. Data Eng. 29(3), 683–697 (2017)
Wei, H., Yu, J.X., Lu, C., Jin, R.: Reachability querying: an independent permutation labeling approach. Proc. VLDB Endow. 7(12), 1191–1202 (2014)
Yuan, P., You, Y., Zhou, S., Jin, H., Liu, L.: Providing fast reachability query services with MGTag: a multi-dimensional graph labeling method. IEEE Trans. Serv. Comput. 15(2), 1000–1011 (2022)
Li, L., Hua, W., Zhou, X.: HD-GDD: high dimensional graph dominance drawing approach for reachability query. World Wide Web 20(4), 677–696 (2017)
Yu, J.X., Cheng, J.: Graph reachability queries: a survey. In: Managing and Mining Graph Data, vol. 40, pp. 181–215. Springer (2010)
Chen, Y., Chen, Y.: An efficient algorithm for answering graph reachability queries. In: Proceedings of the 24th International Conference on Data Engineering, ICDE, pp. 893–902. IEEE Computer Society (2008)
Cheng, J., Huang, S., Wu, H., Fu, A.W.: TF-label: a topological-folding labeling scheme for reachability querying in a large graph. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 193–204. ACM (2013)
Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. SIAM J. Comput. 32(5), 1338–1355 (2003)
Jin, R., Ruan, N., Xiang, Y., Wang, H.: Path-tree: an efficient reachability indexing scheme for large directed graphs. ACM Trans. Database Syst. 36(1), 7–1744 (2011)
Jagadish, H.V.: A compression technique to materialize transitive closure. ACM Trans. Database Syst. 15(4), 558–598 (1990)
Agrawal, R., Borgida, A., Jagadish, H.V.: Efficient management of transitive relationships in large data and knowledge bases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 253–262. ACM (1989)
Chen, Y.: General spanning trees and reachability query evaluation. In: Canadian Conference on Computer Science & Software Engineering, C3S2E, pp. 243–252. ACM (2009)
Wang, H., He, H., Yang, J., Yu, P.S., Yu, J.X.: Dual labeling: answering graph reachability queries in constant time. In: Proceedings of the 22nd International Conference on Data Engineering, ICDE, pp. 75–86. IEEE Computer Society (2006)
Schenkel, R., Theobald, A., Weikum, G.: HOPI: an efficient connection index for complex XML document collections. In: Proceedings of the 9th International Conference on Extending Database Technology, EDBT, vol. 2992, pp. 237–255. Springer (2004)
Cheng, J., Yu, J.X., Lin, X., Wang, H., Yu, P.S.: Fast computation of reachability labeling for large graphs. In: Proceedings of the 10th International Conference on Extending Database Technology, EDBT, vol. 3896, pp. 961–979. Springer (2006)
Cai, J., Poon, C.K.: Path-hop: efficiently indexing large graphs for reachability queries. In: Proceedings of the 19th Conference on Information and Knowledge Management, CIKM, pp. 119–128. ACM (2010)
Jin, R., Wang, G.: Simple, fast, and scalable reachability oracle. Proc. VLDB Endow. 6(14), 1978–1989 (2013)
Zhu, A.D., Lin, W., Wang, S., Xiao, X.: Reachability queries on large dynamic graphs: a total order approach. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 1323–1334. ACM (2014)
Chen, L., Gupta, A., Kurul, M.E.: Stack-based algorithms for pattern matching on DAGs. In: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB, pp. 493–504. ACM (2005)
Seufert, S., Anand, A., Bedathur, S.J., Weikum, G.: FERRARI: flexible and efficient reachability range assignment for graph indexing. In: 29th IEEE International Conference on Data Engineering, ICDE, pp. 1009–1020. IEEE Computer Society (2013)
Battista, G.D., Eades, P., Tamassia, R., Tollis, I.G.: Graph Drawing: Algorithms for the Visualization of Graphs. Prentice-Hall, Upper Saddle River (1999)
Veloso, R.R., Cerf, L., Jr., W.M., Zaki, M.J.: Reachability queries in very large graphs: a fast refined online search approach. In: Proceedings of the 17th International Conference on Extending Database Technology, EDBT, pp. 511–522. Springer (2014)
Sengupta, N., Bagchi, A., Ramanath, M., Bedathur, S.: ARROW: approximating reachability using random walks over web-scale graphs. In: Proceedings of the 35th IEEE International Conference on Data Engineering, ICDE, pp. 470–481. IEEE Computer Society (2019)
Su, Z., Wang, D., Zhang, X., Cui, L., Miao, C.: Efficient reachability query with extreme labeling filter. In: Proceedings of the 5th International Conference on Web Search and Data Mining, WSDM, pp. 966–975. ACM (2022)
Anirban, S., Wang, J., Islam, M.S.: Modular decomposition-based graph compression for fast reachability detection. Data Sci. Eng. 4(3), 193–207 (2019)
Zhou, J., Zhou, S., Yu, J.X., Wei, H., Chen, Z., Tang, X.: DAG reduction: fast answering reachability queries. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 375–390. ACM (2017)
Zhou, J., Yu, J.X., Qiu, Y., Tang, X., Chen, Z., Du, M.: Fast reachability queries answering based on RCN reduction. IEEE Trans. Knowl. Data Eng. 35(3), 2590–2609 (2023)
Dietrich, J., Chang, L., Qian, L., Henry, L.M., McCartin, C., Scholz, B.: Efficient sink-reachability analysis via graph reduction. IEEE Trans. Knowl. Data Eng. 34(11), 5321–5335 (2022)
Valstar, L.D.J., Fletcher, G.H.L., Yoshida, Y.: Landmark indexing for evaluation of label-constrained reachability queries. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 345–358. ACM (2017)
Peng, Y., Zhang, Y., Lin, X., Qin, L., Zhang, W.: Answering billion-scale label-constrained reachability queries within microsecond. Proc. VLDB Endow. 13(6), 812–825 (2020)
Peng, Y., Lin, X., Zhang, Y., Zhang, W., Qin, L.: Answering reachability and k-reach queries on large graphs with label constraints. VLDB J. 31(1), 101–127 (2022)
Chen, X., Peng, Y., Wang, S., Yu, J.X.: DLCR: efficient indexing for label-constrained reachability queries on large dynamic graphs. Proc. VLDB Endow. 15(8), 1645–1657 (2022)
Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014)
Leskovec, J., Sosič, R.: SNAP: a general purpose network analysis and graph mining library in C++. http://snap.stanford.edu/snap (2014)
Barabsi, A.L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61932004 and 62072205).
Funding
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61932004 and 62072205).
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Daoliang He: conceptualization, methodology, software, validation, writing—original draft. Pingpeng Yuan: conceptualization, methodology, writing—review and editing, supervision, project administration, funding acquisition.
Corresponding authors
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Research involving human and/or animal rights
The study does not involve any human or animal study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
He, D., Yuan, P. TDS: fast answering reachability queries with hierarchical traversal trees. Cluster Comput 28, 178 (2025). https://doi.org/10.1007/s10586-024-04814-8
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10586-024-04814-8