Abstract
Graph differential dependencies (GDDs) are a novel class of integrity constraints in property graphs for capturing and expressing the semantics of difference in graph data. They are more expressive, and subsume other graph dependencies; and thus, are more useful for addressing many real-world graph data quality/management problems. In this paper, we study the general discovery problem for GDDs – the task of finding a non-redundant and succinct set of GDDs that hold in a given property graph. Indeed, we present characterisations of GDDs based on their semantics, extend existing data structures, and device pruning strategies to enable our proposed level-wise discovery algorithm, GDDMiner, returns a minimal cover of valid GDDs efficiently. Further, we perform experiments over three real-world graphs to demonstrate the feasibility, scalability, and effectiveness of our solution.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
we adopt this graph pattern counting metric due to its anti-monotone property.
References
Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015). https://doi.org/10.1007/s00778-015-0389-y
Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases, VLDB, vol. 1215, pp. 487–499. Santiago, Chile (1994)
Bastide, Y., Pasquier, N., Taouil, R., Stumme, G., Lakhal, L.: Mining minimal non-redundant association rules using frequent closed itemsets. In: Lloyd, J., et al. (eds.) CL 2000. LNCS (LNAI), vol. 1861, pp. 972–986. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44957-4_65
Bringmann, B., Nijssen, S.: What is frequent in a single graph? In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 858–863. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68125-0_84
Caruccio, L., Deufemia, V., Polese, G.: Mining relaxed functional dependencies from data. Data Min. Knowl. Disc. 34(2), 443–477 (2020)
Elseidy, M., Abdelhamid, E., Skiadopoulos, S., Kalnis, P.: Grami: frequent subgraph and pattern mining in a single large graph. Proc. VLDB Endowment 7(7), 517–528 (2014)
Fan, W.: Big graphs: challenges and opportunities. Proc. VLDB Endowment 15(12), 3782–3797 (2022)
Fan, W., Geng, L., Jin, R., Lu, P., Tugay, R., Yu, W.: Linking entities across relations and graphs. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 634–647. IEEE (2022)
Fan, W., Hu, C., Liu, X., Lu, P.: Discovering graph functional dependencies. ACM Trans. Database Syst. (TODS) 45(3), 1–42 (2020)
Fan, W., Lu, P.: Dependencies for graphs. ACM Trans. Database Syst. (TODS) 44(2), 1–40 (2019)
Fan, W., Lu, P., Tian, C., Zhou, J.: Deducing certain fixes to graphs. Proc. VLDB Endowment 12(7), 752–765 (2019)
Feng, Z., et al.: A schema-driven synthetic knowledge graph generation approach with extended graph differential dependencies (gdd x s). IEEE Access 9, 5609–5639 (2020)
Kwashie, S., Liu, J., Li, J., Ye, F.: Mining differential dependencies: a subspace clustering approach. In: Wang, H., Sharaf, M.A. (eds.) ADC 2014. LNCS, vol. 8506, pp. 50–61. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08608-8_5
Kwashie, S., Liu, J., Li, J., Ye, F.: Efficient discovery of differential dependencies through association rules mining. In: Sharaf, M.A., Cheema, M.A., Qi, J. (eds.) ADC 2015. LNCS, vol. 9093, pp. 3–15. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19548-3_1
Kwashie, S., Liu, L., Liu, J., Stumptner, M., Li, J., Yang, L.: Certus: an effective entity resolution approach with graph differential dependencies (GDDs). Proc. VLDB Endowment 12(6), 653–666 (2019)
Lin, P., Song, Q., Wu, Y.: Fact checking in knowledge graphs with ontological subgraph patterns. Data Sci. Eng. 3(4), 341–358 (2018)
Liu, D., et al.: An efficient approach for discovering graph entity dependencies (GEDs). arXiv preprint arXiv:2301.06264 (2023)
Liu, J., Kwashie, S., Li, J., Liu, L., Bewong, M.: Linking graph entities with multiplicity and provenance. In: 2nd International Workshop on EntitY REtrieval: EYRE 2019, pp. 1–7. Association for Computing Machinery (ACM) (2019)
Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data-a review. IEEE Trans. Knowl. Data Eng. 24(2), 251–264 (2010)
Mhedhbi, A., Salihoglu, S.: Optimizing subgraph queries by combining binary and worst-case optimal joins. Proc. VLDB Endowment, vol. 12. no. 11 (2019)
Song, Q., Lin, P., Ma, H., Wu, Y.: Explaining missing data in graphs: a constraint-based approach. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 1476–1487. IEEE (2021)
Song, S., Chen, L.: Differential dependencies: reasoning and discovery. ACM Trans. Database Syst. (TODS) 36(3), 1–41 (2011)
Song, S., Gao, F., Huang, R., Wang, C.: Data dependencies extended for variety and veracity: a family tree. IEEE Trans. Knowl. Data Eng. 34(10), 4717–4736 (2020)
Zhou, G., et al.: FASTAGEDS: fast approximate graph entity dependency discovery. arXiv preprint arXiv:2304.02323 (2023)
Acknowledgement
This research project was supported in part by the following grant schemes: the Innovation fund of Chinese Marine Defense Technology Innovation Center under Grant JJ-2021-722-04; the Fundamental Research Funds for the Chinese Central Universities under Grant 2662023XXPY004; the Inner Mongolia Key Scientific and Technological Project under Grant 2021SZD0099.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y. et al. (2024). Discovering Graph Differential Dependencies. In: Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., Yang, Z. (eds) Databases Theory and Applications. ADC 2023. Lecture Notes in Computer Science, vol 14386. Springer, Cham. https://doi.org/10.1007/978-3-031-47843-7_18
Download citation
DOI: https://doi.org/10.1007/978-3-031-47843-7_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47842-0
Online ISBN: 978-3-031-47843-7
eBook Packages: Computer ScienceComputer Science (R0)