[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Discovering Graph Differential Dependencies

  • Conference paper
  • First Online:
Databases Theory and Applications (ADC 2023)

Abstract

Graph differential dependencies (GDDs) are a novel class of integrity constraints in property graphs for capturing and expressing the semantics of difference in graph data. They are more expressive, and subsume other graph dependencies; and thus, are more useful for addressing many real-world graph data quality/management problems. In this paper, we study the general discovery problem for GDDs – the task of finding a non-redundant and succinct set of GDDs that hold in a given property graph. Indeed, we present characterisations of GDDs based on their semantics, extend existing data structures, and device pruning strategies to enable our proposed level-wise discovery algorithm, GDDMiner, returns a minimal cover of valid GDDs efficiently. Further, we perform experiments over three real-world graphs to demonstrate the feasibility, scalability, and effectiveness of our solution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 47.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 59.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    we adopt this graph pattern counting metric due to its anti-monotone property.

References

  1. Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015). https://doi.org/10.1007/s00778-015-0389-y

    Article  Google Scholar 

  2. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases, VLDB, vol. 1215, pp. 487–499. Santiago, Chile (1994)

    Google Scholar 

  3. Bastide, Y., Pasquier, N., Taouil, R., Stumme, G., Lakhal, L.: Mining minimal non-redundant association rules using frequent closed itemsets. In: Lloyd, J., et al. (eds.) CL 2000. LNCS (LNAI), vol. 1861, pp. 972–986. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44957-4_65

    Chapter  Google Scholar 

  4. Bringmann, B., Nijssen, S.: What is frequent in a single graph? In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 858–863. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68125-0_84

    Chapter  Google Scholar 

  5. Caruccio, L., Deufemia, V., Polese, G.: Mining relaxed functional dependencies from data. Data Min. Knowl. Disc. 34(2), 443–477 (2020)

    Article  MathSciNet  MATH  Google Scholar 

  6. Elseidy, M., Abdelhamid, E., Skiadopoulos, S., Kalnis, P.: Grami: frequent subgraph and pattern mining in a single large graph. Proc. VLDB Endowment 7(7), 517–528 (2014)

    Article  Google Scholar 

  7. Fan, W.: Big graphs: challenges and opportunities. Proc. VLDB Endowment 15(12), 3782–3797 (2022)

    Article  Google Scholar 

  8. Fan, W., Geng, L., Jin, R., Lu, P., Tugay, R., Yu, W.: Linking entities across relations and graphs. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 634–647. IEEE (2022)

    Google Scholar 

  9. Fan, W., Hu, C., Liu, X., Lu, P.: Discovering graph functional dependencies. ACM Trans. Database Syst. (TODS) 45(3), 1–42 (2020)

    Article  MathSciNet  Google Scholar 

  10. Fan, W., Lu, P.: Dependencies for graphs. ACM Trans. Database Syst. (TODS) 44(2), 1–40 (2019)

    Article  MathSciNet  MATH  Google Scholar 

  11. Fan, W., Lu, P., Tian, C., Zhou, J.: Deducing certain fixes to graphs. Proc. VLDB Endowment 12(7), 752–765 (2019)

    Article  Google Scholar 

  12. Feng, Z., et al.: A schema-driven synthetic knowledge graph generation approach with extended graph differential dependencies (gdd x s). IEEE Access 9, 5609–5639 (2020)

    Article  Google Scholar 

  13. Kwashie, S., Liu, J., Li, J., Ye, F.: Mining differential dependencies: a subspace clustering approach. In: Wang, H., Sharaf, M.A. (eds.) ADC 2014. LNCS, vol. 8506, pp. 50–61. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08608-8_5

    Chapter  Google Scholar 

  14. Kwashie, S., Liu, J., Li, J., Ye, F.: Efficient discovery of differential dependencies through association rules mining. In: Sharaf, M.A., Cheema, M.A., Qi, J. (eds.) ADC 2015. LNCS, vol. 9093, pp. 3–15. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19548-3_1

    Chapter  Google Scholar 

  15. Kwashie, S., Liu, L., Liu, J., Stumptner, M., Li, J., Yang, L.: Certus: an effective entity resolution approach with graph differential dependencies (GDDs). Proc. VLDB Endowment 12(6), 653–666 (2019)

    Article  Google Scholar 

  16. Lin, P., Song, Q., Wu, Y.: Fact checking in knowledge graphs with ontological subgraph patterns. Data Sci. Eng. 3(4), 341–358 (2018)

    Article  Google Scholar 

  17. Liu, D., et al.: An efficient approach for discovering graph entity dependencies (GEDs). arXiv preprint arXiv:2301.06264 (2023)

  18. Liu, J., Kwashie, S., Li, J., Liu, L., Bewong, M.: Linking graph entities with multiplicity and provenance. In: 2nd International Workshop on EntitY REtrieval: EYRE 2019, pp. 1–7. Association for Computing Machinery (ACM) (2019)

    Google Scholar 

  19. Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data-a review. IEEE Trans. Knowl. Data Eng. 24(2), 251–264 (2010)

    Article  Google Scholar 

  20. Mhedhbi, A., Salihoglu, S.: Optimizing subgraph queries by combining binary and worst-case optimal joins. Proc. VLDB Endowment, vol. 12. no. 11 (2019)

    Google Scholar 

  21. Song, Q., Lin, P., Ma, H., Wu, Y.: Explaining missing data in graphs: a constraint-based approach. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 1476–1487. IEEE (2021)

    Google Scholar 

  22. Song, S., Chen, L.: Differential dependencies: reasoning and discovery. ACM Trans. Database Syst. (TODS) 36(3), 1–41 (2011)

    Article  Google Scholar 

  23. Song, S., Gao, F., Huang, R., Wang, C.: Data dependencies extended for variety and veracity: a family tree. IEEE Trans. Knowl. Data Eng. 34(10), 4717–4736 (2020)

    Article  Google Scholar 

  24. Zhou, G., et al.: FASTAGEDS: fast approximate graph entity dependency discovery. arXiv preprint arXiv:2304.02323 (2023)

Download references

Acknowledgement

This research project was supported in part by the following grant schemes: the Innovation fund of Chinese Marine Defense Technology Innovation Center under Grant JJ-2021-722-04; the Fundamental Research Funds for the Chinese Central Universities under Grant 2662023XXPY004; the Inner Mongolia Key Scientific and Technological Project under Grant 2021SZD0099.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Xi Guo or Zaiwen Feng .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zhang, Y. et al. (2024). Discovering Graph Differential Dependencies. In: Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., Yang, Z. (eds) Databases Theory and Applications. ADC 2023. Lecture Notes in Computer Science, vol 14386. Springer, Cham. https://doi.org/10.1007/978-3-031-47843-7_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-47843-7_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47842-0

  • Online ISBN: 978-3-031-47843-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics