[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Mining Keys for Graphs

Published: 02 July 2024 Publication History

Abstract

Keys for graphs are a class of data quality rules that use topological and value constraints to uniquely identify entities in a data graph. They have been studied to support object identification, knowledge fusion, data deduplication, and social network reconciliation. Manual specification and discovery of graph keys is tedious and infeasible over large-scale graphs. To make GKeys useful in practice, we study the GKey discovery problem, and present GKMiner, an algorithm that mines keys over graphs. Our algorithm discovers keys in a graph via frequent subgraph expansion, and notably, identifies recursive keys, i.e., where the unique identification of an entity type is dependent upon the identification of another entity type. We introduce the key properties, minimality and support, which effectively help to reduce the space of candidate keys. GKMiner uses a set of auxillary structures to summarize an input graph, and to identify likely candidate keys for greater pruning efficiency and evaluation of the search space. Our evaluation shows that identifying and using recursive keys in entity linking, lead to improved accuracy, over keys found using existing graph key mining techniques.

References

[1]
Dong X.L., Gabrilovich E., Heitz G., Horn W., Murphy K., Sun S., Zhang W., From data fusion to knowledge fusion, VLDB 7 (10) (2014) 881–892.
[2]
Akhtar W., Cortés-Calabuig A., Paredaens J., Constraints in RDF, in: International Workshop on Semantics in Data and Knowledge Bases, Springer, 2010, pp. 23–39.
[3]
M. Atencia, M. Chein, M. Croitoru, J. David, M. Leclère, N. Pernelle, F. Saïs, F. Scharffe, D. Symeonidou, Defining key semantics for the RDF datasets: experiments and evaluations, in: International Conference on Conceptual Structures, 2014, pp. 65–78.
[4]
Huhtala Y., Kärkkäinen J., Porkka P., Toivonen H., TANE: An efficient algorithm for discovering functional and approximate dependencies, Comput. J. 42 (2) (1999) 100–111.
[5]
Buneman P., Davidson S., Fan W., Hara C., Tan W.-C., Keys for XML, Comput. Netw. 39 (5) (2002) 473–487.
[6]
Fan W., Lu P., Dependencies for graphs, ACM Trans. Database Syst. 44 (2) (2019) 1–40.
[7]
Langouri M.A., Mansfield A., Chiang F., Wu Y., Inconsistency detection with temporal graph functional dependencies, in: 39th IEEE International Conference on Data Engineering, ICDE 2023, IEEE, 2023, pp. 464–476.
[8]
Fan W., Wang X., Wu Y., Xu J., Association rules with graph patterns, Proc. VLDB Endow. 8 (12) (2015) 1502–1513.
[9]
Fan W., Fu W., Jin R., Lu P., Tian C., Discovering association rules from big graphs, Proc. VLDB Endow. 15 (7) (2022) 1479–1492.
[10]
Fan W., Fan Z., Tian C., Dong X.L., Keys for graphs, Proc. VLDB Endow. 8 (12) (2015) 1590–1601.
[11]
Ma H., Alipourlangouri M., Wu Y., Chiang F., Pi J., Ontology-based entity matching in attributed graphs, Proc. VLDB Endow. 12 (10) (2019) 1195–1207.
[12]
Hellings J., Gyssens M., Paredaens J., Wu Y., Implication and axiomatization of functional constraints on patterns with an application to the RDF data model, in: Foundations of Information and Knowledge Systems, Springer, 2014, pp. 250–269.
[13]
Lehmann J., Isele R., Jakob M., Jentzsch A., Kontokostas D., Mendes P.N., Hellmann S., Morsey M., Van Kleef P., Auer S., et al., Dbpedia–a large-scale, multilingual knowledge base extracted from wikipedia, Semant. Web (2015).
[14]
Abiteboul S., Hull R., Vianu V., Foundations of Databases, Vol. 8, Addison-Wesley Reading, 1995.
[15]
Birnick J., Bläsius T., Friedrich T., Naumann F., Papenbrock T., Schirneck M., Hitting set enumeration with partial information for unique column combination discovery, Proc. VLDB Endow. 13 (12) (2020) 2270–2283.
[16]
Wei Z., Leck U., Link S., Discovery and ranking of embedded uniqueness constraints, Proc. VLDB Endow. 12 (13) (2019) 2339–2352.
[17]
Atencia M., David J., Scharffe F., Keys and pseudo-keys detection for web datasets cleansing and interlinking, in: International Conference on Knowledge Engineering and Knowledge Management, Springer, 2012, pp. 144–153.
[18]
Pernelle N., Saïs F., Symeonidou D., An automatic key discovery approach for data linking, J. Web Semant. 23 (2013) 16–30.
[19]
T. Soru, E. Marx, A.-C. Ngonga Ngomo, ROCKER: A refinement operator for key discovery, in: Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 1025–1033.
[20]
W. Fan, Y. Wu, J. Xu, Functional dependencies for graphs, in: SIGMOD International Conference on Management of Data, 2016, pp. 1843–1857.
[21]
Deng T., Hou L., Han Z., Keys as features for graph entity matching, in: 2020 IEEE 36th International Conference on Data Engineering (ICDE), IEEE, 2020, pp. 1974–1977.
[22]
R. Angles, A. Bonifati, S. Dumbrava, G. Fletcher, K.W. Hare, J. Hidders, V.E. Lee, B. Li, L. Libkin, W. Martens, et al., Pg-keys: Keys for property graphs, in: Proceedings of the 2021 International Conference on Management of Data, 2021, pp. 2423–2436.
[23]
Link S., Neo4j keys, in: International Conference on Conceptual Modeling, Springer, 2020, pp. 19–33.
[24]
Skavantzos P., Zhao K., Link S., Uniqueness constraints on property graphs, in: International Conference on Advanced Information Systems Engineering, Springer, 2021, pp. 280–295.
[25]
Heise A., Quiané-Ruiz J.-A., Abedjan Z., Jentzsch A., Naumann F., Scalable discovery of unique column combinations, Proc. VLDB Endow. 7 (4) (2013) 301–312.
[26]
Y. Sismanis, P. Brown, P.J. Haas, B. Reinwald, Gordian: efficient and scalable discovery of composite keys, in: Proceedings of the 32nd International Conference on Very Large Data Bases, 2006, pp. 691–702.
[27]
Baskaran S., Keller A., Chiang F., Golab L., Szlichta J., Efficient discovery of ontology functional dependencies, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, in: CIKM ’17, Association for Computing Machinery, New York, NY, USA, ISBN 9781450349185, 2017, pp. 1847–1856,. https://doi.org/10.1145/3132847.3132879.
[28]
Zheng Z., Zheng L., Alipourlangouri M., Chiang F., Golab L., Szlichta J., Baskaran S., Contextual data cleaning with ontology functional dependencies, J. Data and Information Quality (ISSN ) 14 (3) (2022),. https://doi.org/10.1145/3524303.
[29]
Noronha L., Chiang F., Discovery of temporal graph functional dependencies, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, in: CIKM ’21, Association for Computing Machinery, New York, NY, USA, ISBN 9781450384469, 2021, pp. 3348–3352,. https://doi.org/10.1145/3459637.3482087.
[30]
Symeonidou D., Armant V., Pernelle N., Saïs F., Sakey: Scalable almost key discovery in RDF data, in: International Semantic Web Conference, Springer, 2014, pp. 33–49.
[31]
Symeonidou D., Galárraga L., Pernelle N., Saïs F., Suchanek F., VICKEY: mining conditional keys on knowledge bases, in: International Semantic Web Conference, Springer, 2017, pp. 661–677.
[32]
Fan W., Hu C., Liu X., Lu P., Discovering graph functional dependencies, ACM Trans. Database Syst. 45 (3) (2020) 1–42.
[33]
M. Alipourlangouri, F. Chiang, KeyMiner: Discovering Keys for Graphs, in: VLDB Workshop, 2018.
[34]
Namaki M.H., Wu Y., Song Q., Lin P., Ge T., Discovering graph temporal association rules, in: CIKM, ACM, 2017, pp. 1697–1706.
[35]
Berlingerio M., Bonchi F., Bringmann B., Gionis A., Mining graph evolution rules, in: Machine Learning and Knowledge Discovery in Databases, 2009, pp. 115–130.
[36]
Castano S., Ferrara A., Montanelli S., Varese G., Ontology and instance matching, in: Paliouras G., Spyropoulos C.D., Tsatsaronis G. (Eds.), Knowledge-Driven Multimedia Information Extraction and Ontology Evolution: Bridging the Semantic Gap, Springer Berlin Heidelberg, Berlin, Heidelberg, ISBN 978-3-642-20795-2, 2011, pp. 167–195,.
[37]
Azmy M., Shi P., Lin J., Ilyas I.F., Matching entities across different knowledge graphs with graph embeddings, 2019, CoRR, abs/1903.06607. URL http://arxiv.org/abs/1903.06607.
[38]
R. Zass, A. Shashua, Probabilistic graph and hypergraph matching, in: 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8, https://doi.org/10.1109/CVPR.2008.4587500.
[39]
Fan W., Wang X., Wu Y., Incremental graph pattern matching, ACM Trans. Database Syst. 38 (3) (2013) 1–47.
[40]
Peterson J.L., Silberschatz A., Operating System Concepts, Addison-Wesley Longman Publishing Co., Inc., 1985.
[42]
Mahdisoltani F., Biega J., Suchanek F., YAGO3: A knowledge base from multilingual wikipedias, in: CIDR, 2014.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Data & Knowledge Engineering
Data & Knowledge Engineering  Volume 150, Issue C
Mar 2024
247 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 02 July 2024

Author Tags

  1. Graph keys
  2. Key mining
  3. Recursive keys

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 17 Dec 2024

Other Metrics

Citations

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media