
Making It Tractable to Detect and Correct Errors in Graphs

Published: 16 December 2024

Abstract

This article develops Hercules, a system for entity resolution (ER), conflict resolution (CR), timeliness deduction (TD), and missing value/link imputation (MI) in graphs. It proposes GCR+s, a class of graph cleaning rules (GCR) that support not only predicates for ER and CR but also temporal orders to deduce timeliness and data extraction to impute missing data. Unlike previous graph rules, GCR+s are defined with a dual graph pattern to accommodate the irregular structures of schemaless graphs, and they adopt star-shaped patterns to reduce complexity. We show that although the implication and satisfiability problems are intractable for GCR+s, errors can be detected and corrected with GCR+s in polynomial time. Underlying Hercules is a ranking model that we train to predict temporal orders on attributes and embed as a predicate of GCR+s. We provide an algorithm for discovering GCR+s that combines the generation of patterns with the generation of predicates. We also develop a method for conducting ER, CR, TD, and MI in the same process, which improves the overall quality of graphs by leveraging the interactions among these tasks and by chasing with GCR+s; we show that the method has the Church–Rosser property under certain conditions. Using real-life and synthetic graphs, we empirically verify that Hercules is 53% more accurate than the state-of-the-art graph cleaning systems and performs comparably in efficiency and scalability.
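
To make the idea of chasing with rules concrete, the following is a minimal sketch in Python. It is not the Hercules algorithm and its rules are not GCR+s: the toy graph, the two hand-written rules (rule_er, rule_mi), and the helper names (make_graph, find, chase) are all hypothetical and chosen only for illustration. The sketch applies an ER-style rule and an MI-style rule repeatedly until nothing changes, and runs the chase in both rule orders so the results can be compared. This is the kind of order independence that the Church–Rosser property is about, shown here for one toy input only.

```python
from itertools import permutations


def make_graph():
    # Hypothetical toy data: node id -> attributes; None marks a missing value.
    return {
        "a": {"name": "Ada Lovelace", "born": 1815, "city": None},
        "b": {"name": "Ada Lovelace", "born": 1815, "city": "London"},
        "c": {"name": "Alan Turing", "born": 1912, "city": "London"},
    }


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def rule_er(graph, parent):
    # Toy ER rule (an assumption): nodes agreeing on name and born are the same entity.
    changed = False
    ids = sorted(graph)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            if find(parent, u) == find(parent, v):
                continue
            gu, gv = graph[u], graph[v]
            if (gu["name"] is not None and gu["name"] == gv["name"]
                    and gu["born"] is not None and gu["born"] == gv["born"]):
                parent[find(parent, v)] = find(parent, u)
                changed = True
    return changed


def rule_mi(graph, parent):
    # Toy MI rule (an assumption): copy a known value to an identified node that lacks it.
    changed = False
    for u in graph:
        for v in graph:
            if u != v and find(parent, u) == find(parent, v):
                for attr, val in graph[v].items():
                    if graph[u][attr] is None and val is not None:
                        graph[u][attr] = val
                        changed = True
    return changed


def chase(graph, rules):
    # Apply the rules in the given order, round after round, until a fixpoint.
    parent = {n: n for n in graph}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule(graph, parent):
                changed = True
    classes = {}
    for n in graph:
        classes.setdefault(find(parent, n), set()).add(n)
    return graph, {frozenset(c) for c in classes.values()}


# Run the chase once per rule order so the outcomes can be compared.
results = [chase(make_graph(), order) for order in permutations([rule_er, rule_mi])]
print(results[0] == results[1])
```

In this toy, the MI rule can only fire after the ER rule has identified two nodes, which is one simple instance of the interaction between tasks that the abstract describes.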

Information

Published In

ACM Transactions on Database Systems  Volume 49, Issue 4
December 2024
198 pages
EISSN:1557-4644
DOI:10.1145/3613725

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 December 2024
Online AM: 02 November 2024
Accepted: 01 August 2024
Revised: 13 May 2024
Received: 27 December 2023
Published in TODS Volume 49, Issue 4

Author Tags

  1. Entity resolution
  2. conflict resolution
  3. timeliness deduction
  4. missing data imputation
  5. graph cleaning rules

Qualifiers

  • Research-article

Funding Sources

  • Royal Society Wolfson Research Merit Award
  • Fundamental Research Funds for the Central Universities, NSFC
  • NSFC
