Abstract
Recently more and more information sources are connected together and become a sort of complex graphs that can be exploited not only as a structured and semi-structured data such as rdb or xml, RDF or NoSQL, but also as many kinds of unstructured data such as web, bioinformatics, genometrics, patents, social media, knowlege graphs, IoT, hidden graph from deep learning results. State of the art studies have suggested methods of presenting data as hyper-graphs in search queries and finding search results in subgraphs or graph embeddings rather than a list of individual node results. We study the problem of retrieving top-k graph results with the query relevances; that is, given a set of query keywords Q on a graph G, we aim to find a subgraph g of G such that g is highly related to Q and closely linked under g. In order to consider the relevant graph results and the connectivity simultaneously, we present an effective algorithm graph threshold algorithm (GTA) based on a threshold algorithm (TA) which works efficiently in non-graph structure. We show that GTA does not need unnecessary searches under given objective functions, and prove the existence of an upper bound of the size of subgraph for top-k results, called \({\textit{hop}}_{{\textit{max}}}\), which makes it efficient to find the combined results. Finally, we conduct the performance studies on real and synthetic graphs, which demonstrate that our algorithm significantly outperforms conventional approaches with respect to time complexity and cost consumption.
Similar content being viewed by others
References
Abu-Aisheh Z, Raveaux R, Ramel J (2020) Efficient k-nearest neighbors search in graph space. Pattern Recognit Lett 134:77–86
Amsterdamer Y, Kukliansky A, Milo T (2015) A natural language interface for querying general and individual knowledge. Proc VLDB Endow 8(12):1430–1441
Anand D, Gadiya S, Sethi A (2020) Histographs: graphs in histopathology. In: Tomaszewski JE, Ward AD (eds) Medical Imaging 2020, SPIE Proceedings, vol 11320. SPIE, p 113200O
Chen J, Li B, Wang J, Zhao Y, Yao L, Xiong Y (2020) Knowledge graph enhanced third-party library recommendation for mobile application development. IEEE Access 8:42436–42446
Chen X, Pan L (2018) A survey of graph cuts/graph search based medical image segmentation. IEEE Rev Biomed Eng 11:112–124
Cheng J, Zeng X, Yu JX (2013) Top-\(k\) graph pattern matching over large graphs. In: ICDE. IEEE, pp 1033–1044
Choi D, Park CS, Chung YD (2019) Progressive top-\(k\) subarray query processing in array databases. Proc VLDB Endow 12(9):989–1001
Christmann P, Saha Roy R, Abujabal A, Singh J, Weikum G (2019) Look before you hop: conversational question answering over knowledge graphs using judicious context expansion. In: CIKM, pp 729–738
Ding B, Yu JX, Wang S, Qin L, Zhang X, Lin, X (2007) Finding top-\(k\) min-cost connected trees in databases. In: IEEE International Conference on Data Engineering. IEEE, pp 836–845
Ding X, Jia J, Li J, Liu J, Jin H (2014) Top-\(k\) similarity matching in large graphs with attributes. In: International Conference on Database Systems for Advanced Applications. Springer, Berlin, pp 156–170
Eom CS, Lee CC, Lee W, Leung CK (2020) Effective privacy preserving data publishing by vectorization. Inf Sci 527:311–328
Fagin R (1999) Combining fuzzy information from multiple systems. J Comput Syst Sci 58(1):83–99
Fagin R, Lotem A, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656
Han X, Yue L, Dong Y, Xu Q, Xie G, Xu X (2020) Efficient hybrid algorithm based on moth search and fireworks algorithm for solving numerical and constrained engineering optimization problems. J Supercomput 76(12):9404–9429
He H, Wang H, Yang J, Yu PS (2007) Blinks: ranked keyword searches on graphs. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. ACM, pp 305–316
Ho VT, Pal K, Kleer N, Berberich K, Weikum G (2020) Entities with quantities: extraction, search, and ranking. In: WSDM, pp 833–836
Hormozi N (2019) Disambiguation and result expansion in keyword search over relational databases. In: International Conference on Data Engineering (ICDE). IEEE, pp 2101–2105
Hristidis V, Papakonstantinou Y, Gravano L (2003) Efficient IR-style keyword search over relational databases. In: International Conference on Very Large Databases. Elsevier, Amsterdam, pp 850–861
Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-\(k\) query processing techniques in relational database systems. ACM Comput Surv 40(4):11:1–11:58
Jenkins P, Zhao J, Vinicombe H, Subramanian A, Prasad A, Dobi A, Li E, Guo Y (2020) Natural language annotations for search engine optimization. In: WWW, pp 2856–2862
Jin J, Luo J, Khemmarat S, Dong F, Gao L (2019) GStar: an efficient framework for answering top-\(k\) star queries on billion-node knowledge graphs. World Wide Web 22(4):1611–1638
Kargar M, An A (2011) Keyword search in graphs: finding r-cliques. Proc VLDB 4(10):681–692
Kasneci G, Ramanath M, Sozio M, Suchanek FM, Weikum G (2009) Star: Steiner-tree approximation in relationship graphs. In: IEEE International Conference on Data Engineering. IEEE, pp 868–879
Kim J, Lee W, Song JJ, Lee SB (2017) Optimized combinatorial clustering for stochastic processes. Clust Comput 20(2):1135–1148
Kou L, Markowsky G, Berman L (1981) A fast algorithm for Steiner trees. Acta Informatica 15(2):141–145
Kwon HY, Whang KY (2016) Scalable and efficient processing of top-\(k\) multiple-type integrated queries. World Wide Web 19(6):1051–1075
Lee W, Leung CKS (2009) Structural top-\(k\) web navigation with inclusive query. In: IEEE International Conference on Industrial Technology. IEEE, pp 1–6
Lee W, Leung CK, Lee JJ (2011) Mobile web navigation in digital ecosystems using rooted directed trees. IEEE Trans Ind Electron 58(6):2154–2162
Li WS, Candan KS, Vu Q, Agrawal D (2001) Retrieving and organizing web pages by “information unit”. In: WWW’01, pp 230–244
Li G, Ooi BC, Feng J, Wang J, Zhou L (2008) Ease: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, pp 903–914
Lin Z, Yang D, Yin X (2020) Patient similarity via joint embeddings of medical knowledge graph and medical entity descriptions. IEEE Access 8:156663–156676
Luo C, Carey MJ (2020) LSM-based storage techniques: a survey. VLDB J 29(1):393–418
Luo Y, Lin X, Wang W, Zhou X (2007) Spark: top-\(k\) keyword query in relational databases. In: ACM SIGMOD International Conference on Management of Data, SIGMOD’07. ACM, New York, pp 115–126
Mohanty M, Ramanath M (2019) Insta-search: towards effective exploration of knowledge graphs. In: International Conference on Information and Knowledge Management, pp 2909–2912
Mottin D, Lissandrini M, Velegrakis Y, Palpanas T (2016) Exemplar queries: a new way of searching. VLDB J Int J Very Large Data Bases 25(6):741–765
Shang J, Wang C, Wang C, Guo G, Qian J (2020) An attribute-based community search method with graph refining. J Supercomput 76(10):7777–7804
Shimomura LC, Oyamada RS, Vieira MR, Kaster DS (2021) A survey on graph-based methods for similarity searches in metric spaces. Inf Syst 95:101507
Srinidhi CL, Aparna P, Rajan J (2019) Automated method for retinal artery/vein separation via graph search metaheuristic approach. IEEE Trans Image Process 28(6):2705–2718
Sun Z, Tang J, Du P, Deng ZH, Nie JY (2019) Divgraphpointer: a graph pointer network for extracting diverse keyphrases. In: SIGIR, pp 755–764
Tang B, Mouratidis K, Yiu ML, Chen Z (2019) Creating top ranking options in the continuous option and preference space. Proc VLDB Endow 12(10):1181–1194
Tiakas E, Valkanas G, Papadopoulos AN, Manolopoulos Y, Gunopulos D (2016) Processing top-\(k\) dominating queries in metric spaces. ACM Trans Database Syst 40(4):23:1–23:38
Tian Q, Li J, Liu H (2019) A method for guaranteeing wireless communication based on a combination of deep and shallow learning. IEEE Access 7:38688–38695
Yang S, Han F, Wu Y, Yan X (2016) Fast top-\(k\) search in knowledge graphs. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, pp 990–1001
Zou L, Huang R, Wang H, Yu JX, He W, Zhao D (2014) Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, pp 313–324
Acknowledgements
This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2019S1A5C2A03081234).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Lee, W., Song, J.J., Lee, C.C. et al. Graph threshold algorithm. J Supercomput 77, 9827–9847 (2021). https://doi.org/10.1007/s11227-021-03665-z
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-021-03665-z