[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Butterfly counting on uncertain bipartite graphs

Published: 01 October 2021 Publication History

Abstract

When considering uncertain bipartite networks, the number of instances of the popular graphlet structure the butterfly may be used as an important metric to quickly gauge information about the network. This Uncertain Butterfly Count has practical usages in a variety of areas such as biomedical/biological fields, E-Commerce and road networks. In this paper we formally define the uncertain butterfly structure (in which the existential probability of the butterfly is greater than or equal to some user-defined threshold t) as well as the Uncertain Butterfly Counting Problem (to determine the number of unique instances of this structure on any uncertain bipartite network). We then examine exact solutions by proposing a non-trivial baseline (UBFC) as well as an improved solution (IUBFC) which reduces the time complexity and employs heuristics to further reduce the runtime in practice. In addition to exact solutions, we propose two approximate solutions via sampling, UBS and PES, which can be used to quickly estimate the Uncertain Butterfly Count, a powerful tool when the exact count is unnecessary. Using a range of networks with different edge existential probability distributions, we validate the efficiency and effectiveness of our solutions.

References

[1]
S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds. SIGMOD, pages 34--48, 1987.
[2]
M. Al Hasan and V. S. Dave. Triangle counting in large networks: a review. WIREs DMKD, 8(2):e1226, 2018.
[3]
S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting protein complex membership using probabilistic network reliability. Genome Res., 14(6):1170--1175, 2004.
[4]
J. E. Bartlett, J. W. Kortlik, and C. C. Higgins. Organizational research: Determining appropriate sample size in survey research. ITLPJ, 19(1):43--50, 2001.
[5]
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. SIGKDD, page 16--24, 2008.
[6]
F. Bonchi, F. Gullo, A. Kaltenbrunner, and Y. Volkovich. Core decomposition of uncertain graphs. SIGKDD, pages 1316--1325, 2014.
[7]
T. Dallas, A. W. Park, and J. M. Drake. Predicting cryptic links in host-parasite networks. PLOS Computational Biology, 13:1--15, 05 2017.
[8]
K. Han, F. Gui, X. Xiao, J. Tang, Y. He, Z. Cao, and H. Huang. Efficient and effective algorithms for clustering uncertain graphs. PVLDB, 12(6):667--680, 2019.
[9]
X. Huang, W. Lu, and L. Lakshmanan. Truss decomposition of probabilistic graphs: Semantics and algorithms. SIGMOD, pages 77--90, 06 2016.
[10]
A. Khan and L. Chen. On uncertain graphs modelling and queries. PVLDB, 8(12):2042--2043, 2015.
[11]
N. Korovaiko and A. Thomo. Trust prediction from user-item ratings. Soc. Netw. Anal. Min., 3:749--759, 2013.
[12]
R.-H. Li, J. X. Yu, R. Mao, and T. Jin. Efficient and accurate query evaluation on uncertain graphs via recursive stratified sampling. ICDE, pages 892--903, 2014.
[13]
P. Lind, M. C. Gonzalez, and H. Herrmann. Cycles and clustering in bipartite networks. Phys, review. E., 72:056127, 12 2005.
[14]
B. Liu, L. Yuan, X. Lin, L. Qin, W. Zhang, and J. Zhou. Efficient (α, β)-core computation in bipartite graphs. VLDBJ, 29:1075--1099, 2020.
[15]
B. Lyu, L. Qin, X. Lin, Y. Zhang, Z. Qian, and J. Zhou. Maximum biclique search at billion scale. PVLDB, 13(9):1359--1372, May 2020.
[16]
C. Ma, R. Cheng, L. V. S. Lakshamanan, T. Grubernmannm, Y. Fang, and X. Li. Linc: A motif counting algorithm for uncertain graphs. PVLDB, 13(2):155--168, 2019.
[17]
P. Parchas, F. Gullo, D. Papadias, and F. Bonchi. The pursuit of a good possible world: Extracting representative instances of uncertain graphs. SIGMOD, pages 967--978, 2014.
[18]
G. A. Pavlopoulos, P. I. Kontou, A. Pavlopoulou, C. Bouyioukos, E. Markou, and P. G. Bagos. Bipartite graphs in systems biology and medicine: a survey of methods and applications. GigaScience, 7(4), 02 2018.
[19]
R. Peeters. The maximum edge biclique problem is np-complete. Discret. Appl. Math., 131(3):651--654, 2003.
[20]
C. Phillips, K. Wang, E. Baker, J. Bubier, E. Chesler, and M. Langston. On finding and enumerating maximal and maximum k-partite cliques in k-partite graphs. Algorithms, 12(1):23, 2019.
[21]
M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. k-nearest neighbors in uncertain graphs. PVLDB, 3(1):997--1008, 2010.
[22]
S.-V. Sanei-Mehri, A. E. Sariyuce, and S. Tirthapura. Butterfly counting in bipartite networks. SIGKDD, 24:2150--2160, 2018.
[23]
S.-V. Sanei-Mehri, Y. Zhang, A. E. Sariyüce, and S. Tirthapura. Fleet: Butterfly estimation from a bipartite graph stream. CIKM, page 1201--1210, 2019.
[24]
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. WWW, pages 285--295, 2010.
[25]
I. Sungur, Y. Ren, F. Ordóñez, M. Dessouky, and H. Zhong. A model and algorithm for the courier delivery problem with uncertainty. Transportation Science, 44(2):193--205, 2010.
[26]
S. Suthram, T. Shlomi, E. Ruppin, R. Sharan, and T. Ideker. A direct comparison of protein interaction confidence assignment schemes. BMC Bioinformatics, 7:360--370, 2006.
[27]
J. G. Walker, M. Plein, E. R. Morgan, and P. A. Vesk. Uncertain links in host-parasite networks: lessons for parasite transmission in a multi-host system. Philos. Trans. R. Soc. B, 372, 2017.
[28]
J. Wang, A. W. Fu, and J. Cheng. Rectangle counting in large bipartite graphs. BigData, pages 17--24, 2014.
[29]
K. Wang, X. Lin, L. Qin, W. Zhang, and Y. Zhang. Vertex priority based butterfly counting for large-scale bipartite networks. PVLDB, 12(10):1139--1152, 2019.
[30]
K. Wang, X. Lin, L. Qin, W. Zhang, and Y. Zhang. Efficient bitruss decomposition for large-scale bipartite graphs. ICDE, pages 661--672, 2020.
[31]
B. Wilder, A. Yadav, N. Immorlica, E. Rice, and M. Tambe. Uncharted but not uninfluenced: Influence maximization with an uncertain network. AAMAS, 16:1305--1313, 2017.
[32]
M. M. Wolf, M. Deveci, J. W. Berry, S. D. Hammond, and S. Rajamanickam. Fast linear algebra-based triangle counting with kokkoskernels. HPEC, pages 1--7, 2017.
[33]
Y. Yuan, L. Chen, and G. Wang. "efficiently answering probability threshold-based shortest path queries over uncertain graphs. DASFAA, pages 155--170, 2010.
[34]
Y. Yuan, G. Wang, L. Chen, and H. Wang. Efficient keyword search on uncertain graph data. TKDE, 25(12):2767--2779, 2013.
[35]
Y. Zeng, Y. Tong, and L. Chen. Last-mile delivery made practical: An efficient route planning framework with theoretical guarantees. PVLDB, 13(3):320--333, 2020.
[36]
Y. Zhang, C. A. Phillips, G. L. Rogers, E. J. Baker, E. J. Chesler, and M. A. Langston. On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types. BMC Bioinformatics, 15(1):110, 2014.
[37]
B. Zhao, J. Wang, M. Li, F. Wu, and Y. Pan. Detecting protein complexes based on uncertain graph model. IEEE/ACM Trans. Comput. Biol. Bioinform, 11(3):486--497, 2014.
[38]
A. Zhou, Y. Wang, and L. Chen. Finding large diverse communities on networks: The edge maximum k*-partite clique. PVLDB, 13(12):2576--2589, July 2020.
[39]
T. Zhou, J. Ren, M. c. v. Medo, and Y.-C. Zhang. Bipartite network projection and personal recommendation. Phys. Rev. E, 76:046115, Oct 2007.
[40]
Z. Zou. Bitruss decomposition of bipartite graphs. DASFAA, page 218--233, 2016.
[41]
Z. Zou and R. Zhu. Truss decomposition of uncertain graphs. Knowl. Inf. Syst., 50(1):197--230, 2017.

Cited By

View all
  • (2024)Efficient Maximal Frequent Group Enumeration in Temporal Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368199717:11(3243-3255)Online publication date: 30-Aug-2024
  • (2024)Efficient Index for Temporal Core Queries over Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368196517:11(2813-2825)Online publication date: 1-Jul-2024
  • (2024)FABLE: Approximate Butterfly Counting in Bipartite Graph Stream with Duplicate EdgesProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679812(2158-2167)Online publication date: 21-Oct-2024
  • Show More Cited By

Index Terms

  1. Butterfly counting on uncertain bipartite graphs
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 15, Issue 2
        October 2021
        247 pages
        ISSN:2150-8097
        Issue’s Table of Contents

        Publisher

        VLDB Endowment

        Publication History

        Published: 01 October 2021
        Published in PVLDB Volume 15, Issue 2

        Badges

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)46
        • Downloads (Last 6 weeks)4
        Reflects downloads up to 04 Jan 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2024)Efficient Maximal Frequent Group Enumeration in Temporal Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368199717:11(3243-3255)Online publication date: 30-Aug-2024
        • (2024)Efficient Index for Temporal Core Queries over Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368196517:11(2813-2825)Online publication date: 1-Jul-2024
        • (2024)FABLE: Approximate Butterfly Counting in Bipartite Graph Stream with Duplicate EdgesProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679812(2158-2167)Online publication date: 21-Oct-2024
        • (2024)Efficient processing of coverage centrality queries on road networksWorld Wide Web10.1007/s11280-024-01248-527:3Online publication date: 12-Apr-2024
        • (2024)Parallelization of butterfly counting on hierarchical memoryThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00856-x33:5(1453-1484)Online publication date: 1-Sep-2024
        • (2023)Efficient Temporal Butterfly Counting and Enumeration on Temporal Bipartite GraphsProceedings of the VLDB Endowment10.14778/3636218.363622317:4(657-670)Online publication date: 1-Dec-2023
        • (2023)Maximum Balanced (k,ϵ)-Bitruss Detection in Signed Bipartite GraphProceedings of the VLDB Endowment10.14778/3632093.363209917:3(332-344)Online publication date: 1-Nov-2023
        • (2023)SageProceedings of the VLDB Endowment10.14778/3565838.356584415:13(3897-3910)Online publication date: 20-Jan-2023
        • (2023)Scalable Approximate Butterfly and Bi-triangle Counting for Large Bipartite NetworksProceedings of the ACM on Management of Data10.1145/36267531:4(1-26)Online publication date: 12-Dec-2023
        • (2023)Efficient Biclique Counting in Large Bipartite GraphsProceedings of the ACM on Management of Data10.1145/35889321:1(1-26)Online publication date: 30-May-2023
        • Show More Cited By

        View Options

        Login options

        Full Access

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media