More Web Proxy on the site http://driver.im/

research-article

Open access

G2A2: Graph Generator with Attributes and Anomalies

Authors:

Wu-chun FengAuthors Info & Claims

CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers

Pages 3 - 11

https://doi.org/10.1145/3649153.3649206

Published: 02 July 2024 Publication History

Abstract

Many data-mining applications use dynamic attributed graphs to represent relational information; but due to security and privacy concerns, there is a dearth of publicly available datasets that can be represented as dynamic attributed graphs. Even when such datasets are available, they do not have ground truth that can be useful for classification problems, e.g., anomaly detection. Thus, researchers commonly generate synthetic graphs using either statistical or deep generative (DG) methods. However, neither approach produces ground truth. Statistical methods struggle to replicate intricate patterns found in real-world dynamic attributed graphs, while DG methods require a significant number of graphs for training.

To address these shortcomings, we present G2A2, an automated graph generator with attributes and anomalies, which encompasses (1) probabilistic models to generate a dynamic bipartite graph, representing realistic time-evolving connections between two independent sets of entities, (2) realistic injection of anomalies for ground truth using a novel algorithm that captures the general properties of graph anomalies across domains, and (3) generative adversarial network (GAN) model to produce realistic attributes, learned from an existing real-world dataset. We also show that G2A2 is scalable and can generate a graph with a million edges in under a minute of computing time. Using the maximum mean discrepancy (MMD) metric to evaluate the realism of a G2A2-generated graph against three real-world graphs, G2A2 outperforms Kronecker graph generation by reducing the MMD distance by up to six-fold (6x).

References

[1]

A. Alzaatreh, Carl Lee, F. Famoye, and Indranil K. Ghosh. 2016. The generalized Cauchy family of distributions with applications. Journal of Statistical Distributions and Applications 3 (2016), 1--16.

[2]

L. Benguigui and M. Marinov. 2015. A classification of natural and social distributions Part one: the descriptions. arXiv:1507.03408 [physics.soc-ph]

[3]

S. Bhatia, R. Liu, B. Hooi, M. Yoon, K. Shin, and C. Faloutsos. 2022. Real-Time Anomaly Detection in Edge Streams. ACM Trans. Knowl. Discov. Data 16, 4 (2022), 22 pages. https://doi.org/10.1145/3494564

Digital Library

[4]

M. Chakrabarti, L. Heath, and N. Ramakrishnan. 2017. New methods to generate massive synthetic networks. arXiv:1705.08473 [cs.SI]

[5]

V. Chandola, A. Banerjee, and V. Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. 41 (2009), 15:1--15:58.

[6]

Z. Chen and A. Sun. 2020. Anomaly Detection on Dynamic Bipartite Graph with Burstiness. In 2020 IEEE International Conference on Data Mining (ICDM). IEEE, Sorrento, Italy, 966--971. https://doi.org/10.1109/ICDM50108.2020.00110

[7]

K. Ding, J. Li, R. Bhanushali, and H. Liu. 2019. Deep Anomaly Detection on Attributed Networks. Proceedings of the 2019 SIAM International Conference on Data Mining, Canada, 594--602. https://doi.org/10.1137/1.9781611975673.67

[8]

P. S. Efraimidis and P. G. Spirakis. 2006. Weighted random sampling with a reservoir. Inf. Process. Lett. 97 (2006), 181--185.

Digital Library

[9]

N. Eikmeier and D. F. Gleich. 2017. Revisiting Power-law Distributions in Spectra of Real World Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 817--826. https://doi.org/10.1145/3097983.3098128

Digital Library

[10]

P. Erdös and A. Rényi. 1959. On Random Graphs I. Publicationes Mathematicae Debrecen 6 (1959), 290.

[11]

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. 2012. A Kernel Two-Sample Test. J. Mach. Learn. Res. 13 (2012), 723--773.

Digital Library

[12]

A. Grover and J. Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA). ACM, New York, NY, USA, 855--864. https://doi.org/10.1145/2939672.2939754

Digital Library

[13]

Q. Gu and P. Liu. 2012. Denial of Service Attacks. Vol. 3. John Wiley and Sons, online, 454--468. https://doi.org/10.1002/9781118256107.ch29

[14]

X. Guo and L. Zhao. 2020. A Systematic Survey on Deep Generative Models for Graph Generation. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2020), 5370--5390.

Digital Library

[15]

M. Kim and J. Leskovec. 2010. Multiplicative Attribute Graph Model of Real-World Networks. In Algorithms and Models for the Web-Graph, R. Kumar and D. Sivakumar (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 62--73.

[16]

C. Kolias, G. Kambourakis, A. Stavrou, and J. Voas. 2017. DDoS in the IoT: Mirai and Other Botnets. Computer 50 (2017), 80--84.

Digital Library

[17]

S. Kumar, X. Zhang, and J. Leskovec. 2019. Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, NY, USA, 1269--1278. https://doi.org/10.1145/3292500.3330895

Digital Library

[18]

J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos. 2005. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. In Knowledge Discovery in Databases. Springer, Berlin, 133--145.

[19]

J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, New York, NY, USA, 177--187. https://doi.org/10.1145/1081870.1081893

Digital Library

[20]

J. Leskovec and R. Sosič. 2016. SNAP: A General-Purpose Network Analysis and Graph-Mining Library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1.

Digital Library

[21]

R. Liao, Y. Li, Y. Song, S. Wang, W. L. Hamilton, D. Duvenaud, R. Urtasun, and R. Zemel. 2019. Efficient graph generation with graph recurrent attention networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vol. 383. Curran Associates Inc., Red Hook, NY, USA, 11 pages.

[22]

X. Ma, J. Wu, S. Xue, J. Yang, C. Zhou, Q. Z. Sheng, H. Xiong, and L. Akoglu. 2023. A Comprehensive Survey on Graph Anomaly Detection With Deep Learning. IEEE Transactions on Knowledge and Data Engineering 35, 12 (Dec 2023), 12012--12038. https://doi.org/10.1109/TKDE.2021.3118815

Digital Library

[23]

A. Nottingham, M. Gardner, J. Collyer, J. Lang, M. Veeraraghavan, M. Buchanan, J. Hiser, and J. Davidson. 2019. P-CORE Dataset.

[24]

National Institute of Standards and Technology. 2001. Security Requirements for Cryptographic Modules. Technical Report Federal Information Processing Standards Publications 140-2, Change Notice 2 Dec. 03, 2002. U.S. Department of Commerce, Washington, D.C. https://doi.org/10.6028/nist.fips.140-2

[25]

F. Saracco, R. Di Clemente, A. Gabrielli, and T. Squartini. 2015. Randomizing bipartite networks: the case of the World Trade Web. Scientific Reports 5 (2015), 18 pages.

[26]

L. Theis, A. v. d. Oord, and M. Bethge. 2016. A note on the evaluation of generative models. In 4th International Conference on Learning Representations. ICLR, Caribe Hilton, USA, 10 pages.

[27]

L. van der Maaten and G. E. Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579--2605.

[28]

D. J.Watts and S. H. Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature 393, 6684 (1998), 440--442. https://doi.org/10.1038/30918

[29]

L. Xu, M. Skoularidou, A. Cuesta-Infante, and K. Veeramachaneni. 2019. Modeling Tabular data using Conditional GAN. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc., Vancouver, Canada.

[30]

M. Yoon, B. Hooi, K. Shin, and C. Faloutsos. 2019. Fast and Accurate Anomaly Detection in Dynamic Graphs with a Two-Pronged Approach. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, USA, 647--657. https://doi.org/10.1145/3292500.3330946

Digital Library

[31]

J. You, R. Ying, X. Ren, W. L. Hamilton, and J. Leskovec. 2018. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In ICML (Proceedings of Machine Learning Research, Vol. 80). PMLR, Stockholm, Sweden, 5694--5703.

[32]

W. Yu, W. Cheng, C. C. Aggarwal, K. Zhang, H. Chen, and W. Wang. 2018. Net-Walk: A Flexible Deep Embedding Approach for Anomaly Detection in Dynamic Networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Association for Computing Machinery, New York, NY, USA, 2672--2681. https://doi.org/10.1145/3219819.3220024

Digital Library

[33]

P. Zhang, J. Wang, X. Li, M. Li, Z. Di, and Y. Fan. 2008. Clustering coefficient and community structure of bipartite networks. Physica A: Statistical Mechanics and its Applications 387, 27 (2008), 6 pages. https://doi.org/10.1016/j.physa.2008.09.006

[34]

S. Zhang, M. Tao, X. Niu, and F. Huffer. 2020. Time-Varying Gaussian-Cauchy Mixture Models for Financial Risk Management. arXiv:2002.06102 [stat.AP]

[35]

W. Zhang, L. Zhang, D. Pfoser, and L. Zhao. 2021. Disentangled Dynamic Graph Deep Generation. Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), virtual, 738--746. https://doi.org/10.1137/1.9781611976700.83

[36]

D. Zhou, L. Zheng, J. Han, and J. He. 2020. A Data-Driven Graph Generative Model for Temporal Interaction Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, USA, 401--411. https://doi.org/10.1145/3394486.3403082

Digital Library

[37]

Y. Zhu, Y. Du, Y. Wang, Y. Xu, J. Zhang, Q. Liu, and S. Wu. 2022. A Survey on Deep Graph Generation: Methods and Applications. In Proceedings of the First Learning on Graphs Conference (Proceedings of Machine Learning Research, Vol. 198). PMLR, virtual, 47:1--47:21.

Index Terms

G2A2: Graph Generator with Attributes and Anomalies
1. Networks
  1. Network properties
    1. Network structure
      1. Topology analysis and generation
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation

Recommendations

On 2-factors of a bipartite graph

In this article, we consider the following problem: Given a bipartite graph G and a positive integer k, when does G have a 2-factor with exactly k components? We will prove that if G = (V₁, V₂, E) is a bipartite graph with |V₁| = |V₂| = n ≥ 2k + 1 and δ ...
Dynamic Coloring on Restricted Graph Classes
Algorithms and Complexity
Abstract
A proper k-coloring of a graph is an assignment of colors from the set to the vertices of the graph such that no two adjacent vertices receive the same color. Given a graph , the Dynamic Coloring problem asks to find a proper k-coloring of G such ...
Roman domination on strongly chordal graphs

Given real numbers b a >0, an ( a , b )- Roman dominating function of a graph G =( V , E ) is a function f : V {0, a , b } such that every vertex v with f ( v )=0 has a neighbor u with f ( u )= b . An independent/connected/total ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers

May 2024

345 pages

ISBN:9798400705977

DOI:10.1145/3649153

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 July 2024

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

NSF (National Science Foundation)

Conference

CF '24

Sponsor:

SIGMICRO

CF '24: 21st ACM International Conference on Computing Frontiers

May 7 - 9, 2024

Ischia, Italy

Acceptance Rates

CF '24 Paper Acceptance Rate 33 of 105 submissions, 31%;

Overall Acceptance Rate 273 of 785 submissions, 35%

Upcoming Conference

CF '25

Sponsor:
sigmicro

22nd ACM International Conference on Computing Frontiers

May 28 - 30, 2025

Cagliari , Italy

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
106
Total Downloads

Downloads (Last 12 months)106
Downloads (Last 6 weeks)24

Reflects downloads up to 12 Dec 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents