[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3649153.3649206acmconferencesArticle/Chapter ViewAbstractPublication PagescfConference Proceedingsconference-collections
research-article
Open access

G2A2: Graph Generator with Attributes and Anomalies

Published: 02 July 2024 Publication History

Abstract

Many data-mining applications use dynamic attributed graphs to represent relational information; but due to security and privacy concerns, there is a dearth of publicly available datasets that can be represented as dynamic attributed graphs. Even when such datasets are available, they do not have ground truth that can be useful for classification problems, e.g., anomaly detection. Thus, researchers commonly generate synthetic graphs using either statistical or deep generative (DG) methods. However, neither approach produces ground truth. Statistical methods struggle to replicate intricate patterns found in real-world dynamic attributed graphs, while DG methods require a significant number of graphs for training.
To address these shortcomings, we present G2A2, an automated graph generator with attributes and anomalies, which encompasses (1) probabilistic models to generate a dynamic bipartite graph, representing realistic time-evolving connections between two independent sets of entities, (2) realistic injection of anomalies for ground truth using a novel algorithm that captures the general properties of graph anomalies across domains, and (3) generative adversarial network (GAN) model to produce realistic attributes, learned from an existing real-world dataset. We also show that G2A2 is scalable and can generate a graph with a million edges in under a minute of computing time. Using the maximum mean discrepancy (MMD) metric to evaluate the realism of a G2A2-generated graph against three real-world graphs, G2A2 outperforms Kronecker graph generation by reducing the MMD distance by up to six-fold (6x).

References

[1]
A. Alzaatreh, Carl Lee, F. Famoye, and Indranil K. Ghosh. 2016. The generalized Cauchy family of distributions with applications. Journal of Statistical Distributions and Applications 3 (2016), 1--16.
[2]
L. Benguigui and M. Marinov. 2015. A classification of natural and social distributions Part one: the descriptions. arXiv:1507.03408 [physics.soc-ph]
[3]
S. Bhatia, R. Liu, B. Hooi, M. Yoon, K. Shin, and C. Faloutsos. 2022. Real-Time Anomaly Detection in Edge Streams. ACM Trans. Knowl. Discov. Data 16, 4 (2022), 22 pages. https://doi.org/10.1145/3494564
[4]
M. Chakrabarti, L. Heath, and N. Ramakrishnan. 2017. New methods to generate massive synthetic networks. arXiv:1705.08473 [cs.SI]
[5]
V. Chandola, A. Banerjee, and V. Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. 41 (2009), 15:1--15:58.
[6]
Z. Chen and A. Sun. 2020. Anomaly Detection on Dynamic Bipartite Graph with Burstiness. In 2020 IEEE International Conference on Data Mining (ICDM). IEEE, Sorrento, Italy, 966--971. https://doi.org/10.1109/ICDM50108.2020.00110
[7]
K. Ding, J. Li, R. Bhanushali, and H. Liu. 2019. Deep Anomaly Detection on Attributed Networks. Proceedings of the 2019 SIAM International Conference on Data Mining, Canada, 594--602. https://doi.org/10.1137/1.9781611975673.67
[8]
P. S. Efraimidis and P. G. Spirakis. 2006. Weighted random sampling with a reservoir. Inf. Process. Lett. 97 (2006), 181--185.
[9]
N. Eikmeier and D. F. Gleich. 2017. Revisiting Power-law Distributions in Spectra of Real World Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 817--826. https://doi.org/10.1145/3097983.3098128
[10]
P. Erdös and A. Rényi. 1959. On Random Graphs I. Publicationes Mathematicae Debrecen 6 (1959), 290.
[11]
A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. 2012. A Kernel Two-Sample Test. J. Mach. Learn. Res. 13 (2012), 723--773.
[12]
A. Grover and J. Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA). ACM, New York, NY, USA, 855--864. https://doi.org/10.1145/2939672.2939754
[13]
Q. Gu and P. Liu. 2012. Denial of Service Attacks. Vol. 3. John Wiley and Sons, online, 454--468. https://doi.org/10.1002/9781118256107.ch29
[14]
X. Guo and L. Zhao. 2020. A Systematic Survey on Deep Generative Models for Graph Generation. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2020), 5370--5390.
[15]
M. Kim and J. Leskovec. 2010. Multiplicative Attribute Graph Model of Real-World Networks. In Algorithms and Models for the Web-Graph, R. Kumar and D. Sivakumar (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 62--73.
[16]
C. Kolias, G. Kambourakis, A. Stavrou, and J. Voas. 2017. DDoS in the IoT: Mirai and Other Botnets. Computer 50 (2017), 80--84.
[17]
S. Kumar, X. Zhang, and J. Leskovec. 2019. Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, NY, USA, 1269--1278. https://doi.org/10.1145/3292500.3330895
[18]
J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos. 2005. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. In Knowledge Discovery in Databases. Springer, Berlin, 133--145.
[19]
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, New York, NY, USA, 177--187. https://doi.org/10.1145/1081870.1081893
[20]
J. Leskovec and R. Sosič. 2016. SNAP: A General-Purpose Network Analysis and Graph-Mining Library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1.
[21]
R. Liao, Y. Li, Y. Song, S. Wang, W. L. Hamilton, D. Duvenaud, R. Urtasun, and R. Zemel. 2019. Efficient graph generation with graph recurrent attention networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vol. 383. Curran Associates Inc., Red Hook, NY, USA, 11 pages.
[22]
X. Ma, J. Wu, S. Xue, J. Yang, C. Zhou, Q. Z. Sheng, H. Xiong, and L. Akoglu. 2023. A Comprehensive Survey on Graph Anomaly Detection With Deep Learning. IEEE Transactions on Knowledge and Data Engineering 35, 12 (Dec 2023), 12012--12038. https://doi.org/10.1109/TKDE.2021.3118815
[23]
A. Nottingham, M. Gardner, J. Collyer, J. Lang, M. Veeraraghavan, M. Buchanan, J. Hiser, and J. Davidson. 2019. P-CORE Dataset.
[24]
National Institute of Standards and Technology. 2001. Security Requirements for Cryptographic Modules. Technical Report Federal Information Processing Standards Publications 140-2, Change Notice 2 Dec. 03, 2002. U.S. Department of Commerce, Washington, D.C. https://doi.org/10.6028/nist.fips.140-2
[25]
F. Saracco, R. Di Clemente, A. Gabrielli, and T. Squartini. 2015. Randomizing bipartite networks: the case of the World Trade Web. Scientific Reports 5 (2015), 18 pages.
[26]
L. Theis, A. v. d. Oord, and M. Bethge. 2016. A note on the evaluation of generative models. In 4th International Conference on Learning Representations. ICLR, Caribe Hilton, USA, 10 pages.
[27]
L. van der Maaten and G. E. Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579--2605.
[28]
D. J.Watts and S. H. Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature 393, 6684 (1998), 440--442. https://doi.org/10.1038/30918
[29]
L. Xu, M. Skoularidou, A. Cuesta-Infante, and K. Veeramachaneni. 2019. Modeling Tabular data using Conditional GAN. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc., Vancouver, Canada.
[30]
M. Yoon, B. Hooi, K. Shin, and C. Faloutsos. 2019. Fast and Accurate Anomaly Detection in Dynamic Graphs with a Two-Pronged Approach. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, USA, 647--657. https://doi.org/10.1145/3292500.3330946
[31]
J. You, R. Ying, X. Ren, W. L. Hamilton, and J. Leskovec. 2018. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In ICML (Proceedings of Machine Learning Research, Vol. 80). PMLR, Stockholm, Sweden, 5694--5703.
[32]
W. Yu, W. Cheng, C. C. Aggarwal, K. Zhang, H. Chen, and W. Wang. 2018. Net-Walk: A Flexible Deep Embedding Approach for Anomaly Detection in Dynamic Networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Association for Computing Machinery, New York, NY, USA, 2672--2681. https://doi.org/10.1145/3219819.3220024
[33]
P. Zhang, J. Wang, X. Li, M. Li, Z. Di, and Y. Fan. 2008. Clustering coefficient and community structure of bipartite networks. Physica A: Statistical Mechanics and its Applications 387, 27 (2008), 6 pages. https://doi.org/10.1016/j.physa.2008.09.006
[34]
S. Zhang, M. Tao, X. Niu, and F. Huffer. 2020. Time-Varying Gaussian-Cauchy Mixture Models for Financial Risk Management. arXiv:2002.06102 [stat.AP]
[35]
W. Zhang, L. Zhang, D. Pfoser, and L. Zhao. 2021. Disentangled Dynamic Graph Deep Generation. Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), virtual, 738--746. https://doi.org/10.1137/1.9781611976700.83
[36]
D. Zhou, L. Zheng, J. Han, and J. He. 2020. A Data-Driven Graph Generative Model for Temporal Interaction Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, USA, 401--411. https://doi.org/10.1145/3394486.3403082
[37]
Y. Zhu, Y. Du, Y. Wang, Y. Xu, J. Zhang, Q. Liu, and S. Wu. 2022. A Survey on Deep Graph Generation: Methods and Applications. In Proceedings of the First Learning on Graphs Conference (Proceedings of Machine Learning Research, Vol. 198). PMLR, virtual, 47:1--47:21.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
May 2024
345 pages
ISBN:9798400705977
DOI:10.1145/3649153
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 July 2024

Check for updates

Author Tags

  1. Graph generation
  2. bipartite graphs
  3. dynamic graphs
  4. ground truth

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

CF '24
Sponsor:

Acceptance Rates

CF '24 Paper Acceptance Rate 33 of 105 submissions, 31%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Upcoming Conference

CF '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 106
    Total Downloads
  • Downloads (Last 12 months)106
  • Downloads (Last 6 weeks)24
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media