research-article

NodeMerge: Template Based Efficient Data Reduction For Big-Data Causality Analysis

Authors:

Qun Li

CCS '18: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security

Pages 1324 - 1337

https://doi.org/10.1145/3243734.3243763

Published: 15 October 2018

Abstract

Today's enterprises are exposed to sophisticated attacks, such as Advanced Persistent Threat (APT) attacks, which usually consist of multiple stealthy steps. To counter these attacks, enterprises often rely on causality analysis of system activity data collected through ubiquitous system monitoring to discover the initial penetration point and, from there, identify previously unknown attack steps. However, one major challenge for causality analysis is that ubiquitous system monitoring generates a colossal amount of data, and hosting such a huge amount of data is prohibitively expensive. Thus, there is a strong demand for techniques that reduce the storage of data for causality analysis yet preserve the quality of the analysis. To address this problem, we propose NodeMerge, a template-based data reduction system for online system event storage. Specifically, our approach works directly on the stream of system dependency data and achieves data reduction on read-only file events based on their access patterns. It can either reduce the storage cost or improve the performance of causality analysis under the same budget. With only a reasonable amount of resources for online data reduction, it almost completely preserves the accuracy of causality analysis. The reduced form of the data can be used directly with little overhead. To evaluate our approach, we conducted a set of comprehensive evaluations, which show that, for different categories of workloads, our system can reduce the storage capacity of raw system dependency data by as much as 75.7 times, and the storage capacity of the state-of-the-art approach by as much as 32.6 times. Furthermore, the results demonstrate that our approach keeps all the causality analysis information and has a reasonably small overhead in memory and hard disk.
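
The abstract does not spell out the reduction algorithm, so the following is only a minimal sketch of the general idea of template-based merging, not the NodeMerge implementation. It assumes that a "template" is a fixed set of read-only files that processes tend to read together, that templates are already known, and that events arrive as a batch of (process, file) pairs. The function name `merge_reads`, the template name `T1`, and the file paths are all hypothetical.

```python
from collections import defaultdict


def merge_reads(events, templates):
    """Replace groups of read events with a single edge to a template node.

    events:    iterable of (process_id, file_path) reads of read-only files.
    templates: dict mapping a template name to a frozenset of file paths
               (assumed to be given; how they are learned is out of scope).
    Returns a list of (process_id, target) edges, where target is either a
    template name or an individual file path.
    """
    reads = defaultdict(set)
    order = []  # remember the first-seen order of processes
    for pid, path in events:
        if pid not in reads:
            order.append(pid)
        reads[pid].add(path)

    # Try larger templates first so that they absorb as many reads as possible.
    by_size = sorted(templates.items(), key=lambda kv: -len(kv[1]))

    reduced = []
    for pid in order:
        remaining = set(reads[pid])
        for name, files in by_size:
            if files <= remaining:  # the process read every file in the template
                reduced.append((pid, name))
                remaining -= files
        # Reads not covered by any template are kept as individual edges.
        reduced.extend((pid, path) for path in sorted(remaining))
    return reduced


# Hypothetical example data.
events = [
    (1, "/lib/libc.so"), (1, "/lib/libm.so"), (1, "/etc/app.conf"),
    (2, "/lib/libc.so"), (2, "/lib/libm.so"),
    (3, "/lib/libc.so"),
]
templates = {"T1": frozenset({"/lib/libc.so", "/lib/libm.so"})}
reduced = merge_reads(events, templates)
```

In this toy input, six read events shrink to four edges. Process 3 read only part of the template, so its read is left untouched. In this sketch, the template table has to be stored alongside the reduced edges so that a template node can be expanded back into its files during analysis.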

Supplementary Material

MP4 File (p1324-jee.mp4), 383.62 MB

Index Terms

NodeMerge: Template Based Efficient Data Reduction For Big-Data Causality Analysis
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation

Information & Contributors

Information

Published In

CCS '18: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security

October 2018

2359 pages

ISBN:9781450356930

DOI:10.1145/3243734

General Chairs: David Lie (University of Toronto), Mohammad Mannan (Concordia University)

Program Chairs: Michael Backes (CISPA Helmholtz Center i.G.), XiaoFeng Wang (Indiana University)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSFC

Conference

CCS '18

Sponsor:

SIGSAC

CCS '18: 2018 ACM SIGSAC Conference on Computer and Communications Security

October 15 - 19, 2018

Toronto, Canada

Acceptance Rates

CCS '18 Paper Acceptance Rate 134 of 809 submissions, 17%;

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

Bibliometrics & Citations

Article Metrics

Total Citations: 56
Total Downloads: 907
Downloads (Last 12 months): 74
Downloads (Last 6 weeks): 8

Reflects downloads up to 11 Dec 2024

Citations

Cited By

Chen C, Wan H, Zhao X (2024). Log refusion: adversarial attacks against the integrity of application logs and defense methods. SCIENTIA SINICA Informationis 54:9 (2157). Online publication date: 10-Sep-2024.
https://doi.org/10.1360/SSI-2024-0042
Zhang B, Gao Y, Kuang B, Yu C, Fu A, Susilo W (2024). A Survey on Advanced Persistent Threat Detection: A Unified Framework, Challenges, and Countermeasures. ACM Computing Surveys 57:3 (1-36). Online publication date: 11-Nov-2024.
https://dl.acm.org/doi/10.1145/3700749
Sun H, Wang S, Wang Z, Jiang Z, Han D, Yang J (2024). AudiTrim: A Real-time, General, Efficient, and Low-overhead Data Compaction System for Intrusion Detection. Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (263-277). Online publication date: 30-Sep-2024.
https://dl.acm.org/doi/10.1145/3678890.3679048
Sang A, Wang Y, Yang L, Jia J, Zhou L (2024). Obfuscating Provenance-Based Forensic Investigations with Mapping System Meta-Behavior. Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (248-262). Online publication date: 30-Sep-2024.
https://dl.acm.org/doi/10.1145/3678890.3678916
Jiang P, Xiao J, Li D, Yu H, Bai Y, Guo Y, Chen X (2024). Detecting Malicious Websites From the Perspective of System Provenance Analysis. IEEE Transactions on Dependable and Secure Computing 21:3 (1406-1423). Online publication date: May-2024.
https://doi.org/10.1109/TDSC.2023.3277613
Sekar R, Kimm H, Aich R (2024). eAudit: A Fast, Scalable and Deployable Audit Data Collection System. 2024 IEEE Symposium on Security and Privacy (SP) (3571-3589). Online publication date: 19-May-2024.
https://doi.org/10.1109/SP54263.2024.00087
Chang B, Wang Z, Li S, Zhou F, Wen Y, Zhang B (2024). TurboLog: A Turbocharged Lossless Compression Method for System Logs via Transformer. 2024 International Joint Conference on Neural Networks (IJCNN) (1-10). Online publication date: 30-Jun-2024.
https://doi.org/10.1109/IJCNN60899.2024.10649957
Henriques J, Caldeira F, Cruz T, Simões P (2024). A Survey on Forensics and Compliance Auditing for Critical Infrastructure Protection. IEEE Access 12 (2409-2444). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2023.3348552
Zipperle M, Zhang Y, Chang E, Dillon T (2024). PARGMF. Journal of Information Security and Applications 81:C. Online publication date: 1-Mar-2024.
https://dl.acm.org/doi/10.1016/j.jisa.2023.103682
Yue H, Li T, Wu D, Zhang R, Yang Z (2024). Detecting APT attacks using an attack intent-driven and sequence-based learning approach. Computers and Security 140:C. Online publication date: 1-May-2024.
https://dl.acm.org/doi/10.1016/j.cose.2024.103748