DOI: 10.1145/3243734.3243763
research-article

NodeMerge: Template Based Efficient Data Reduction For Big-Data Causality Analysis

Published: 15 October 2018

Abstract

Today's enterprises are exposed to sophisticated attacks, such as Advanced Persistent Threat (APT) attacks, which usually consist of multiple stealthy steps. To counter these attacks, enterprises often rely on causality analysis of the system activity data collected by ubiquitous system monitoring to discover the initial penetration point and, from there, identify previously unknown attack steps. However, one major challenge for causality analysis is that ubiquitous system monitoring generates a colossal amount of data, and hosting such a huge amount of data is prohibitively expensive. Thus, there is a strong demand for techniques that reduce the storage of data for causality analysis while preserving the quality of the causality analysis. To address this problem, we propose NodeMerge, a template-based data reduction system for online system event storage. Specifically, our approach works directly on the stream of system dependency data and achieves data reduction on read-only file events based on their access patterns. It can either reduce the storage cost or improve the performance of causality analysis under the same budget. With only a reasonable amount of resources for online data reduction, it almost completely preserves the accuracy of causality analysis. The reduced form of the data can be used directly with little overhead. To evaluate our approach, we conducted a set of comprehensive evaluations, which show that for different categories of workloads, our system can reduce the storage footprint of raw system dependency data by up to 75.7 times, and that of the state-of-the-art approach by up to 32.6 times. The results also demonstrate that our approach keeps all the causality analysis information and incurs a reasonably small memory and hard disk overhead.
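The abstract states only that NodeMerge reduces read-only file events "based on their access patterns"; it does not describe the template format or how templates are learned. As a rough illustration, the Python sketch below assumes a hypothetical batch setting: a file counts as read-only if no event in the batch writes it, a template is an exact read-only read set shared by at least `min_support` processes, and each process's reads of such a set are collapsed into one `read_template` event that points into a template table. The event tuple layout, the `read_template` operation, and the `T<n>` identifiers are invented for this example, and exact-set counting stands in for a proper frequent-itemset miner such as FP-growth. It is not the authors' implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical event record: (process_id, operation, object_path).
Event = Tuple[int, str, str]


def read_only_files(events: List[Event]) -> Set[str]:
    """Files seen in this batch that no event ever writes."""
    written = {path for _, op, path in events if op == "write"}
    return {path for _, _, path in events if path not in written}


def learn_templates(events: List[Event], min_support: int = 2,
                    min_size: int = 2) -> List[FrozenSet[str]]:
    """Treat each process's set of read-only files as one transaction and keep
    exact sets that recur in at least `min_support` processes. This exact-match
    count is a simplification of frequent-itemset mining."""
    ro = read_only_files(events)
    reads: Dict[int, Set[str]] = defaultdict(set)
    for pid, op, path in events:
        if op == "read" and path in ro:
            reads[pid].add(path)
    counts = Counter(frozenset(files) for files in reads.values())
    templates = [t for t, c in counts.items()
                 if c >= min_support and len(t) >= min_size]
    # Try larger templates first so each merge removes as many events as possible.
    return sorted(templates, key=len, reverse=True)


def merge(events: List[Event], templates: List[FrozenSet[str]]) -> List[Event]:
    """Replace a process's reads of a template's files with a single
    hypothetical "read_template" event that names the template, e.g. "T0"."""
    reads: Dict[int, Set[str]] = defaultdict(set)
    for pid, op, path in events:
        if op == "read":
            reads[pid].add(path)
    chosen: Dict[int, int] = {}  # pid -> index of the first template it fully covers
    for pid, files in reads.items():
        for tid, template in enumerate(templates):
            if template <= files:
                chosen[pid] = tid
                break
    reduced: List[Event] = []
    emitted: Set[int] = set()
    for pid, op, path in events:
        tid = chosen.get(pid)
        if op == "read" and tid is not None and path in templates[tid]:
            if pid not in emitted:  # one template event per process
                reduced.append((pid, "read_template", f"T{tid}"))
                emitted.add(pid)
            continue
        reduced.append((pid, op, path))
    return reduced


def expand(reduced: List[Event], templates: List[FrozenSet[str]]) -> Set[Event]:
    """Recover the (pid, op, path) facts a causality query needs.
    Event order inside a merged template is not preserved."""
    facts: Set[Event] = set()
    for pid, op, obj in reduced:
        if op == "read_template":
            facts.update((pid, "read", p) for p in templates[int(obj[1:])])
        else:
            facts.add((pid, op, obj))
    return facts
```

On a toy trace where processes 1 and 2 both read `/lib/libc.so` and `/lib/ld.so`, process 1 also writes `/tmp/a.log`, and process 3 reads that log, `learn_templates` finds one template, `merge` shrinks six events to four, and `expand` returns the original six facts, so the dependency from process 3 back to process 1 through `/tmp/a.log` is still visible. The sketch works on a finished batch and drops ordering inside a merged template; handling a live event stream, as the abstract describes, would additionally need windowing and a persistent template table.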

Supplementary Material

MP4 File (p1324-jee.mp4)




    Information

    Published In

    CCS '18: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security
    October 2018
    2359 pages
    ISBN:9781450356930
    DOI:10.1145/3243734
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. data reduction
    2. security

    Qualifiers

    • Research-article

    Funding Sources

    • NSFC

    Conference

    CCS '18

    Acceptance Rates

    CCS '18 Paper Acceptance Rate 134 of 809 submissions, 17%;
    Overall Acceptance Rate 1,261 of 6,999 submissions, 18%



    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)74
    • Downloads (Last 6 weeks)8
    Reflects downloads up to 11 Dec 2024


    Cited By

    • (2024) Log refusion: adversarial attacks against the integrity of application logs and defense methods. SCIENTIA SINICA Informationis 54:9, 2157. DOI: 10.1360/SSI-2024-0042. Online publication date: 10-Sep-2024.
    • (2024) A Survey on Advanced Persistent Threat Detection: A Unified Framework, Challenges, and Countermeasures. ACM Computing Surveys 57:3, 1-36. DOI: 10.1145/3700749. Online publication date: 11-Nov-2024.
    • (2024) AudiTrim: A Real-time, General, Efficient, and Low-overhead Data Compaction System for Intrusion Detection. Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses, 263-277. DOI: 10.1145/3678890.3679048. Online publication date: 30-Sep-2024.
    • (2024) Obfuscating Provenance-Based Forensic Investigations with Mapping System Meta-Behavior. Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses, 248-262. DOI: 10.1145/3678890.3678916. Online publication date: 30-Sep-2024.
    • (2024) Detecting Malicious Websites From the Perspective of System Provenance Analysis. IEEE Transactions on Dependable and Secure Computing 21:3, 1406-1423. DOI: 10.1109/TDSC.2023.3277613. Online publication date: May-2024.
    • (2024) eAudit: A Fast, Scalable and Deployable Audit Data Collection System. 2024 IEEE Symposium on Security and Privacy (SP), 3571-3589. DOI: 10.1109/SP54263.2024.00087. Online publication date: 19-May-2024.
    • (2024) TurboLog: A Turbocharged Lossless Compression Method for System Logs via Transformer. 2024 International Joint Conference on Neural Networks (IJCNN), 1-10. DOI: 10.1109/IJCNN60899.2024.10649957. Online publication date: 30-Jun-2024.
    • (2024) A Survey on Forensics and Compliance Auditing for Critical Infrastructure Protection. IEEE Access 12, 2409-2444. DOI: 10.1109/ACCESS.2023.3348552. Online publication date: 2024.
    • (2024) PARGMF. Journal of Information Security and Applications 81:C. DOI: 10.1016/j.jisa.2023.103682. Online publication date: 1-Mar-2024.
    • (2024) Detecting APT attacks using an attack intent-driven and sequence-based learning approach. Computers and Security 140:C. DOI: 10.1016/j.cose.2024.103748. Online publication date: 1-May-2024.