VinciDecoder: Automatically Interpreting Provenance Graphs into Textual Forensic Reports with Application to OpenStack

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13700)

Included in the following conference series:

Nordic Conference on Secure IT Systems

Abstract

The operational complexity and dynamicity of clouds highlight the importance of automated solutions for explaining the root cause of security incidents. Most existing works rely on human analysts to interpret provenance graphs for root causes of security incidents. However, navigating and understanding a large and complex cloud-scale provenance graph can be very challenging for human analysts. Without such an understanding, cloud providers cannot effectively address the underlying security issues causing the incidents, such as vulnerabilities or misconfigurations. In this paper, we propose VinciDecoder, an automated approach for generating natural language forensic reports based on provenance graphs. Our main observation is that the way nodes and edges compose a path in provenance graphs is similar to how words compose a sentence in natural languages. Therefore, VinciDecoder leverages a novel combination of provenance analysis, natural language translation, and machine-learning techniques to generate forensic reports. We implement VinciDecoder on an OpenStack cloud testbed, and evaluate its performance based on real-world attacks. Our user study and experimental results demonstrate the effectiveness of our approach in generating high-quality reports (e.g., up to 0.68 BLEU score for precision).
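The path-as-sentence observation can be illustrated with a minimal sketch. It is not VinciDecoder's implementation: the `Edge` type, the node labels, and the relation names below are hypothetical and chosen only for illustration. The sketch flattens a connected path into an alternating sequence of node and edge tokens, which is the kind of "source sentence" a translation model could take as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One hop in a provenance path: source --relation--> target (hypothetical type)."""
    source: str
    relation: str
    target: str


def linearize_path(path: list[Edge]) -> list[str]:
    """Flatten a connected path into alternating node/edge tokens."""
    if not path:
        return []
    tokens = [path[0].source]
    for i, edge in enumerate(path):
        # Each edge must start where the previous one ended.
        if i > 0 and path[i - 1].target != edge.source:
            raise ValueError("edges do not form a connected path")
        tokens.extend([edge.relation, edge.target])
    return tokens


# Hypothetical path; node and relation names are made up for illustration.
example_path = [
    Edge("user:alice", "performed", "op:CreateVM"),
    Edge("op:CreateVM", "generated", "vm:web01"),
]
source_sentence = " ".join(linearize_path(example_path))
print(source_sentence)
```

Under these assumptions, each path becomes one whitespace-separated token sequence, so pairs of (path, report sentence) can be treated like parallel sentences in a translation corpus.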

Notes

1. In Sect. 4.2, we discuss how we obtain more pairs of reports and paths for training.
2. Despite removing the numbers, the range of the elapsed time (e.g., milliseconds vs. hours) retains useful information about the incidents.
3. https://pypi.org/project/nlpaug/.
4. Note that while both sets of our experiments in Sects. 4.1 and 4.2 show high-quality reports, directly comparing their results is not meaningful as their reports are of incomparable lengths (e.g., cloud management-level provenance graph-based reports are typically longer, which has a negative effect on the performance).
5. This study has been identified as quality assurance by Research Ethics/Office of Research of our university, which means it requires no ethics approval.

Acknowledgment

We thank the anonymous reviewers for their valuable comments. This work was supported by the Natural Sciences and Engineering Research Council of Canada and Ericsson Canada under the Industrial Research Chair in SDN/NFV Security and the Canada Foundation for Innovation under JELF Project 38599.

Author information

Authors and Affiliations

CIISE, Concordia University, Montreal, QC, Canada
Azadeh Tabiban, Heyang Zhao & Lingyu Wang
Ericsson Security Research, Ericsson Canada, Montreal, QC, Canada
Yosr Jarraya & Makan Pourzandi

Corresponding author

Correspondence to Azadeh Tabiban.

Editor information

Editors and Affiliations

Reykjavik University, Reykjavik, Iceland
Hans P. Reiser
Reykjavik University, Reykjavik, Iceland
Marcel Kyas

Appendix

Algorithm 2 shows our rule-based mechanism for generating reports based on the cloud management-level provenance graphs (e.g., the provenance graph in Fig. 1). To generate fluent sentences, we specify rules for indicating different subjects (lines 2–5). We add resources extracted from the names of operations (e.g., a VM in CreateVM) through the template a $resource_type named $main_resource_name (lines 7–9). We specify various rules (lines 11–20) for describing other affected resources connected to an operation node. We also specify rules to record other information such as the elapsed time between operations (lines 21–26). Through such rules, specifically designed for each type of operation, resource, and user, VinciDecoder generates reports when there is an insufficient amount of training data for generating high-quality reports.

About this paper

Cite this paper

Tabiban, A., Zhao, H., Jarraya, Y., Pourzandi, M., Wang, L. (2022). VinciDecoder: Automatically Interpreting Provenance Graphs into Textual Forensic Reports with Application to OpenStack. In: Reiser, H.P., Kyas, M. (eds) Secure IT Systems. NordSec 2022. Lecture Notes in Computer Science, vol 13700. Springer, Cham. https://doi.org/10.1007/978-3-031-22295-5_19

DOI: https://doi.org/10.1007/978-3-031-22295-5_19
Published: 01 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22294-8
Online ISBN: 978-3-031-22295-5
eBook Packages: Computer Science, Computer Science (R0)
