[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

VinciDecoder: Automatically Interpreting Provenance Graphs into Textual Forensic Reports with Application to OpenStack

  • Conference paper
  • First Online:
Secure IT Systems (NordSec 2022)

Abstract

The operational complexity and dynamicity of clouds highlight the importance of automated solutions for explaining the root cause of security incidents. Most existing works rely on human analysts to interpret provenance graphs for root causes of security incidents. However, navigating and understanding a large and complex cloud-scale provenance graph can be very challenging for human analysts. Without such an understanding, cloud providers cannot effectively address the underlying security issues causing the incidents, such as vulnerabilities or misconfigurations. In this paper, we propose VinciDecoder, an automated approach for generating natural language forensic reports based on provenance graphs. Our main observation is that the way nodes and edges compose a path in provenance graphs is similar to how words compose a sentence in natural languages. Therefore, VinciDecoder leverages a novel combination of provenance analysis, natural language translation, and machine-learning techniques to generate forensic reports. We implement VinciDecoder on an OpenStack cloud testbed, and evaluate its performance based on real-world attacks. Our user study and experimental results demonstrate the effectiveness of our approach in generating high-quality reports (e.g., up to 0.68 BLEU score for precision).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 51.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 64.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    In Sect. 4.2, we discuss how we obtain more pairs of reports and paths for training.

  2. 2.

    Despite removing the numbers, the range of the elapsed time (e.g., milliseconds vs. hours) retains useful information about the incidents.

  3. 3.

    https://pypi.org/project/nlpaug/.

  4. 4.

    Note that while both sets of our experiments in Sect. 4.1 and 4.2 show high quality reports, directly comparing their results is not meaningful as their reports are of incomparable lengths (e.g., cloud management-level provenance graph-based reports are typically longer which has a negative effect on the performance).

  5. 5.

    This study has been identified as quality assurance by Research Ethics/Office of Research of our university, which means it requires no ethics approval.

References

  1. Cisco AVOS. https://github.com/CiscoSystems/avos. Accessed 28 July 2022

  2. CVE-2014-0056. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0056/. Accessed 28 July 2022

  3. CVE-2015-5240. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-5240. Accessed 28 July 2022

  4. CVE-2016-7498. https://nvd.nist.gov/vuln/detail/CVE-2016-7498. Accessed 28 July 2022

  5. CVE-2020-17376. https://bugs.launchpad.net/nova/+bug/1890501. Accessed 28 July 2022

  6. CVE details. https://www.cvedetails.com/vulnerability-list/. Accessed 14 June 2022

  7. Neo4j Graph Platform. https://neo4j.com/. Accessed 28 July 2022

  8. OpenStack. https://www.openstack.org/. Accessed 28 July 2022

  9. Alsaheel, A., et al.: ATLAS: a sequence-based learning approach for attack investigation. In: USENIX Security, pp. 3005–3022 (2021)

    Google Scholar 

  10. Assila, A., Ezzedine, H., et al.: Standardized usability questionnaires: features and quality focus. eJCIST 6(1) (2016)

    Google Scholar 

  11. Bates, A., Mood, B., Valafar, M., Butler, K.R.B.: Towards secure provenance-based access control in cloud environments. In: CODASPY, pp. 277–284 (2013)

    Google Scholar 

  12. Bhattarai, B., Huang, H.: SteinerLog: prize collecting the audit logs for threat hunting on enterprise network. In: ASIA CCS, pp. 97–108 (2022)

    Google Scholar 

  13. Binyamini, H., Bitton, R., Inokuchi, M., Yagyu, T., Elovici, Y., Shabtai, A.: A framework for modeling cyber attack techniques from security vulnerability descriptions. In: KDD, p. 2574–2583 (2021)

    Google Scholar 

  14. Bleikertz, S., Vogel, C., Groß, T., Mödersheim, S.: Proactive security analysis of changes in virtualized infrastructures. In: ACSAC, pp. 51–60. ACM (2015)

    Google Scholar 

  15. Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 13(4), 359–394 (1999)

    Article  Google Scholar 

  16. Chen, X., Irshad, H., Chen, Y., Gehani, A., Yegneswaran, V.: CLARION: sound and clear provenance tracking for microservice deployments. In: USENIX Security, pp. 3989–4006 (2021)

    Google Scholar 

  17. Chiche, A., Yitagesu, B.: Part of speech tagging: a systematic review of deep learning and machine learning approaches. J. Big Data 9(1), 1–25 (2022)

    Article  Google Scholar 

  18. Cho, K., van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. In: SSST, pp. 103–111. ACL (2014)

    Google Scholar 

  19. Fadaee, M., Bisazza, A., Monz, C.: Data augmentation for low-resource neural machine translation. In: ACL, pp. 567–573 (2017)

    Google Scholar 

  20. Gao, P., et al.: Enabling efficient cyber threat hunting with cyber threat intelligence. In: ICDE, pp. 193–204. IEEE (2021)

    Google Scholar 

  21. Hassan, W.U., Aguse, L., Aguse, N., Bates, A., Moyer, T.: Towards scalable cluster auditing through grammatical inference over provenance graphs. In: NDSS (2018)

    Google Scholar 

  22. He, D., Lu, H., Xia, Y., Qin, T., Wang, L., Liu, T.Y.: Decoding with value networks for neural machine translation. Adv. Neural Inf. Process. Syst. 30, 177–186 (2017)

    Google Scholar 

  23. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

    Article  Google Scholar 

  24. Johnson, C., Badger, L., Waltermire, D., Snyder, J., Skorupka, C., et al.: Guide to cyber threat information sharing. NIST Spec. Publ. 800, 150 (2016)

    Google Scholar 

  25. King, S.T., Chen, P.M.: Backtracking intrusions. In: SOSP, pp. 223–236 (2003)

    Google Scholar 

  26. Klein, G., Kim, Y., Deng, Y., Senellart, J., Rush, A.: OpenNMT: open-source toolkit for neural machine translation. In: Proceedings of ACL, System Demonstrations, pp. 67–72. ACL (2017)

    Google Scholar 

  27. Koncel-Kedziorski, R., Bekal, D., Luan, Y., Lapata, M., Hajishirzi, H.: Text generation from knowledge graphs with graph transformers. In: NAACL (2019)

    Google Scholar 

  28. Läubli, S., Sennrich, R., Volk, M.: Has machine translation achieved human parity? A case for document-level evaluation. In: EMNLP, pp. 4791–4796. ACL (2018)

    Google Scholar 

  29. Lavie, A.: Evaluating the output of machine translation systems. AMTA Tutor. 86 (2010)

    Google Scholar 

  30. Lebret, R., Grangier, D., Auli, M.: Neural text generation from structured data with application to the biography domain. In: EMNLP, pp. 1203–1213. ACL (2016)

    Google Scholar 

  31. Lopez, A.: Statistical machine translation. ACM Comput. Surv. (CSUR) 40(3), 1–49 (2008)

    Article  Google Scholar 

  32. Lu, R., Lin, X., Liang, X., Shen, X.S.: Secure provenance: the essential of bread and butter of data forensics in cloud computing. In: ASIA CCS, pp. 282–292 (2010)

    Google Scholar 

  33. L’Heureux, A., Grolinger, K., Elyamany, H.F., Capretz, M.A.M.: Machine learning with big data: challenges and approaches. IEEE Access 5, 7776–7797 (2017). https://doi.org/10.1109/ACCESS.2017.2696365

    Article  Google Scholar 

  34. Madi, T., et al.: QuantiC: distance metrics for evaluating multi-tenancy threats in public cloud. In: CloudCom, pp. 163–170. IEEE (2018)

    Google Scholar 

  35. Miao, H., Deshpande, A.: Understanding data science lifecycle provenance via graph segmentation and summarization. In: ICDE, pp. 1710–1713. IEEE (2019)

    Google Scholar 

  36. Milajerdi, S.M., Eshete, B., Gjomemo, R., Venkatakrishnan, V.: POIROT: aligning attack behavior with kernel audit records for cyber threat hunting. In: CCS, pp. 1795–1812 (2019)

    Google Scholar 

  37. Milajerdi, S.M., Gjomemo, R., Eshete, B., Sekar, R., Venkatakrishnan, V.N.: HOLMES: real-time APT detection through correlation of suspicious information flows. In: IEEE S &P, pp. 1137–1152 (2019)

    Google Scholar 

  38. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)

    MATH  Google Scholar 

  39. Nguyen, D., Park, J., Sandhu, R.: Adopting provenance-based access control in openstack cloud IaaS. In: Au, M.H., Carminati, B., Kuo, C.-C.J. (eds.) NSS 2014. LNCS, vol. 8792, pp. 15–27. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11698-3_2

    Chapter  Google Scholar 

  40. Pasquier, T., et al.: Practical whole-system provenance capture. In: SoCC, pp. 405–418 (2017)

    Google Scholar 

  41. Pasquier, T., et al.: Runtime analysis of whole-system provenance. In: CCS, pp. 1601–1616. ACM (2018)

    Google Scholar 

  42. Puduppully, R., Dong, L., Lapata, M.: Data-to-text generation with content selection and planning. In: AAAI, vol. 33, pp. 6908–6915 (2019)

    Google Scholar 

  43. Santana, M.A.B., Ricca, F., Cuteri, B.: Reducing the impact of out of vocabulary words in the translation of natural language questions into SPARQL queries. arXiv preprint arXiv:2111.03000 (2021)

  44. Satvat, K., Gjomemo, R., Venkatakrishnan, V.: EXTRACTOR: extracting attack behavior from threat reports. In: EuroS &P, pp. 598–615. IEEE (2021)

    Google Scholar 

  45. Sharma, S., El Asri, L., Schulz, H., Zumer, J.: Relevance of unsupervised metrics in task-oriented dialogue for evaluating natural language generation. CoRR abs/1706.09799 (2017)

    Google Scholar 

  46. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. Adv. Neural Inf. Process. Syst. 2, 3104–3112 (2014)

    Google Scholar 

  47. Tabiban, A., Jarraya, Y., Zhang, M., Pourzandi, M., Wang, L., Debbabi, M.: Catching falling dominoes: cloud management-level provenance analysis with application to OpenStack. In: CNS, pp. 1–9. IEEE (2020)

    Google Scholar 

  48. Tabiban, A., Majumdar, S., Wang, L., Debbabi, M.: PERMON: An Openstack middleware for runtime security policy enforcement in clouds. In: CNS, pp. 1–7. IEEE (2018)

    Google Scholar 

  49. Tabiban, A., Zhao, H., Jarraya, Y., Pourzandi, M., Zhang, M., Wang, L.: ProvTalk: towards interpretable multi-level provenance analysis in networking functions virtualization (NFV). In: NDSS (2022)

    Google Scholar 

  50. Thirunavukkarasu, S.L., et al.: Modeling NFV deployment to identify the cross-level inconsistency vulnerabilities. In: CloudCom, pp. 167–174. IEEE (2019)

    Google Scholar 

  51. Ujcich, B.E., et al.: Cross-app poisoning in software-defined networking. In: CCS, pp. 648–663 (2018)

    Google Scholar 

  52. Wang, H., Yang, G., Chinprutthiwong, P., Xu, L., Zhang, Y., Gu, G.: Towards fine-grained network security forensics and diagnosis in the SDN era. In: CCS, pp. 3–16. ACM (2018)

    Google Scholar 

  53. Wang, Q., Hassan, W.U., Bates, A., Gunter, C.: Fear and logging in the internet of things. In: NDSS (2018)

    Google Scholar 

  54. Wang, Q., et al.: You are what you do: hunting stealthy malware via data provenance analysis. In: NDSS (2020)

    Google Scholar 

  55. Wang, Y., et al.: TenantGuard: scalable runtime verification of cloud-wide VM-level network isolation. In: NDSS (2017)

    Google Scholar 

  56. Wu, Y., Zhao, M., Haeberlen, A., Zhou, W., Loo, B.T.: Diagnosing missing events in distributed systems with negative provenance. In: ACM SIGCOMM, pp. 383–394 (2014)

    Google Scholar 

  57. Yusif, S., Hafeez-Baig, A.: A conceptual model for cybersecurity governance. J. Appl. Secur. Res. 16(4), 490–513 (2021)

    Article  Google Scholar 

  58. Zeng, J., Chua, Z.L., Chen, Y., Ji, K., Liang, Z., Mao, J.: WATSON: abstracting behaviors from audit logs via aggregation of contextual semantics. In: NDSS (2021)

    Google Scholar 

Download references

Acknowledgment

We thank the anonymous reviewers for their valuable comments. This work was supported by the Natural Sciences and Engineering Research Council of Canada and Ericsson Canada under the Industrial Research Chair in SDN/NFV Security and the Canada Foundation for Innovation under JELF Project 38599.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Azadeh Tabiban .

Editor information

Editors and Affiliations

Appendix

Appendix

Algorithm 2 shows our rule-based mechanism generating reports based on the cloud management-level provenance graphs (e.g., the provenance graph in Fig. 1). To generate fluent sentences, we specify rules for indicating different subjects (line 2–5). We add resources extracted from the names of operations (e.g., a VM in CreateVM) through the template a $resource_type named $main_resource_name (line 7–9). We specify various rules (line 11–20) for describing other affected resources connected to an operation node. We also specify rules to record other information such as the elapsed time between operations (line 21–26). Through such rules specifically designed for each type of operations, resources, and users, VinciDecoder generates reports when there is an insufficient amount of training data for generating high quality reports.

figure b

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Tabiban, A., Zhao, H., Jarraya, Y., Pourzandi, M., Wang, L. (2022). VinciDecoder: Automatically Interpreting Provenance Graphs into Textual Forensic Reports with Application to OpenStack. In: Reiser, H.P., Kyas, M. (eds) Secure IT Systems. NordSec 2022. Lecture Notes in Computer Science, vol 13700. Springer, Cham. https://doi.org/10.1007/978-3-031-22295-5_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-22295-5_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22294-8

  • Online ISBN: 978-3-031-22295-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics