Abstract
Inferring aliasing and buffer-size information is important to understanding a C program's memory layout, which is critical to program analysis and security-related tasks. However, traditional static and dynamic program analysis methods suffer from certain limitations: static alias analysis methods suffer from precision loss and have poor scalability. Meanwhile, although dynamic analysis can achieve high precision, there is no soundness guarantee, and an online analysis may cause non-negligible runtime overhead. Besides, the current methods can only capture aliasing information. As for the buffer-size relational information, which is the specific variable storing the size of the buffer pointed by the pointers, it is tough to analyze by traditional methods. Moreover, we observe that most methods are designed for specific information. To address these limitations, we present ABSLearn, a deep learning framework that is capable of retrieving both aliasing and buffer-size information from C programs. The core idea of ABSLearn is to formulate the information retrieval as a link prediction problem, where a Graph Neural Network (GNN) model is applied to solve the problem. We developed the first related dataset that contains 285 C program samples to train ABSLearn. Then, the trained model is applied to infer the information on three practical benchmarks: Gzip-1.2.4, Make-3.80, and Tar-1.15.1. The results show that ABSLearn achieves comparable performance and excellent runtime performance. As the first attempt at applying GNN to infer aliasing and buffer-size information, ABSLearn can potentially benefit future program analysis frameworks.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data availability
The data used to support the experiments and findings of this study will be made available upon publication of the article.
Notes
The C library function void *realloc(void *ptr, size_t size) attempts to resize the memory block pointed to by ptr that was previously allocated with a call to malloc or calloc.
References
Balakrishnan G, Reps T (2004) Analyzing Memory Accesses in x86 Executables. In: Duesterwald E (ed) Compiler construction. CC 2004. Lecture notes in computer science, vol 2985. Springer, Heidelberg. https://doi.org/10.1007/978-3-540-24723-4_2
Evans I, Long F, Otgonbaatar U, Shrobe H, Rinard M, Okhravi H, Sidiroglou-Douskos S (2015) Control jujutsu: on the weaknesses of fine-grained control flow integrity. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security. Association for Computing Machinery, New York, p 901–913. https://doi.org/10.1145/2810103.2813646
Zeng D, Tan G (2018) From debugging-information based binary-level type inference to CFG generation. In: Proceedings of the eighth ACM conference on data and application security and privacy. Association for Computing Machinery, New York, p 366–376.https://doi.org/10.1145/3176258.3176309
Lu K, Hu H (2019) Where does it go? Refining indirect-call targets with multi-layer type analysis. In: Proceedings of the 2019 ACM SIGSAC conference on computer and communications security. Association for Computing Machinery, p 1867–1881. https://doi.org/10.1145/3319535.3354244
Kim SH, Sun C, Zeng D, Tan G (2021) Refining indirect call targets at the binary level. Netw Distrib Syst Secur Symp. https://doi.org/10.14722/ndss.2021.24386
Abadi M, Budiu M, Erlingsson Ú, Ligatti J (2009) Control-flow integrity principles, implementations, and applications. ACM Trans Inf Syst Secur 13(1):40. https://doi.org/10.1145/1609956.1609960
Zhang C, Wei T, Chen Z, Duan L, Szekeres L, McCamant S, Song D, Zou W (2013) Practical control flow integrity and randomization for binary executables. In: IEEE symposium on security and privacy (S&P). pp 559–573. https://doi.org/10.1109/SP.2013.44
Zhang M, Sekar R (2013) Control flow integrity for COTS binaries. In: 22nd Usenix security symposium. pp 337–352. https://doi.org/10.5555/2534766.2534796
Tice C, Roeder T, Collingbourne P, Checkoway S, Erlingsson Ú, Lozano L, Pike G (2014) Enforcing forward-edge control-flow integrity in GCC & LLVM. In: Proceedings of the 23rd USENIX conference on Security Symposium (SEC'14). USENIX Association, USA, pp 941–955. https://doi.org/10.5555/2671225.2671285
Niu B, Tan G (2014) Modular control-flow integrity. In: SIGPLAN Not. 49, 6 (June 2014), pp 577–587. https://doi.org/10.1145/2666356.2594295
Niu B, Tan G (2015) Per-Input control-flow integrity. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS '15). 914–926. https://doi.org/10.1145/2810103.2813644
van der Veen V, Andriesse D, Göktaş E, Gras B, Sambuc L, Slowinska A, Bos H, Giuffrida C (2015) Practical context-sensitive CFI. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security (CCS '15). Association for Computing Machinery, New York, pp 927–940. https://doi.org/10.1145/2810103.2813673
Ge X, Talele N, Payer M, Jaeger T (2016) Fine-grained control-flow integrity for kernel software. In: IEEE European symposium on security and privacy (EuroS&P), pp 179–194. https://doi.org/10.1109/EuroSP.2016.24
van der Veen V, G ̈oktas E, Contag M, Pawoloski A, Chen X, Rawat S, Bos H, Holz T, Athanasopoulos E, Giuffrida C (2016) A tough call: Mitigating advanced code-reuse attacks at the binary level,” in IEEE Symposium on Security and Privacy (S&P). pp. 934–953. https://doi.org/10.1109/SP.2016.60
Ding R, Qian C, Song C, Harris B, Kim T, Lee W (2017) Efficient protection of path-sensitive control security. In: 26th Usenix security symposium, pp 131–148. https://doi.org/10.5555/3241189.3241201
Khandaker M, Naser A, Liu W, Wang Z, Zhou Y, Cheng Y (2019) Adaptive call-site sensitive control flow integrity. In: IEEE european symposium on security and privacy (EuroS&P), pp 95–110. https://doi.org/10.1109/EuroSP.2019.00017
Khandaker MR, Liu W, Naser A, Wang Z, Yang J (2019) Origin-sensitive control flow integrity. In: 28th Usenix security symposium, pp 195–211. https://doi.org/10.5555/33.3361338.3361353
Nagarakatte S, Zhao J, Martin MMK, Zdancewic S (2009) Soft-bound: highly compatible and complete spatial memory safety for C. In: ACM conference on programming language design and implementation (PLDI), pp 245–258. https://doi.org/10.1145/1543135.1542504
Xu S, Huang W, Lie D (2021) In-fat pointer: hardware-assisted tagged-pointer spatial memory safety defense with subobject granularity protection. In: Proceedings of the 26th ACM international conference on architectural support for programming languages and operating systems, 224–240. https://doi.org/10.1145/3445814.3446761
Serebryany K, Bruening D, Potapenko A, Vyukov D (2012) Addresssanitizer: A fast address sanity checker. In: USENIX ATC. https://doi.org/10.5555/2342821.2342849
Zdancewic S, Zheng L, Nystrom N, Myers AC (2002) Secure program partitioning. ACM Trans Comput Syst. https://doi.org/10.1145/566340.566343
Zheng L, Chong S, Myers A, Zdancewic S (2003) Using replication and partitioning to build secure distributed systems. In: IEEE symposium on security and privacy (S&P), pp 236–250. https://doi.org/10.1109/SECPRI.2003.1199340
Provos N, Friedl M, Honeyman P (2003) Preventing privilege escalation. In: 12th Usenix security symposium, pp 231–242. https://doi.org/10.5555/1251353.1251369
Kilpatrick D (2003) Privman: a library for partitioning applications. In: USENIX annual technical conference, pp 273–284
Brumley D, Song D (2004) Privtrans: Automatically partitioning programs for privilege separation. In: 13th Usenix Security Symposium, pp 57–72. https://doi.org/10.5555/1251375.1251380
Chong S, Liu J, Myers A, Qi X, Vikram K, Zheng L, Zheng X (2007) Secure web applications via automatic partitioning. In: ACM SIGOPS symposium on operating systems principles (SOSP). pp 31–44. https://doi.org/10.1145/1323293.1294265
Bittau A, Marchenko P, Handley M, Karp B (2008) Wedge: splitting applications into reduced-privilege compartments. In: Proceedings of the 5th USENIX symposium on networked systems design and implementation. pp 309–322. https://doi.org/10.5555/1387589.1387611
Krishnamurthy A, Mettler A, Wagner D (2010) Fine-grained privilege separation for web applications. In: Proceedings of the 19th international conference on world wide web, pp 551–560. https://doi.org/10.1145/1772690.1772747
Niu B, Tan G (2012) Enforcing user-space privilege separation with declarative architectures. In: Proceedings of the sixth ACM workshop on scalable trusted computing (STC), pp 9–20. https://doi.org/10.1145/2382536.2382541
Dong X, Hu H, Saxena P, Liang Z (2013) A quantitative evaluation of privilege separation in web browser designs. In: 18th european symposium on research in computer security (ESORICS), pp 75–93. https://doi.org/10.1145/3319535.3354218
Wu Y, Sun J, Liu Y, Dong JS (2013) Automatically partition software into least privilege components using dynamic data dependency analysis. In: International conference on automated software engineering (ASE). pp 323–333. https://doi.org/10.1109/ASE.2013.6693091.
Liu Y, Zhou T, Chen K, Chen H, Xia Y (2015) Thwarting memory disclosure with efficient hypervisor-enforced intra-domain isolation. In: 22nd ACM conference on computer and communications security (CCS). pp 1607–1619. https://doi.org/10.1145/2810103.2813690
Rubinov K, Rosculete L, Mitra T, Roychoudhury A (2016) Automated partitioning of Android applications for trusted execution environments. In: International conference on software engineering (ICSE). pp 923–934. https://doi.org/10.1145/2884781.2884817
Jacobsen C, Khole M, Spall S, Bauer S, Burtsev A (2016) Lightweight capability domains: Towards decomposing the linux kernel. SIGOPS Oper Syst Rev. https://doi.org/10.1145/2883591.2883601
Mambretti A, Onarlioglu K, Mulliner C, Robertson W, Kirda E, Maggi F, Zanero S (2016) Trellis: privilege separation for multi-user applications made easy. In: International symposium on research in attacks, intrusions and Defenses (RAID), pp. 437–456. https://doi.org/10.1007/978-3-319-45719-2_20
Lind J, Priebe C, Muthukumaran D, O’Keeffe D, Aublin P, Kelbert F, Reiher T, Goltzsche D, Eyers DM, Kapitza R, Fetzer C, Pietzuch PR (2017) Glamdring: automatic application partitioning for intel SGX. In: USENIX annual technical conference (ATC). pp. 285–298. https://dl.acm.org/doi/https://doi.org/10.5555/3154690.3154718
Liu S, Tan G, Jaeger T (2017) PtrSplit: supporting general pointers in automatic program partitioning. In 24th ACM conference on computer and communications security (CCS), pp 2359–2371. https://doi.org/10.1145/3133956.3134066
Liu S, Zeng D, Huang Y, Capobianco F, McCamant S, Jaeger T, Tan G (2019) Program-mandering: Quantitative privilege separation. In: 26th ACM conference on computer and communications security (CCS)., pp 1023–1040. https://doi.org/10.1145/3319535.3354218
Hind M (2001) Pointer analysis: haven’t we solved this problem yet? In: ACM SIGPLAN/SIGSOFT workshop on program analysis for software tools and engineering.https://doi.org/10.1145/379605.379665
Zhang Q, Xiao X, Zhang C, Yuan H, Su Z (2014) Efficient subcubic alias analysis for c. In: ACM SIGPLAN notices. pp 829–845. https://doi.org/10.1145/2714064.2660213
Kroes T, Koning K, van der Kouwe E, Bos H, Giuffrida C (2018) Delta pointers: Buffer overflow checks without the checks. In: Proceedings of the thirteenth eurosys conference, ser. EuroSys 18. https://doi.org/10.1145/3190508.3190553
Ben-Nun T, Jakobovits AS, Hoefler T (2018) Neural code comprehension: A learnable representation of code semantics. In: NeurIPS. https://doi.org/10.5555/3327144.3327276
Liang K, Wu S, Gu J (2021) MKA: A scalable medical knowledge assisted mechanism for generative models on medical conversation tasks. In: Computational and mathematical methods in medicine. pp. 148–168. https://doi.org/10.1155/2021/5294627.
Teru KK, Denis E, Hamilton WL (2020) Inductive relation prediction by subgraph reasoning. ICML. https://doi.org/10.5555/3524938.3525814
Kumar A, Singh SS, Singh K, Biswas B (2020) Link prediction techniques, applications, and performance: a survey. Phys A Stat Mech Appl 553:124289. https://doi.org/10.1016/j.physa.2020.124289
Zhou T, Liu L, Zhang YC (2009) Predicting missing links via local information. Eur Phys J 71:623–630. https://doi.org/10.1140/EPJB/E2009-00335-8
Adamic L, Adar E (2001) Friends and neighbors on the web. Social Netw 25:211–230. https://doi.org/10.1016/S0378-8733(03)00009-1
Jaccard P (1901) Distribution de la flore alpine dans le Bassin des Dranses et dans quelques regions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles 37:241–272
Barabasi A, Jeong H, Neda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaborations. Phys A Stat Mech Appl 311:590–614. https://doi.org/10.1016/S0378-4371(02)00736-7
Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43. https://doi.org/10.1007/BF02289026
Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inform Sci Technol 58(7):1019–1031. https://doi.org/10.1002/asi.20591
Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Sixth international conference on data mining, pp 613–622. IEEE. Doi: https://doi.org/10.1109/ICDM.2006.70
Clauset A, Moore C, Newman M (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453(7191):98–101. https://doi.org/10.1038/nature06830
Guimerà R, Sales-Pardo M (2009) Missing and spurious interactions and the reconstruction of complex networks. Proc Natl Acad Sci 106(52):22073–22078. https://doi.org/10.1073/pnas.0908366106
Stanley N, Bonacci T, Kwitt R, Niethammer M, Mucha PJ (2019) Stochastic block models with multiple continuous attributes. Appl Netw Sci 4:1–22. https://doi.org/10.1007/s41109-019-0170-z
Vallès-Català T, Peixoto TP, Sales-Pardo M, Guimerà R (2018) Consistencies and inconsistencies between model selection and link prediction in networks. Phys Review E 97(6–1):062316. https://doi.org/10.1103/PhysRevE.97.062316
Kuo TT, Yan R, Huang YY, Kung PH, Lin SD (2013) Unsupervised link prediction using aggregative statistics on heterogeneous social networks. In Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, ser. KDD ’13, Association for Computing Machinery, New York, pp 775–783. https://doi.org/10.1145/2487575.2487614
Yang J, McAuley J, Leskovec J (2013) Community detection in networks with node attributes. In: IEEE 13th international conference on data mining, pp 1151–1156. doi: https://doi.org/10.1109/ICDM.2013.167
Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ser. KDD ’14. ACM, New York, pp 701–710. https://doi.org/10.1145/2623330.2623732
Grover A, Leskovec J (2016) Node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining , ser. KDD ’16. Association for Computing Machinery, New York, p 855–864. https://doi.org/10.1145/2939672.2939754
Ribeiro L F, Saverese P H, Figueiredo D R (2017) Struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining , ser. KDD ’17. ACM, New York, pp 385–394. https://doi.org/10.1145/3097983.3098061
Perozzi B, Kulkarni V, Chen H, Skiena S (2017) Don't Walk, Skip! Online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, pp 258–265. https://doi.org/10.1145/3110025.3110086
Schlichtkrull M, Kipf T N, Bloem P, Berg R V D, Titov I, Welling M (2018) Modeling relational data with graph convolutional networks. In: European semantic web conference, Springer, Cham, pp 593–607
Hind M (2001) Pointer analysis: haven't we solved this problem yet?. In: Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on program analysis for software tools and engineering, pp 54–61. https://doi.org/10.1145/379605.379665
Smaragdakis Y, Balatsouras G (2015) Pointer analysis. Found Trends Program Lang 2(1):1–69. https://doi.org/10.1561/2500000014
Sui Y, Xue J (2016) SVF: interprocedural static value-flow analysis in LLVM. In: Proceedings of the 25th international conference on compiler construction, pp 265–266. https://doi.org/10.1145/2892208.2892235
Gurfinkel A, Navas J A (2017) A context-sensitive memory model for verification of C/C++ programs. In: International static analysis symposium. Springer, pp 148–168. https://doi.org/10.1007/978-3-319-66706-5_8
Xu W, DuVarney D C, Sekar R (2004) An efficient and backwards-compatible transformation to ensure memory safety of C programs. In: Proceedings of the 12th ACM SIGSOFT twelfth international symposium on foundations of software engineering, pp 117–126. https://doi.org/10.1145/1029894.1029913
Blanchet B, Cousot P, Cousot R, Feret J, Mauborgne L, Miné A, Monniaux D, Rival X (2003) A static analyzer for large safety-critical software. In: Proceedings of the ACM SIGPLAN 2003 conference on programming language design and implementation, pp 196–207. https://doi.org/10.1145/781131.781153
Dor N, Rodeh M, Sagiv M (2003) CSSV: Towards a realistic tool for statically detecting all buffer overflows in C. In: Proceedings of the ACM SIGPLAN 2003 conference on programming language design and implementation, pp 155–167. https://doi.org/10.1145/781131.781149
Nethercote N, Fitzhardinge J (2004) Bounds-checking entire programs without recompiling. SPACE
Narayanan V, Huang Y, Tan G, Jaeger T, Burtsev A (2020) Lightweight kernel isolation with virtualization and VM functions. In: Proceedings of the 16th ACM SIGPLAN/SIGOPS international conference on virtual execution environments, pp 157–171. https://doi.org/10.1145/3381052.3381328
Ferrante J, Ottenstein KJ, Warren JD (1987) The program dependence graph and its use in optimization. ACM Trans Program Lang Syst 9:319–349. https://doi.org/10.1145/24039.24041
Gilmer J, Schoenholz S S, Riley P F, Vinyals O, Dahl G E (2017) Neural message passing for quantum chemistry. In: International conference on machine learning, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. I, Eds., vol 70. PMLR, 06–11 Aug 2017, pp 1263–1272. https://doi.org/10.5555/3305381.3305512
Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? In: International conference on learning representations
Schlichtkrull M, Kipf T N, Bloem P, Berg R V D, Titov I, Welling M (2018) Modeling relational data with graph convolutional networks. In: European semantic web conference, pp 593–607. Springer, Cham. https://doi.org/10.1007/978-3-319-93417-4_38
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2018) Graph attention networks.In: International conference on learning representations (ICLR). https://arxiv.org/abs/1710.10903
Andersen LO (1994) Program analysis and specialization for the C programming language. University of Copenhagen, DIKU
Liang K, Wu S, Gu J (2021) MKA: A scalable medical knowledge-assisted mechanism for generative models on medical conversation tasks. Comput Math Methods Med. https://doi.org/10.1155/2021/5294627
Chen J, He H, Wu F, Wang J (2021) Topology-aware correlations between relations for inductive link prediction in knowledge graphs. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol 35, No 7, pp 6271–6278
Mai S, Zheng S, Yang Y, Hu H (2021) Communicative message passing for inductive relation reasoning. In AAAI, pp 4294–4302
Liang K, Meng L, Liu M, Liu Y, Tu W, Wang S, Zhou S, Liu X, Sun F (2022) Reasoning over different types of knowledge graphs: static, temporal and multi-modal. https://doi.org/10.48550/arXiv.2212.05767
Funding
The authors did not have any specific funding related to this work.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no competing interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liang, K., Tan, J., Zeng, D. et al. ABSLearn: a GNN-based framework for aliasing and buffer-size information retrieval. Pattern Anal Applic 26, 1171–1189 (2023). https://doi.org/10.1007/s10044-023-01142-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-023-01142-2