DOI: 10.1145/3545948.3545969
research-article
Open access

Script Tainting Was Doomed From The Start (By Type Conversion): Converting Script Engines into Dynamic Taint Analysis Frameworks

Published: 26 October 2022

Abstract

Data flow analysis is an essential technique for understanding the complicated behavior of malicious scripts. To track data flow in scripts, existing studies have widely adopted dynamic taint analysis. However, existing taint analysis techniques must be designed and implemented separately for each script engine. Given the diversity of script languages that attackers can choose for their malicious scripts, it is unrealistic to prepare taint analysis tools for every script language and engine.
In this paper, we propose an approach that automatically builds a taint analysis framework for scripts on top of a framework designed for native binaries. We first conducted experiments, which revealed that semantic gaps in data types between binaries and scripts hinder this approach by causing under-tainting. To address this problem, our approach detects such gaps and bridges them by generating force propagation rules, which can eliminate the under-tainting. We implemented a prototype system based on our approach, called STAGER T. Using STAGER T, we built taint analysis frameworks for Python and VBScript and found that they could effectively analyze the data flow of real-world malicious scripts.
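
As a rough illustration of the under-tainting problem described above, the following minimal Python sketch models one plausible way a type conversion can drop taint, and how a forced propagation rule restores it. It is a toy model and not the STAGER T implementation: the `Tainted` class, the table-lookup `int_to_str` conversion, and the `with_force_propagation` wrapper are all hypothetical names introduced here, and the assumption that the conversion works by indexing a constant table is ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """A value paired with the set of taint labels attached to it (hypothetical)."""
    value: object
    labels: frozenset = frozenset()


DIGITS = "0123456789"  # constant lookup table; its characters carry no taint


def add(a: Tainted, b: Tainted) -> Tainted:
    """Direct data flow: the result inherits the labels of both operands."""
    return Tainted(a.value + b.value, a.labels | b.labels)


def int_to_str(n: Tainted) -> Tainted:
    """Convert a non-negative int to a string through a table lookup.

    Assumption: the tainted value is used only as an index into DIGITS, so a
    tracker that follows direct data flow alone sees untainted output bytes.
    """
    value = n.value
    chars = []
    while True:
        chars.append(DIGITS[value % 10])
        value //= 10
        if value == 0:
            break
    return Tainted("".join(reversed(chars)), frozenset())


def with_force_propagation(func):
    """Hypothetical force propagation rule: copy input labels onto the output."""
    def wrapper(arg: Tainted) -> Tainted:
        result = func(arg)
        return Tainted(result.value, result.labels | arg.labels)
    return wrapper


secret = Tainted(42, frozenset({"src"}))
total = add(secret, Tainted(8))           # arithmetic keeps the label
plain = int_to_str(total)                 # conversion loses the label
forced = with_force_propagation(int_to_str)(total)  # rule restores it
```

In this sketch the arithmetic result stays tainted, the converted string is under-tainted, and wrapping the conversion with the forced rule carries the label across. Detecting where such rules are needed is the part the paper automates and is not modeled here.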

Cited By

  • (2024) Dynamic Possible Source Count Analysis for Data Leakage Prevention. Proceedings of the 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes. 10.1145/3679007.3685065 (98-111). Online publication date: 13-Sep-2024
  • (2023) Cryptocurrency Security Study based on Static Taint Analysis. Highlights in Science, Engineering and Technology. 10.54097/hset.v39i.668439 (962-970). Online publication date: 1-Apr-2023

Published In

RAID '22: Proceedings of the 25th International Symposium on Research in Attacks, Intrusions and Defenses
October 2022
536 pages
ISBN: 9781450397049
DOI: 10.1145/3545948
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dynamic analysis
  2. functionality enhancement
  3. malicious script
  4. taint analysis

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • JSPS KAKENHI

Conference

RAID 2022

Acceptance Rates

Overall Acceptance Rate 43 of 173 submissions, 25%

