DOI: 10.1145/3427228.3427261
Research article

AVclass2: Massive Malware Tag Extraction from AV Labels

Published: 08 December 2020

Abstract

Tags can be used by malware repositories and analysis services to enable searches for samples of interest across different dimensions. Automatically extracting tags from AV labels is an efficient approach to categorize and index massive amounts of samples. Recent tools like AVclass and Euphony have demonstrated that, despite the noisy nature of AV labels, it is possible to extract family names from them. However, beyond the family name, AV labels contain much valuable information such as malware classes, file properties, and behaviors.
This work presents AVclass2, an automatic malware tagging tool that, given the AV labels for a potentially massive number of samples, extracts clean tags that categorize the samples. AVclass2 uses, and helps build, an open taxonomy that organizes concepts in AV labels, but it is not constrained to a predefined set of tags. To keep itself updated as AV vendors introduce new tags, it provides an update module that automatically identifies new taxonomy entries, as well as tagging and expansion rules that capture relations between tags. We have evaluated AVclass2 on 42M samples and shown how it enables advanced malware searches and helps maintain an updated knowledge base of malware concepts in AV labels.
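
To make the idea of extracting tags from AV labels concrete, the following is a minimal sketch, not AVclass2's actual implementation. It assumes a simple pipeline: split each label into tokens, drop generic tokens, map aliases to tags through tagging rules, add implied tags through expansion rules, and keep tags that enough vendors agree on. The taxonomy entries, rules, vendor names, labels, and the `UNK:` prefix for tags outside the taxonomy are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy (tag -> category path); not AVclass2's real taxonomy.
TAXONOMY = {
    "zbot": "FAM:zbot",
    "trojan": "CLASS:trojan",
    "windows": "FILE:os:windows",
    "infosteal": "BEH:infosteal",
}

# Hypothetical tagging rules: raw token (alias) -> tag.
TAGGING = {
    "zeus": "zbot",
    "troj": "trojan",
    "win32": "windows",
    "w32": "windows",
    "spy": "infosteal",
    "pws": "infosteal",
}

# Hypothetical expansion rules: tag -> tags it implies.
EXPANSION = {"zbot": {"infosteal"}}

# Tokens assumed to carry no useful information.
GENERIC = {"generic", "gen", "malware", "variant", "agent"}


def tokenize(label):
    """Lowercase a label and split it on non-alphanumeric characters,
    dropping very short tokens and purely numeric ones."""
    tokens = re.split(r"[^a-z0-9]+", label.lower())
    return [t for t in tokens if len(t) >= 3 and not t.isdigit()]


def tags_for_label(label):
    """Return the set of tags one label contributes."""
    tags = set()
    for token in tokenize(label):
        if token in GENERIC:
            continue
        tag = TAGGING.get(token, token)   # unknown tokens are kept as-is
        tags.add(tag)
        tags |= EXPANSION.get(tag, set())
    return tags


def extract_tags(av_labels, min_vendors=2):
    """Count how many vendors support each tag and keep those with at
    least min_vendors votes, ordered by votes and then by tag name."""
    votes = Counter()
    for label in av_labels.values():
        votes.update(tags_for_label(label))  # a set: one vote per vendor per tag
    kept = [(tag, n) for tag, n in votes.items() if n >= min_vendors]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [(TAXONOMY.get(tag, "UNK:" + tag), n) for tag, n in kept]


if __name__ == "__main__":
    sample = {
        "VendorA": "Trojan.Win32.Zbot.abcd",
        "VendorB": "W32/Zeus.PWS!tr",
        "VendorC": "Trojan-Spy.Zbot.Gen",
        "VendorD": "Generic.Malware.12345",
    }
    for path, count in extract_tags(sample):
        print(count, path)
```

Tokens that match no rule are kept rather than discarded, which mirrors the abstract's point that tagging is not constrained to a predefined set of tags; the vendor threshold is one simple way to suppress vendor-specific noise such as variant suffixes.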

Published In

ACSAC '20: Proceedings of the 36th Annual Computer Security Applications Conference
December 2020
962 pages
ISBN: 9781450388580
DOI: 10.1145/3427228
Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. AV Labels
2. Malware
3. Tag
4. Taxonomy

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

ACSAC '20

Acceptance Rates

Overall Acceptance Rate: 104 of 497 submissions, 21%
