AVclass2: Massive Malware Tag Extraction from AV Labels

Authors:

Silvia Sebastián,

Juan Caballero

ACSAC '20: Proceedings of the 36th Annual Computer Security Applications Conference

Pages 42 - 53

https://doi.org/10.1145/3427228.3427261

Published: 08 December 2020

Abstract

Tags can be used by malware repositories and analysis services to enable searches for samples of interest across different dimensions. Automatically extracting tags from AV labels is an efficient approach to categorize and index massive amounts of samples. Recent tools like AVclass and Euphony have demonstrated that family names can be extracted from AV labels despite the labels' noisy nature. However, beyond the family name, AV labels contain other valuable information such as malware classes, file properties, and behaviors.

This work presents AVclass2, an automatic malware tagging tool that, given the AV labels for a potentially massive number of samples, extracts clean tags that categorize the samples. AVclass2 uses, and helps build, an open taxonomy that organizes concepts in AV labels, but it is not constrained to a predefined set of tags. To keep itself updated as AV vendors introduce new tags, it provides an update module that automatically identifies new taxonomy entries, as well as tagging and expansion rules that capture relations between tags. We have evaluated AVclass2 on 42M samples and shown how it enables advanced malware searches and helps maintain an updated knowledge base of malware concepts in AV labels.
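
To make the tagging idea concrete, the following is a minimal sketch in Python of how tags might be extracted from the AV labels of one sample. It is not the AVclass2 implementation. The rule tables, the generic-token list, the tag names and category prefixes, the engine names, and the length threshold for unknown tokens are all hypothetical values chosen for illustration. The sketch tokenizes each label, maps tokens to tags through tagging rules, keeps unmatched tokens as unknown tags (so the tag set stays open), applies expansion rules, and counts how many engines support each tag.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical tagging rules: raw label token -> taxonomy tag.
TAGGING_RULES: Dict[str, str] = {
    "zbot": "FAM:zbot",
    "zeus": "FAM:zbot",          # alias mapped to the same family tag
    "win32": "FILE:os:windows",
    "trojanspy": "CLASS:spyware",
}

# Hypothetical expansion rules: a tag implies additional tags.
EXPANSION_RULES: Dict[str, Set[str]] = {
    "FAM:zbot": {"CLASS:infostealer"},
}

# Hypothetical list of tokens that carry no useful information.
GENERIC_TOKENS: Set[str] = {"trojan", "generic", "gen", "malware", "variant"}


def tokenize(label: str) -> List[str]:
    """Lowercase a label and split it on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]


def tags_for_label(label: str) -> Set[str]:
    """Return the set of tags that one AV label contributes."""
    tags: Set[str] = set()
    for token in tokenize(label):
        if token in GENERIC_TOKENS:
            continue
        if token in TAGGING_RULES:
            tags.add(TAGGING_RULES[token])
        elif len(token) >= 4 and not token.isdigit():
            # Assumption: keep longer unmatched tokens as unknown tags
            # rather than discarding them.
            tags.add("UNK:" + token)
    for tag in list(tags):
        tags |= EXPANSION_RULES.get(tag, set())
    return tags


def extract_tags(av_labels: Dict[str, str]) -> List[Tuple[str, int]]:
    """Count, for each tag, the number of engines whose label yields it."""
    votes: Counter = Counter()
    for label in av_labels.values():
        votes.update(tags_for_label(label))
    # Most supported tags first; ties broken alphabetically.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical labels for a single sample from three made-up engines.
sample_labels = {
    "EngineA": "Trojan.Win32.Zbot.a",
    "EngineB": "Win32/Zeus.Gen",
    "EngineC": "TrojanSpy:Win32/Zbot.Krypt",
}

for tag, count in extract_tags(sample_labels):
    print(tag, count)
```

In this sketch each engine votes at most once per tag, so the count can be read as the number of engines that agree on a tag. Unknown tags such as the one produced from the last label are the kind of candidates an update step could later review for inclusion in the taxonomy.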

Index Terms

AVclass2: Massive Malware Tag Extraction from AV Labels
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
  2. Systems security
    1. Operating systems security
2. Social and professional topics
  1. Computing / technology policy
    1. Computer crime

Index terms have been assigned to the content through auto-classification.

Published In

ACSAC '20: Proceedings of the 36th Annual Computer Security Applications Conference

December 2020

962 pages

ISBN:9781450388580

DOI:10.1145/3427228

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Ministerio de Ciencia, Innovación y Universidades
Comunidad de Madrid

Conference

ACSAC '20

ACSAC '20: Annual Computer Security Applications Conference

December 7 - 11, 2020

Austin, USA

Acceptance Rates

Overall Acceptance Rate 104 of 497 submissions, 21%
