Authors:
William La Cholter; Matthew Elder and Antonius Stalick
Affiliation:
Applied Physics Laboratory, Johns Hopkins University, U.S.A.
Keyword(s):
Malware, GitHub, Open Source Software, Windows.
Abstract:
Does malware lurking in GitHub pose a threat? GitHub is the most popular open source software website, with 188 million repositories. GitHub hosts malware-related projects for research and educational purposes and has also been used by malware to attack users. In this paper, we explore the prevalence of unencrypted, uncompressed binary code malware in Microsoft Windows-compatible C and C++ GitHub repositories and characterize the threat. We mined 1,835 repositories for already-compiled malicious files and for data suggesting whether each repository is malware-related. We focused on these repositories because Windows is frequently targeted by malware written in C or C++. These repositories are good resources for attackers and could be used to target Windows users. We extracted all Portable Executable (PE) files from all commits and queried the malware resource VirusTotal for analysis by its 76 anti-virus engines. Of the 24,395 files, 4,335 are suspicious, with at least one detection; 440 could be considered malicious, with at least seven detections. We identify topic tags suggesting malware or offensive security content to differentiate such repositories from seemingly benign ones. Of the 440 malicious executables, 197 were in 27 ostensibly benign repositories. This work illustrates risks in source code repositories and lessons learned in relating GitHub and VirusTotal data.
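The detection thresholds stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tooling. The function names `looks_like_pe` and `classify`, the constant names, and the "undetected" label are hypothetical, and the per-file detection count is assumed to have been obtained from VirusTotal separately. The PE check relies only on the documented file layout: an `MZ` header whose 4-byte field at offset 0x3C points to the `PE\0\0` signature. Because the abstract's 4,335 suspicious files include the 440 malicious ones, the sketch returns only the most severe label that applies.

```python
import struct

# Thresholds taken from the abstract; the constant names are illustrative.
SUSPICIOUS_MIN = 1   # at least one engine flags the file
MALICIOUS_MIN = 7    # at least seven engines flag the file


def looks_like_pe(data: bytes) -> bool:
    """Return True if the bytes start with a plausible PE header."""
    # A PE file begins with the DOS header magic "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The little-endian 32-bit field at 0x3C holds the PE signature offset.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields a short slice, so the comparison fails.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def classify(detections: int) -> str:
    """Map a count of detecting engines to the most severe applicable label."""
    if detections >= MALICIOUS_MIN:
        return "malicious"
    if detections >= SUSPICIOUS_MIN:
        return "suspicious"
    return "undetected"
```

A file would first be screened with `looks_like_pe`, and only then would its detection count be passed to `classify`. A fixed count threshold is a simple heuristic, and the value seven is the one reported in the abstract rather than a general standard.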