Abstract
Today’s security threats like malware are more sophisticated and targeted than ever, and they are growing at an unprecedented rate. To deal with them, various approaches are introduced. One of them is Signature-based detection, which is an effective method and widely used to detect malware; however, there is a substantial problem in detecting new instances. In other words, it is solely useful for the second malware attack. Due to the rapid proliferation of malware and the desperate need for human effort to extract some kinds of signature, this approach is a tedious solution; thus, an intelligent malware detection system is required to deal with new malware threats. Most of intelligent detection systems utilise some data mining techniques in order to distinguish malware from sane programs. One of the pivotal phases of these systems is extracting features from malware samples and benign ones in order to make at least a learning model. This phase is called “Malware Analysis” which plays a significant role in these systems. Since API call sequence is an effective feature for realising unknown malware, this paper is focused on extracting this feature from executable files. There are two major kinds of approach to analyse an executable file. The first type of analysis is “Static Analysis” which analyses a program in source code level. The second one is “Dynamic Analysis” that extracts features by observing program’s activities such as system requests during its execution time. Static analysis has to traverse the program’s execution path in order to find called APIs. Because it does not have sufficient information about decision making points in the given executable file, it is not able to extract the real sequence of called APIs. Although dynamic analysis does not have this drawback, it suffers from execution overhead. Thus, the feature extraction phase takes noticeable time. In this paper, a novel hybrid approach, HDM-Analyser, is presented which takes advantages of dynamic and static analysis methods for rising speed while preserving the accuracy in a reasonable level. HDM-Analyser is able to predict the majority of decision making points by utilising the statistical information which is gathered by dynamic analysis; therefore, there is no execution overhead. The main contribution of this paper is taking accuracy advantage of the dynamic analysis and incorporating it into static analysis in order to augment the accuracy of static analysis. In fact, the execution overhead has been tolerated in learning phase; thus, it does not impose on feature extraction phase which is performed in scanning operation. The experimental results demonstrate that HDM-Analyser attains better overall accuracy and time complexity than static and dynamic analysis methods.
Similar content being viewed by others
References
Abou-Assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: Detection of new malicious code using n-grams signatures. In: Proceedings of Second Annual Conference on Privacy, Security and Trust, pp. 193–196. Citeseer (2004)
Bayer, U., Kruegel, C., Kirda, E.: Ttanalyze: A tool for analyzing malware. In: 15th European Institute for Computer Antivirus Research (EICAR 2006) Annual Conference. Citeseer (2006)
Bergeron, J., Debbabi, M., Desharnais, J., Erhioui, M., Lavoie, Y., Tawbi, N.: Static detection of malicious code in executable programs. Int. J. Req. Eng. 2001, 184–189 (2001)
Bergeron, J., Debbabi, M., Desharnais, J., Ktari, B., Salois, M., Tawbi, N., Charpentier, R., Patry, M.: Detection of malicious code in cots software: A short survey. In: First International Software Assurance Certification Conference (ISACC99) (1999)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Cleary, J., Trigg, L.: K*: An instance-based learner using an entropic distance measure. In: Machine Learning-International Workshop Then Conference-, pp. 108–114. Citeseer (1995)
Dietterich, T.: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 40(2), 139–157 (2000)
Duds, R., Hart, P.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
Fasikhov, R.: Api logger tool. http://blackninja2000.narod.ru/rus/api_logger.html
Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29(2), 131–163 (1997)
Holmes, G., Donkin, A., Witten, I.: Weka: A machine learning workbench. In: Intelligent Information Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on, pp. 357–361. IEEE (1994)
Iba, W., Langley, P.: Induction of one-level decision trees. In: Proceedings of the Ninth International Conference on, Machine Learning, pp. 233–240 (1992)
Idika, N., Mathur, A.: A survey of malware detection techniques. Purdue University (2007)
Langley, P., Iba, W., Thompson, K.: An analysis of bayesian classifiers. In: Proceedings of the National Conference on Artificial Intelligence, pp. 223–223. Wiley, Hoboken (1992)
Lewis, D.: Naive (bayes) at forty: The independence assumption in information retrieval. Machine Learning: ECML-98, pp. 4–15 (1998)
Orenstein, D.: Quickstudy: Application programming interface (api) (2000)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, Los Altos (1988)
Picard, R., Cook, R.: Cross-validation of regression models. J. Am. Stat. Assoc., 575–583 (1984)
Platt, J.: 12 fast training of support vector machines using sequential minimal, optimization (1998)
Rabek, J., Khazan, R., Lewandowski, S., Cunningham, R.: Detection of injected, dynamically generated, and obfuscated malicious code. In: Proceedings of the 2003 ACM workshop on Rapid malcode, pp. 76–82. ACM (2003)
Roundy, K., Miller, B.: Hybrid analysis and control of malware. In: Recent Advances in Intrusion Detection, pp. 317–338. Springer, Berlin (2010)
Sekar, R., Bowen, T., Segal, M.: On preventing intrusions by process behavior monitoring. In: USENIX Intrusion Detection, Workshop, vol. 1999 (1999)
Siddiqui, M.: Data mining methods for malware detection. ProQuest (2008)
Sung, A., Xu, J., Chavez, P., Mukkamala, S.: Static analyzer of vicious executables (save). In: Computer Security Applications Conference, 2004. 20th Annual, pp. 326–334. IEEE (2004)
Szor, P.: The Art of Computer Virus Research and Defense. Addison-Wesley Professional, Reading (2005)
Tzermias, Z., Sykiotakis, G., Polychronakis, M., Markatos, E.: Combining static and dynamic analysis for the detection of malicious documents. In: Proceedings of the Fourth European Workshop on System Security, p. 4. ACM (2011)
Wagner, D., Dean, R.: Intrusion detection via static analysis. In: Security and Privacy, 2001. S &P 2001. Proceedings. 2001 IEEE Symposium on, pp. 156–168. IEEE (2001)
Xu, J., Sung, A., Chavez, P.: Mukkamala, S.: Polymorphic malicious executable scanner by api sequence analysis (2004)
Ye, Y., Wang, D., Li, T., Ye, D.: Imds: Intelligent malware detection system. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1043–1047. ACM (2007)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Eskandari, M., Khorshidpour, Z. & Hashemi, S. HDM-Analyser: a hybrid analysis approach based on data mining techniques for malware detection. J Comput Virol Hack Tech 9, 77–93 (2013). https://doi.org/10.1007/s11416-013-0181-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11416-013-0181-8