CN117521063A - Malicious software detection method and device based on residual neural network and combined with transfer learning
- Publication number: CN117521063A
- Application number: CN202311397538.6A
- Authority: CN (China)
- Legal status: Pending (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
Classifications
- G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/096: Transfer learning
- G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
Abstract
The invention discloses a malicious software detection method and device based on a residual neural network combined with transfer learning. The method first converts the original byte code of the malicious code samples in a data set into visual grayscale images, then normalizes the data and applies data augmentation to expand the data set. A ResNet50 model is transferred as the base model to the image classification of malware, which accelerates model convergence, and the model is verified on a test set. Finally, the model is compiled with an Adam optimizer, and after training, accuracy and loss curves and several evaluation metrics are generated to assess the strengths and weaknesses of the model. In experiments, the accuracy of this deep learning method for detecting malicious code reached 94%, making it more accurate and more generally applicable than detection based on traditional machine learning methods.
Description
Technical Field
The invention relates to the technical field of deep learning and network security, in particular to a malicious software detection method and device based on a residual neural network and combined with transfer learning.
Background
In recent years, as the complexity of computer software systems has increased, the number of potential security vulnerabilities has shown an increasing trend.
Although some of the disclosed software security vulnerabilities have been fixed, this does not mean that the risks computer users face when using software systems have decreased. In recent years, software security vulnerabilities have not only grown in number year by year but have also become more complex and diverse, posing serious challenges to the normal operation of software systems.
Early software security vulnerability discovery techniques fell into three main categories: 1) Static analysis techniques, which analyze the syntax, semantics, control flow and data flow of program source code or bytecode without running the program, in order to detect potential security vulnerabilities in the target program. 2) Dynamic analysis techniques, which analyze the running state, execution path and register state of a program while it runs, for example in a dynamic debugger, in order to find security vulnerabilities. 3) Hybrid analysis techniques, which combine static and dynamic analysis to uncover security vulnerabilities in the target program.
In recent years, with the rise of artificial intelligence (AI), effective features can be extracted automatically from complex, high-dimensional data. AI is currently applied to security vulnerability mining mainly through machine learning (ML), natural language processing (NLP) and deep learning (DL), enabling automated and intelligent study of software security vulnerabilities. However, vulnerability mining models based on traditional machine learning rely on security experts to define vulnerability features: they can only mine known vulnerabilities, cannot discover unknown vulnerabilities in real application environments, and are therefore limited in scope. Moreover, existing machine-learning-based vulnerability mining models cannot point out the key statements or features associated with a vulnerability, which makes it difficult to locate the exact position of a security vulnerability.
Disclosure of Invention
The invention provides a malicious software detection method and device based on a residual neural network and combined with transfer learning, which are used for solving or at least partially solving the technical problem of low detection accuracy in the prior art.
In order to solve the technical problem, a first aspect of the present invention provides a method for detecting malware based on a residual neural network and combined with transfer learning, including:
acquiring an original data set;
expanding the original data set using a data augmentation method, which specifically comprises: normalizing the malicious code binary files, converting the malicious code binary files into grayscale images, and augmenting the resulting grayscale images;
constructing a sample data set from the grayscale images in the expanded data set, and dividing it into a training set and a test set;
using a pre-constructed residual neural network as the base model to obtain part of its parameters, transferring these parameters to the recognition of malicious code grayscale images, and constructing a Flatten layer and fully connected layers to form a detection model; inputting the training set into the detection model for training, and fine-tuning the parameters of the detection model according to the output data;
and detecting a sample under test with the fine-tuned detection model.
In one embodiment, the original data set is derived from malicious samples provided by the Kaggle platform; each sample in the data set is provided in both .asm format and .bytes format, and the .bytes files are used as the samples of the original data set.
In one embodiment, normalizing the malicious code binary files includes:
processing the original byte matrix generated from the malicious code with the min-max normalization method.
In one embodiment, converting a malicious code binary file into a grayscale image includes: dividing the given malicious code binary executable file into 8-bit subsequences, each of which is one byte written as two hexadecimal digits; converting each subsequence into a value in the pixel interval [0, 255]; and arranging the resulting pixels in sequence.
In one embodiment, expanding the resulting grayscale images includes:
rotating, cropping, flipping vertically, flipping horizontally and adjusting the contrast of the resulting grayscale images.
In one embodiment, the pre-constructed residual neural network is a ResNet50 model pre-trained on ImageNet, which is composed of multiple basic residual blocks; the network includes convolution layers, ReLU activation functions, batch normalization, downsampling, a global average pooling layer and a fully connected layer.
In one embodiment, the detection model uses ResNet50 as its base, followed by a Flatten layer and five Dense layers, with one Dropout layer added between the third and fourth Dense layers and another between the fourth and fifth Dense layers; ReLU and Softmax are used as the activation functions.
Based on the same inventive concept, a second aspect of the present invention provides a malware detection device based on a residual neural network and combined with transfer learning, comprising:
a data acquisition module, configured to acquire an original data set;
a data augmentation module, configured to expand the original data set using a data augmentation method, specifically: normalizing the malicious code binary files, converting them into grayscale images, and augmenting the resulting grayscale images;
a data set division module, configured to construct a sample data set from the grayscale images in the expanded data set and divide it into a training set and a test set;
a model construction module, configured to use a pre-constructed residual neural network as the base model to obtain part of its parameters, transfer these parameters to the recognition of malicious code grayscale images, construct a Flatten layer and fully connected layers to form a detection model, input the training set into the detection model for training, and fine-tune the parameters of the detection model according to the output data;
and a detection module, configured to detect a sample under test with the fine-tuned detection model.
Based on the same inventive concept, a third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed, implements the method of the first aspect.
Based on the same inventive concept, a fourth aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, said processor implementing the method according to the first aspect when executing said program.
Compared with the prior art, the invention has the following advantages and beneficial technical effects:
The invention provides a malicious code detection method based on a residual neural network combined with transfer learning. Data augmentation is used to expand the original data set and increase its scale, which improves the generalization ability of the model, reduces the risk of overfitting, and makes the model's predictions on unseen data more accurate. By reusing existing knowledge and experience, transfer learning alleviates the problem of insufficient or low-quality data and avoids the time and computing resources required to train a model from scratch. A pre-constructed residual neural network is used as the base model; part of its parameters are transferred to the recognition of malicious code grayscale images, and a Flatten layer and fully connected layers are then added to form the detection model for malware. This overcomes the problems of malware detection models based on traditional machine learning, which rely on manually defined malware features and suffer from high false positive and false negative rates, and thereby improves detection accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the process of converting a malicious code binary file into a grayscale image according to an embodiment of the present invention;
FIG. 2 is a diagram of the basic framework of the detection model in an embodiment of the present invention;
FIG. 3 is a diagram of the framework of the ResNet50 model in an embodiment of the present invention;
FIG. 4 is a structural diagram of a basic residual block in an embodiment of the present invention;
FIG. 5 is a graph of the ReLU function in an embodiment of the present invention;
FIG. 6 is a graph of the Softmax function in an embodiment of the present invention;
FIG. 7 is a diagram comparing the neural network before and after Dropout is applied in an embodiment of the present invention.
Detailed Description
To classify malware more accurately, the invention provides a malware detection method based on ResNet (residual neural network) that combines grayscale visualization of malicious code with transfer learning. First, the original byte code of the malicious code in the data set is converted into visual grayscale images; the data set is then normalized and augmented to expand it; a ResNet50 model is transferred as the base model to malware image classification to accelerate model convergence, and the model is verified on the test set; finally, the model is compiled with an Adam optimizer, and after training, accuracy and loss curves and several evaluation metrics are generated to assess the strengths and weaknesses of the model.
In the implementation, samples from nine common malicious code families were collected from the Kaggle platform, 10,868 samples in total. The samples were randomly shuffled and split into a training set and a test set at a ratio of 7:3. In experiments, the accuracy of this deep learning method for detecting malicious code reached 94%, making it more accurate and more generally applicable than detection based on traditional machine learning methods.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
The embodiment of the invention provides a malicious software detection method based on a residual neural network and combined with transfer learning, which comprises the following steps:
acquiring an original data set;
expanding the original data set using a data augmentation method, which specifically comprises: normalizing the malicious code binary files, converting the malicious code binary files into grayscale images, and augmenting the resulting grayscale images;
constructing a sample data set from the grayscale images in the expanded data set, and dividing it into a training set and a test set;
using a pre-constructed residual neural network as the base model to obtain part of its parameters, transferring these parameters to the recognition of malicious code grayscale images, and constructing a Flatten layer and fully connected layers to form a detection model; inputting the training set into the detection model for training, and fine-tuning the parameters of the detection model according to the output data;
and detecting a sample under test with the fine-tuned detection model.
Specifically, before normalization, a memory map is created, and a generator is created to write the malicious code binary files into memory. The ratio between the training set and the test set can be set as required; in this embodiment it is 7:3.
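The random 7:3 split can be sketched as follows, assuming the sample identifiers fit in a Python list. The function name `train_test_split` and the fixed seed are illustrative choices rather than part of the described method, and the memory-map and generator steps are omitted.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(samples: Sequence[T], train_ratio: float = 0.7,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly shuffle the samples and split them into training and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)      # reproducible random mixing
    cut = round(len(items) * train_ratio)   # 7:3 by default
    return items[:cut], items[cut:]


train, test = train_test_split(range(10868))
print(len(train), len(test))  # 7608 3260
```

The description only specifies random mixing, so the sketch does not stratify by family; stratifying would keep each family's share equal in both sets.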
The detection model takes the residual neural network as its base model and adds a Flatten layer and fully connected layers; the complete detection model is then obtained through training and parameter fine-tuning.
In one embodiment, the original data set is derived from malicious samples provided by the Kaggle platform; each sample in the data set is provided in both .asm format and .bytes format, and the .bytes files are used as the samples of the original data set.
Specifically, the original data set contains 10,868 samples in total, drawn from 9 malicious code families and obtained from the Kaggle platform.
In one embodiment, normalizing the malicious code binary files includes:
processing the original byte matrix generated from the malicious code with the min-max normalization method.
Specifically, Min-Max normalization scales data to a specified range: the raw data are converted by a linear transformation so that the new values fall within a specified interval, typically [0, 1] or [-1, 1].
The procedure is as follows: select the maximum and minimum values over all data of a feature; for each value of that feature, subtract the minimum and divide by the difference between the maximum and the minimum; repeat these steps to normalize every feature.
After Min-Max normalization, the values of each feature are mapped to the specified range without changing the relative importance of each feature. With max denoting the maximum and min the minimum of the sample data, Min-Max normalization is expressed as follows:
$$x^{*} = \frac{x - \min}{\max - \min}$$
where x represents the sample data and x* represents the normalized data.
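A minimal sketch of the Min-Max formula for one feature (for example, the whole byte sequence of one file). The optional target interval covers the [-1, 1] case mentioned above; the handling of a constant feature, where the formula divides 0 by 0, is an assumption.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float], new_min: float = 0.0,
                      new_max: float = 1.0) -> List[float]:
    """Linearly rescale values so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant feature: x* is undefined (0/0); mapping to new_min is an assumption.
        return [new_min for _ in values]
    scale = new_max - new_min
    return [new_min + (v - lo) / span * scale for v in values]


print(min_max_normalize([0, 5, 10]))          # [0.0, 0.5, 1.0]
print(min_max_normalize([0, 5, 10], -1, 1))   # [-1.0, 0.0, 1.0]
```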
In one embodiment, converting a malicious code binary file into a grayscale image includes: dividing the given malicious code binary executable file into 8-bit subsequences, each of which is one byte written as two hexadecimal digits; converting each subsequence into a value in the pixel interval [0, 255]; and arranging the resulting pixels in sequence.
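The byte-to-pixel rule can be sketched as below. The line layout it assumes (an address column followed by two-digit hex byte tokens, with `??` for unreadable bytes, as in the hex-dump .bytes files of the Kaggle Microsoft Malware Classification Challenge) and the choice of mapping `??` to 0 are assumptions, and the function name is hypothetical.

```python
from typing import Iterable, List


def bytes_lines_to_pixels(lines: Iterable[str]) -> List[int]:
    """Convert hex-dump lines into a flat sequence of pixel values in [0, 255].

    Assumed layout (hypothetical): "<address> <hex byte> <hex byte> ...",
    with "??" marking a byte whose value is unavailable.
    """
    pixels: List[int] = []
    for line in lines:
        tokens = line.split()
        for token in tokens[1:]:          # tokens[0] is the address column
            if token == "??":
                pixels.append(0)          # assumption: unknown byte -> black pixel
            else:
                # Two hex digits encode one 8-bit subsequence, i.e. 0..255.
                pixels.append(int(token, 16))
    return pixels


print(bytes_lines_to_pixels(["00401000 56 8D 44 ?? FF"]))  # [86, 141, 68, 0, 255]
```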
Specifically, the above method visualizes each file as the corresponding malicious code grayscale image, whose width and height depend on the size of the file. In this embodiment, the converted grayscale image is saved in JPEG format. It is ensured that each line of the input array corresponds to one pixel of the output image, with each pixel represented by 16 bytes of data, and the input array is reshaped to meet the requirements of a JPEG image; in this embodiment, it is reshaped into a square. The adjusted width (equal to the height) is obtained by taking the integer square root of the original array size and rounding it up to the nearest power of 2. The process of converting a malicious code binary file into a grayscale image is shown in FIG. 1.
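The square reshaping rule can be sketched as follows. Two details are assumptions not stated in the description: the square root is rounded up (not truncated) before moving to the next power of 2, so that no byte is dropped, and the unused tail of the image is padded with zeros. Writing the result to a JPEG file is omitted.

```python
import math
from typing import List


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def pixels_to_square_image(pixels: List[int]) -> List[List[int]]:
    """Lay a flat pixel sequence out row by row in a square power-of-two image."""
    if not pixels:
        raise ValueError("cannot build an image from an empty file")
    side = math.isqrt(len(pixels))
    if side * side < len(pixels):
        side += 1                     # assumption: round up so no byte is dropped
    side = next_power_of_two(side)
    padded = pixels + [0] * (side * side - len(pixels))  # zero padding (assumption)
    return [padded[row * side:(row + 1) * side] for row in range(side)]


image = pixels_to_square_image(list(range(5)))
print(len(image), image[:2])  # 4 [[0, 1, 2, 3], [4, 0, 0, 0]]
```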
The Keras library, an open-source library built on TensorFlow 2.0, is used to analyze the executable programs generated from the malicious code and to process the binary files.
In one embodiment, expanding the resulting grayscale images includes:
rotating, cropping, flipping vertically, flipping horizontally and adjusting the contrast of the resulting grayscale images.
In the implementation, functions from the image module of TensorFlow are used to rotate, crop, flip vertically, flip horizontally and adjust contrast in order to expand the data, which improves the generalization ability of the model, reduces the risk of overfitting, and makes the model's predictions on unseen data more accurate. The number of rotations is random, from 0 to 3, and the random contrast factor lies between 0.2 and 0.5.
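The augmentation operations can be sketched on a plain 2-D list as below, mirroring the semantics of TensorFlow's `tf.image.rot90`, `flip_up_down`, `flip_left_right` and `adjust_contrast` (which maps each pixel to `(x - mean) * factor + mean`). The 0.5 flip probabilities and the crop size are assumptions, since the description does not state them; the function names are illustrative.

```python
import random
from typing import List

Image = List[List[float]]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*img)][::-1]


def flip_up_down(img: Image) -> Image:
    return [list(row) for row in img[::-1]]


def flip_left_right(img: Image) -> Image:
    return [row[::-1] for row in img]


def adjust_contrast(img: Image, factor: float) -> Image:
    """Scale each pixel's distance from the image mean by `factor`."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return [[(p - mean) * factor + mean for p in row] for row in img]


def random_crop(img: Image, height: int, width: int, rng: random.Random) -> Image:
    """Cut a height x width window at a random position (size is an assumption)."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img: Image, rng: random.Random) -> Image:
    """One random augmentation pass (cropping is applied separately)."""
    for _ in range(rng.randint(0, 3)):      # rotate a random 0-3 times
        img = rot90(img)
    if rng.random() < 0.5:                  # flip probabilities: assumption
        img = flip_up_down(img)
    if rng.random() < 0.5:
        img = flip_left_right(img)
    # A factor below 1 pulls pixels toward the mean, so values stay in range.
    return adjust_contrast(img, rng.uniform(0.2, 0.5))
```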
In one embodiment, the pre-constructed residual neural network is a ResNet50 model pre-trained on ImageNet, which is composed of multiple basic residual blocks; the network includes convolution layers, ReLU activation functions, batch normalization, downsampling, a global average pooling layer and a fully connected layer.
Specifically, the activation functions used in the convolution layers and fully connected layers of the constructed residual neural network model are the nonlinear activation function ReLU and the Softmax function, which converts output scores into confidence probabilities.
In one embodiment, the detection model is based on ResNet50, followed by a Flatten layer and five Dense layers, with one Dropout layer added between the third and fourth Dense layers and another between the fourth and fifth Dense layers; ReLU and Softmax are used as the activation functions.
In the implementation, Dropout is used to reduce overfitting when training the model. The five Dense (fully connected) layers have 1024, 512, 256, 128 and 10 neurons, respectively; a Dropout layer with probability p = 0.3 is added between the third and fourth Dense layers, and a Dropout layer with probability p = 0.2 between the fourth and fifth Dense layers. The basic framework of the detection model is shown in FIG. 2.
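As a worked check of the head's size, the sketch below counts the trainable weights of the Flatten layer plus the five Dense layers. It assumes 224 × 224 inputs, for which the ResNet50 convolutional base without its top classifier outputs a 7 × 7 × 2048 feature map; the input resolution is not stated in the description. Dropout layers add no parameters.

```python
from typing import List, Sequence, Tuple

# Assumption: 224 x 224 inputs, for which the ResNet50 convolutional base
# (ImageNet weights, top classifier removed) outputs a 7 x 7 x 2048 feature map.
FEATURE_MAP = (7, 7, 2048)
DENSE_UNITS = [1024, 512, 256, 128, 10]   # neuron counts from the description


def dense_head_params(feature_map: Sequence[int],
                      units: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (inputs, units, weights + biases) for each Dense layer."""
    fan_in = 1
    for dim in feature_map:
        fan_in *= dim                     # Flatten: 7 * 7 * 2048 = 100352
    rows = []
    for n in units:
        rows.append((fan_in, n, fan_in * n + n))
        fan_in = n                        # Dropout layers add no parameters
    return rows


rows = dense_head_params(FEATURE_MAP, DENSE_UNITS)
print(rows[0])                     # (100352, 1024, 102761472)
print(sum(r[2] for r in rows))     # 103451786
```

Under this assumption the first Dense layer holds almost all of the head's parameters, because Flatten passes every position of the feature map through; a global average pooling layer would reduce its input to 2048 values, at the cost of departing from the described architecture.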
ResNet50 is composed of multiple basic residual blocks; the model framework is shown in FIG. 3.
Vanishing gradients are a common problem in deep neural networks. They arise because the distribution of the activation inputs shifts during training, so that the gradients become very small or even 0 and back-propagation can no longer update the network parameters effectively. Batch normalization (BN) normalizes the inputs of each layer of the network, forcibly pulling them back to a standard normal distribution so that the activation inputs fall into the region where the nonlinear function is approximately linear, which avoids vanishing gradients. The basic idea of BN is to normalize each feature dimension over each mini-batch to a mean of 0 and a variance of 1, and then apply linear and nonlinear transformations. This prevents the feature values from drifting and makes training of the network more stable and faster.
The input to BN is the set of values x over a mini-batch, see formula (1):
$$B = \{x_1, \dots, x_m\} \tag{1}$$
where B denotes the set of input data and $x_1, \dots, x_m$ denote the first through the m-th data output by the convolution layer.
Compute the mini-batch mean, see formula (2):
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{2}$$
Compute the mini-batch variance, see formula (3):
$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{3}$$
where $\mu_B$ is the sample mean, m is the number of input samples, $x_i$ is the i-th datum and $\sigma_B$ is the sample standard deviation.
Normalize, see formula (4):
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{4}$$
where $\hat{x}_i$ is the normalized value of the i-th datum and $\epsilon > 0$ is a small constant that keeps the denominator greater than 0.
Restore the feature distribution, see formula (5):
$$N_{\gamma,\beta}(x_i) = \gamma \hat{x}_i + \beta \tag{5}$$
where γ is the scale (stretch) parameter and β is the shift (offset) parameter, both learnable, and $N_{\gamma,\beta}(x_i)$ denotes the normalized output.
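Formulas (2) to (5) can be traced in a minimal training-mode sketch for a single feature over one mini-batch. The value `eps = 1e-5` is illustrative, and the moving averages that BN layers use at inference time are omitted.

```python
import math
from typing import List, Sequence


def batch_norm(batch: Sequence[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Training-mode batch normalization of one feature over a mini-batch."""
    m = len(batch)
    mu = sum(batch) / m                                       # formula (2)
    var = sum((x - mu) ** 2 for x in batch) / m               # formula (3)
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]  # formula (4)
    return [gamma * xh + beta for xh in x_hat]                # formula (5)


out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in out])  # [-1.342, -0.447, 0.447, 1.342]
```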
The framework of the entire basic residual block is shown in FIG. 4, where filter_num denotes the number of filters and stride denotes the step size, i.e. how many positions the filter moves at each step when scanning the input.
The core idea of transfer learning is to take an already trained model and fine-tune or adapt it to a new task, thereby speeding up learning and improving performance on the new task.
The model uses model-based transfer learning: by adapting the neural network model, the parameters and knowledge of the source domain are retained, and the model layers are shared in the target domain using these parameters, so that low-level features can be better adapted to the target-domain task. Model-based transfer learning generally consists of three steps. First, select a pre-trained model, typically trained on a large data set such as ImageNet (large pre-trained models such as GPT and BERT are used in the same way). Next, feed the new data into the pre-trained model to extract its feature representations. Finally, use these feature representations to fine-tune a new model for the target task. During fine-tuning, the initial weights of the new model are usually taken from the pre-trained model, and the new model is then trained on a small amount of sample data to update the weights.
The model takes a ResNet50 model pre-trained on ImageNet as its base and freezes the pre-trained convolution layers, reusing their ImageNet weights but not the original fully connected classification layer. A Flatten layer and five Dense layers are then added to form a new model; the data set is fed into this model for training, and the model parameters are fine-tuned appropriately according to the output data to obtain the complete new model.
An activation function in a neural network maps the inputs of a neuron to its output. The present model uses two activation functions: the ReLU activation function for the hidden layers and the Softmax activation function for the final (output) Dense layer.
The ReLU activation function is often referred to as a ramp function in mathematics, which is proposed to solve the gradient vanishing problem. The gradient of ReLU can only take two values: 0 or 1, when the input is less than 0, the gradient is 0; when the input is greater than 0, the gradient is 1, so that the problem of gradient extinction can be avoided. The ReLU function is shown in formula (6).
The ReLU function image is shown in fig. 5.
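A plain-Python sketch of ReLU (formula (6)) and its gradient, with a toy comparison that multiplies local derivatives across ten layers. The sigmoid is included only for contrast (it is not used in this model), treating every layer as receiving the same input is a deliberate simplification, and the derivative at exactly 0 is taken as 0 by convention. All function names are illustrative.

```python
import math
from typing import Callable


def relu(x: float) -> float:
    """Ramp function from formula (6): max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """ReLU derivative: 1 for x > 0, 0 otherwise (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x: float) -> float:
    """Sigmoid derivative s * (1 - s); its maximum is 0.25 at x == 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def chained_gradient(grad_fn: Callable[[float], float], x: float, depth: int) -> float:
    """Product of local derivatives over `depth` layers, all evaluated at x
    (a simplified picture of backpropagation through a deep stack)."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(x)
    return g


if __name__ == "__main__":
    print("ReLU chain:", chained_gradient(relu_grad, 1.0, 10))
    print("sigmoid chain:", chained_gradient(sigmoid_grad, 0.0, 10))
```

Even at the sigmoid's best point the ten-layer product is 0.25^10 (about 1e-6), whereas the ReLU product for positive inputs stays at 1.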
The Softmax activation function is used mainly for multi-class classification. In a neural network, each neuron in the output layer represents the network's confidence for one class, and the Softmax function converts these confidence scores into a probability distribution: each output is normalized to a value between 0 and 1, and all outputs sum to 1. The Softmax function is shown in formula (7):
$$\mathrm{Softmax}(z_c) = \frac{e^{z_c}}{\sum_{d=1}^{K} e^{z_d}} \tag{7}$$
where $z_c$ is the output value of the c-th neuron, $z_d$ is the output value of the d-th output neuron, the denominator $\sum_{d=1}^{K} e^{z_d}$ is the sum of the exponentials of all outputs, and $K$ is the number of output neurons.
The Softmax function image is shown in fig. 6.
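A plain-Python sketch of formula (7). Subtracting the largest score before exponentiating is a standard numerical-stability step that does not change the result, because the common factor $e^{-\max z}$ cancels between numerator and denominator; the function name is illustrative.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    """Formula (7): exp(z_c) / sum_d exp(z_d) for every output c."""
    m = max(z)  # shift by the maximum to avoid overflow in exp()
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    probs = softmax([1.0, 2.0, 3.0])
    print(probs, sum(probs))
```

The predicted class is then the index of the largest probability.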
The present model uses Dropout to reduce the risk of overfitting of the model.
Dropout is a regularization technique that constrains the model parameters and thereby reduces overfitting. In each training iteration, Dropout randomly sets the outputs of some neurons to zero, so that these neurons contribute to neither the forward nor the backward pass in that iteration. Each neuron is dropped with probability p, a hyperparameter, and kept with probability q = 1 - p; the resulting expression for its input is given in formula (8).
Because neurons are dropped at random, Dropout trains a neural network with greater generalization ability. During training the network cannot rely too heavily on any single neuron, since any neuron may be dropped; Dropout therefore reduces co-adaptation between hidden neurons and improves model performance. The network before and after applying Dropout is shown in fig. 7.
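The following sketch illustrates Dropout using the common "inverted dropout" formulation, which may differ in its scaling from formula (8): a mask $r_j \sim \mathrm{Bernoulli}(q)$ zeros each activation with probability p, and surviving activations are scaled by 1/q during training so that their expected value is unchanged, while inference uses the activations as they are. Keras's `Dropout` layer behaves this way; the function below is a hypothetical plain-Python illustration.

```python
import random
from typing import List, Optional


def dropout(y: List[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each activation with probability p, scale survivors by 1/q."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(y)  # no dropout at inference time
    if rng is None:
        rng = random.Random()
    q = 1.0 - p  # keep probability
    # r_j ~ Bernoulli(q): 1.0 keeps neuron j, 0.0 drops it for this iteration.
    mask = [1.0 if rng.random() < q else 0.0 for _ in y]
    return [m * v / q for m, v in zip(mask, y)]


if __name__ == "__main__":
    print(dropout([1.0] * 8, p=0.5, rng=random.Random(0)))
```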
In one embodiment, the method further comprises evaluating the residual neural network model on the test set using metrics such as accuracy, precision, recall and F1 score.
In this embodiment, the model is compiled with an Adam optimizer and a cross-entropy loss function, and its quality is evaluated using four metrics: accuracy (ACC), precision, recall and F1 score.
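For reference, the categorical cross-entropy loss for a single sample with a one-hot label is $-\sum_c y_c \log p_c$. The sketch below is a plain-Python illustration of that definition; the clipping constant `eps` is an assumption added to avoid `log(0)`, and in practice a framework loss (for example Keras's `categorical_crossentropy`) would be used.

```python
import math
from typing import Sequence


def categorical_cross_entropy(y_true: Sequence[float], y_pred: Sequence[float],
                              eps: float = 1e-12) -> float:
    """Cross-entropy -sum_c y_c * log(p_c) for one sample with a one-hot label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Clip predictions into [eps, 1 - eps] so log() is always defined.
    return -sum(t * math.log(min(max(p, eps), 1.0 - eps))
                for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))
```

The loss falls as the probability assigned to the true class rises, which is what the Adam updates drive toward.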
Malicious samples are taken as positive examples (Positive) and benign samples as negative examples (Negative); a prediction is True if it is correct and False otherwise. This gives the four outcomes TP, FP, TN and FN shown in the following table:
TABLE 1 confusion matrix
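The evaluation metrics are then defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$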
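A small sketch computing the four metrics from confusion-matrix counts, using the standard definitions of accuracy, precision, recall and F1 score with malicious samples as the positive class. The helper names are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    tp: int  # malicious predicted as malicious
    fp: int  # benign predicted as malicious
    tn: int  # benign predicted as benign
    fn: int  # malicious predicted as benign


def counts_from_labels(y_true: Sequence[int], y_pred: Sequence[int],
                       positive: int = 1) -> ConfusionCounts:
    """Tally TP/FP/TN/FN from true and predicted labels."""
    c = ConfusionCounts(0, 0, 0, 0)
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                c.tp += 1
            else:
                c.fp += 1
        elif t == positive:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def evaluate(c: ConfusionCounts) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion counts."""
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }
```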
Compared with existing malicious code detection methods, the technical solution of the invention has the following advantages:
1. The malicious code detection method based on a residual neural network combined with transfer learning addresses the problems of malware detection models based on traditional machine learning, which rely on manually defined malware features and suffer from high false positive and false negative rates.
2. The proposed method uses data enhancement (augmentation) techniques. The original dataset may be small or highly imbalanced, which can cause the model to overfit or fail to capture the overall characteristics of the data. Data enhancement enlarges the original dataset through a series of transformations, which improves the model's generalization ability, reduces the risk of overfitting, and helps the model predict unseen data accurately.
3. The proposed method uses a transfer learning technique. By reusing existing knowledge and experience, transfer learning can compensate for insufficient or low-quality data and avoids the time and computational resources required to train a model from scratch.
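A NumPy sketch of the data enhancement operations listed in claim 5 (rotation, cropping, vertical and horizontal flips, contrast adjustment). Rotation is limited to multiples of 90° so that no interpolation library is needed; the crop fraction, the contrast range and the function name are hypothetical, and resizing the crop back to the model's input size is left out.

```python
import numpy as np


def augment(img: np.ndarray, rng: np.random.Generator,
            crop_frac: float = 0.9) -> list:
    """Return augmented copies of one H x W uint8 grayscale image."""
    h, w = img.shape
    out = [
        np.rot90(img),   # rotation (multiples of 90 degrees only)
        np.flipud(img),  # vertical flip
        np.fliplr(img),  # horizontal flip
    ]
    # Random crop covering crop_frac of each side.
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    out.append(img[top:top + ch, left:left + cw])
    # Contrast adjustment: stretch or shrink deviations from the mean intensity.
    factor = rng.uniform(0.8, 1.2)
    mean = img.mean()
    contrast = (img.astype(np.float64) - mean) * factor + mean
    out.append(np.clip(contrast, 0, 255).astype(np.uint8))
    return out
```

Each call turns one grayscale image into five additional samples, which is how the enlarged dataset described above is obtained.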
Example two
Based on the same inventive concept, the embodiment discloses a malicious software detection device based on a residual neural network and combined with transfer learning, comprising:
the data acquisition module is used for acquiring an original data set;
the data enhancement module is used for expanding the original data set by adopting a data enhancement method, and specifically comprises the following steps: carrying out normalization processing on the malicious code binary file, converting the malicious code binary file into a gray image, and expanding the obtained gray image;
the data set dividing module constructs a sample data set according to the gray level image in the expanded data set and divides a training set and a testing set;
the model construction module is used for taking a pre-constructed residual neural network as a base model to obtain partial parameters, migrating the partial parameters to the recognition of malicious code grayscale images, constructing a Flatten layer and fully connected layers to form a detection model, inputting the training set into the detection model for training, and fine-tuning the parameters of the detection model according to the output data;
and the detection module is used for detecting the sample to be detected by utilizing the fine-tuned detection model.
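The preprocessing performed by the data enhancement module can be sketched with NumPy as follows. It assumes the Kaggle `.bytes` layout, in which each line starts with an address followed by hexadecimal byte tokens, and that `??` tokens (unreadable bytes) are simply skipped; the default image width of 256 and all function names are hypothetical choices, not details from this document. Each byte (8 bits, two hex digits) becomes one pixel in [0,255], pixels are laid out row by row, and the resulting byte matrix is min-max normalized to [0,1].

```python
import math

import numpy as np


def parse_bytes_file(text: str) -> np.ndarray:
    """Parse a hex-dump text into byte values (assumed .bytes layout)."""
    values = []
    for line in text.splitlines():
        for tok in line.split()[1:]:  # skip the leading address column
            if tok != "??":           # assumption: unreadable bytes are dropped
                values.append(int(tok, 16))
    return np.array(values, dtype=np.uint8)


def bytes_to_grayscale(data: np.ndarray, width: int = 256) -> np.ndarray:
    """Arrange bytes row by row into a fixed-width image, zero-padding the last row."""
    if len(data) == 0:
        raise ValueError("no bytes to convert")
    rows = math.ceil(len(data) / width)
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[:len(data)] = data
    return padded.reshape(rows, width)


def min_max_normalize(img: np.ndarray) -> np.ndarray:
    """Min-max normalization to [0, 1]; a constant image maps to all zeros."""
    x = img.astype(np.float32)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)
```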
Because the device described in the second embodiment of the present invention is the device used to implement the malicious software detection method based on a residual neural network combined with transfer learning of the first embodiment, a person skilled in the art can derive the specific structure and variations of the device from the method described in the first embodiment, so a detailed description is omitted here. All devices used to implement the method of the first embodiment of the present invention fall within the scope of protection of the present invention.
Example three
Based on the same inventive concept, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed, implements the method as described in embodiment one.
Because the computer-readable storage medium described in the third embodiment of the present invention is the storage medium used to implement the malicious software detection method based on a residual neural network combined with transfer learning of the first embodiment, a person skilled in the art can derive the specific structure and variations of the storage medium from the method described in the first embodiment, so a detailed description is omitted here. All computer-readable storage media used to implement the method of the first embodiment of the present invention fall within the scope of protection.
Example four
Based on the same inventive concept, the present application also provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method in the first embodiment when executing the program.
Because the computer device described in the fourth embodiment of the present invention is the computer device used to implement the malicious software detection method based on a residual neural network combined with transfer learning of the first embodiment, a person skilled in the art can derive the specific structure and variations of the computer device from the method described in the first embodiment, so a detailed description is omitted here. All computer devices used to implement the method of the first embodiment of the present invention fall within the scope of protection of the present invention.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims and the equivalents thereof, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A malicious software detection method based on a residual neural network combined with transfer learning, characterized by comprising the following steps:
acquiring an original data set;
expanding the original data set using a data enhancement method, which specifically comprises: normalizing the malicious code binary file, converting the malicious code binary file into a grayscale image, and expanding the obtained grayscale images;
constructing a sample data set from the grayscale images in the expanded data set, and dividing it into a training set and a test set;
using a pre-constructed residual neural network as a base model to obtain partial parameters, migrating the partial parameters to the recognition of malicious code grayscale images, constructing a Flatten layer and fully connected layers to form a detection model, inputting the training set into the detection model for training, and fine-tuning the parameters of the detection model according to the output data;
and detecting the sample to be detected by using the fine-tuned detection model.
2. The malicious software detection method based on a residual neural network combined with transfer learning according to claim 1, wherein the original data set consists of malicious samples provided by the Kaggle platform, each piece of data in the data set is available in an .asm format and a .bytes format, and the files in .bytes format are used as samples of the original data set.
3. The malicious software detection method based on a residual neural network combined with transfer learning according to claim 1, wherein normalizing the malicious code binary file comprises:
processing the generated raw byte matrix of the malicious code using a min-max normalization method.
4. The malicious software detection method based on a residual neural network combined with transfer learning according to claim 1, wherein converting the malicious code binary file into a grayscale image comprises: for a given malicious code binary executable file, dividing every 8 bits of the file into a subsequence, converting each subsequence into two hexadecimal digits, mapping each subsequence to a pixel value in the interval [0,255], and arranging the pixels in order.
5. The malicious software detection method based on a residual neural network combined with transfer learning according to claim 1, wherein expanding the obtained grayscale image comprises:
rotating, cropping, vertically flipping, horizontally flipping, and adjusting the contrast of the obtained grayscale image.
6. The malicious software detection method based on a residual neural network combined with transfer learning according to claim 1, wherein the pre-built residual neural network is a ResNet50 model pre-trained on ImageNet; the model is composed of a plurality of basic residual blocks, and one basic residual block comprises a convolution layer, a ReLU activation function, batch normalization, downsampling, a global average pooling layer and a fully connected layer.
7. The malicious software detection method based on a residual neural network combined with transfer learning according to claim 6, wherein the detection model takes ResNet50 as its base, followed by a Flatten layer and five Dense layers; a Dropout layer is added between the third and fourth Dense layers and another between the fourth and fifth Dense layers, and the ReLU and Softmax functions are used as activation functions.
8. A malicious software detection device based on a residual neural network combined with transfer learning, characterized by comprising:
the data acquisition module is used for acquiring an original data set;
the data enhancement module is used for expanding the original data set by adopting a data enhancement method, and specifically comprises the following steps: carrying out normalization processing on the malicious code binary file, converting the malicious code binary file into a gray image, and expanding the obtained gray image;
the data set dividing module constructs a sample data set according to the gray level image in the expanded data set and divides a training set and a testing set;
the model construction module is used for taking a pre-constructed residual neural network as a base model to obtain partial parameters, migrating the partial parameters to the recognition of malicious code grayscale images, constructing a Flatten layer and fully connected layers to form a detection model, inputting the training set into the detection model for training, and fine-tuning the parameters of the detection model according to the output data;
and the detection module is used for detecting the sample to be detected by utilizing the fine-tuned detection model.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed, implements the method of any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when the program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311397538.6A CN117521063A (en) | 2023-10-25 | 2023-10-25 | Malicious software detection method and device based on residual neural network and combined with transfer learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311397538.6A CN117521063A (en) | 2023-10-25 | 2023-10-25 | Malicious software detection method and device based on residual neural network and combined with transfer learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117521063A true CN117521063A (en) | 2024-02-06 |
Family
ID=89750364
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311397538.6A Pending CN117521063A (en) | 2023-10-25 | 2023-10-25 | Malicious software detection method and device based on residual neural network and combined with transfer learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117521063A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118094550A (en) * | 2024-04-23 | 2024-05-28 | 山东省计算中心(国家超级计算济南中心) | Dynamic malicious software detection method based on Bert and supervised contrast learning |
2023-10-25: CN202311397538.6A (CN117521063A), active, pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Murdoch et al. | Interpretable machine learning: definitions, methods, and applications | |
Albahra et al. | Artificial intelligence and machine learning overview in pathology & laboratory medicine: A general review of data preprocessing and basic supervised concepts | |
US12014257B2 (en) | Domain specific language for generation of recurrent neural network architectures | |
US11645541B2 (en) | Machine learning model interpretation | |
CN109408389B (en) | Code defect detection method and device based on deep learning | |
US20190080253A1 (en) | Analytic system for graphical interpretability of and improvement of machine learning models | |
US20180268296A1 (en) | Machine learning-based network model building method and apparatus | |
CN113692594A (en) | Fairness improvement through reinforcement learning | |
CN113779272B (en) | Knowledge graph-based data processing method, device, equipment and storage medium | |
US20200285899A1 (en) | Computer Model Machine Learning Based on Correlations of Training Data with Performance Trends | |
RU2689818C1 (en) | Method of interpreting artificial neural networks | |
EP3916597B1 (en) | Detecting malware with deep generative models | |
US20220366040A1 (en) | Deep learning based detection of malicious shell scripts | |
JP6172317B2 (en) | Method and apparatus for mixed model selection | |
CN114913923A (en) | Cell type identification method aiming at open sequencing data of single cell chromatin | |
Suleman et al. | Google play store app ranking prediction using machine learning algorithm | |
CN117521063A (en) | Malicious software detection method and device based on residual neural network and combined with transfer learning | |
CN112749737A (en) | Image classification method and device, electronic equipment and storage medium | |
CN118151020B (en) | Method and system for detecting safety performance of battery | |
CN112698977B (en) | Method, device, equipment and medium for positioning server fault | |
CN118829990A (en) | Large-scale architectural searching in a graph neural network via synthetic data | |
Rahul et al. | Deep auto encoder based on a transient search capsule network for student performance prediction | |
CN113159419A (en) | Group feature portrait analysis method, device and equipment and readable storage medium | |
CN117437507A (en) | Prejudice evaluation method for evaluating image recognition model | |
WO2020167156A1 (en) | Method for debugging a trained recurrent neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||