CN102622536B - Method for catching malicious codes

Info

Publication number: CN102622536B
Application number: CN201110029135.7A
Authority: CN
Inventors: 杨轶; 冯登国; 苏璞睿; 应凌云
Original assignee: Institute of Software of CAS
Current assignee: Institute of Software of CAS
Priority date: 2011-01-26
Filing date: 2011-01-26
Publication date: 2014-09-03
Anticipated expiration: 2031-01-26
Also published as: CN102622536A

Abstract

The invention discloses a method for capturing malicious code, belonging to the technical field of network security. The method is: 1) configure a hardware emulator, which loads and starts the target operating system; 2) the hardware emulator reads the virtual memory of the target operating system, identifies all processes and the export tables of the dynamic libraries loaded by each process, obtains the addresses of all APIs in the export tables, and intercepts the network-receive API functions; 3) according to the return value of the network-receive API function, mark the data packet that the process received from the network as a tainted data packet; 4) the hardware emulator disassembles the instructions executed by the current process and performs taint propagation calculation on instructions and API calls whose operands fall within the tainted data; 5) determine whether abnormal behavior occurs in the state of the current process during taint propagation; if it does, determine that the current process is malicious code and extract the malicious code image from the memory of the target operating system. The invention achieves fully transparent analysis of malicious code, with higher efficiency and accuracy.

Description

A Method for Capturing Malicious Code

Technical Field

The invention belongs to the technical field of network security, and in particular relates to a method for capturing malicious code based on a hardware emulator and taint propagation.

Background

As society develops, computers are used ever more widely in every field. Because software vulnerabilities are widespread and users often lack security awareness, Trojan horses spread faster and faster, infect a growing range of systems, and cause increasingly serious damage. Traditional means of capturing and analyzing malicious code are limited by analysis efficiency and by the technical skill of users, so their response cycle is hard to shorten and their response speed can no longer keep up with this situation. Improving the ability to capture and analyze malicious code is therefore necessary.

Existing malicious code capture tools, such as the 360 cloud security platform and the Kingsoft cloud security platform, must modify the operating system, for example by hooking system functions or by registering a callback through PsSetCreateProcessNotifyRoutine, in order to capture anything. Modifying the operating system affects its integrity, so the patched data or the registered functions are easily discovered by malicious code, which then deploys countermeasures. In addition, current capture platforms run on the same operating system as the malicious code and compete with it for control of the system, which works against accurate and stable capture.

Current malicious code capture techniques usually rely on the following methods:

1. System integrity check

The system integrity check first creates a system snapshot, or a record of file hashes, on a clean system. While the system runs, either at random times or after specific behaviors have executed, the recorded snapshot or file hashes are compared with the current files. When the comparison shows a difference, the system is considered infected and a sample of the malicious code is extracted.

2. Heuristic detection of violating behavior

Heuristic detection defines normal operation and uses it as a baseline against which the behavior produced during code execution is compared, measuring how far the current behavior deviates from normal operation. When the behavior deviates far from normal operation, it is considered malicious, and a sample of the code that performed it is extracted.

At present, virtual-machine debugging and analysis of malicious code in hidden processes is implemented with virtual machine systems such as VMware and VirtualPC. Such a virtual machine system hands virtual instructions directly to the real local CPU for execution, and it also exposes a backdoor interface. Malicious code in a hidden process can therefore tell that it is running on a virtual system, either by measuring code execution time or by calling the virtual machine's backdoor functions, and then act to conceal its real functionality.

In summary, current malicious code sample extraction has two main defects: the hidden process and the malicious code sit at the same level, so the former is easily detected by the malicious code, which then deploys countermeasures; and only a sample of the malicious code can be extracted, while its behavior and the attack process are insufficiently analyzed.

Summary of the Invention

In view of the technical problems in the prior art, the object of the present invention is to provide a method for capturing malicious code based on a hardware emulator. The method builds an execution environment for malicious code and manipulates and controls the emulated CPU instructions and the accesses to the various emulated hardware devices. A data acquisition module in the hardware emulator collects information on all processes in the system and, using CR3 as the process identifier, analyzes the execution of each process. The method monitors the execution of all processes, extracts the malicious code image directly from virtual memory, analyzes the relationship between the attack data and the attack process, and extracts the behavioral features of the malicious code's execution and the features of the attack data.

The technical scheme of the present invention is:

A method for capturing malicious code, the steps of which are:

1) Configure a hardware emulator; the hardware emulator loads and starts the target operating system;

2) The hardware emulator reads the virtual memory of the target operating system, identifies all processes executing in the current system and the export tables of the dynamic libraries loaded by each process, obtains the addresses of all APIs in the export tables, and intercepts the network-receive API functions;

3) According to the return value of the network-receive API function, mark the data packet that the process received from the network as a tainted data packet;

4) The hardware emulator disassembles each instruction executed by the current process. If the offset address and length of the instruction's source operand fall within the range of the tainted data packet, taint propagation calculation is performed on the instruction. If the instruction is an API function call, the API name is looked up from the API address and the offset and length of its input parameters are checked against the tainted data range; if they fall within it, taint propagation calculation is performed on the API call;

5) Determine whether abnormal behavior occurs in the state of the current process during taint propagation. If it does, determine that the current process is malicious code and extract the malicious code image from the memory of the target operating system.
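Step 3 above can be illustrated with a minimal Python sketch. It assumes the intercepted call is a Winsock-style `recv`, whose return value is the number of bytes written into the caller's buffer (zero on connection close, negative on error). The class name `TaintMap` and its methods are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaintMap:
    """Tainted guest byte addresses, each mapped to its offset in the packet."""
    origin: Dict[int, int] = field(default_factory=dict)

    def mark_packet(self, buf_addr: int, ret_value: int) -> int:
        """Taint bytes [buf_addr, buf_addr + ret_value); return bytes marked."""
        if ret_value <= 0:  # nothing was received: closed connection or error
            return 0
        for i in range(ret_value):
            self.origin[buf_addr + i] = i
        return ret_value

    def is_tainted(self, addr: int, length: int = 1) -> bool:
        """True if any byte of [addr, addr + length) is tainted."""
        return any(a in self.origin for a in range(addr, addr + length))


taint = TaintMap()
taint.mark_packet(0x1000, 4)          # recv returned 4 bytes into 0x1000
print(taint.is_tainted(0x1003))       # last byte of the packet
print(taint.is_tainted(0x1004))       # first byte after the packet
```

Keeping the packet offset for every tainted byte is a design choice of this sketch; it makes it possible later to report which part of the packet an operation depended on.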

Further, configuring the hardware emulator includes configuring its emulated memory size, the type of the emulated CPU, and the virtual hard disk.

Further, the virtual hard disk of the hardware emulator is configured as follows: a virtual image file is created using linear addressing, and the created image file is used as the virtual hard disk.
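The patent does not specify the image format. One plausible reading of "linear addressing" is a flat image file in which sector N lives at byte offset N times the sector size. The sketch below assumes that reading and a 512-byte sector; the function names are hypothetical.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size; the patent does not state one


def create_image(path: str, size_bytes: int) -> None:
    """Create a flat image file of the requested size."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def write_sector(path: str, lba: int, data: bytes) -> None:
    """Write one sector at linear block address `lba`."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("data must be exactly one sector")
    with open(path, "r+b") as f:
        f.seek(lba * SECTOR_SIZE)  # linear addressing: offset = LBA * sector size
        f.write(data)


def read_sector(path: str, lba: int) -> bytes:
    """Read one sector at linear block address `lba`."""
    with open(path, "rb") as f:
        f.seek(lba * SECTOR_SIZE)
        return f.read(SECTOR_SIZE)


image = os.path.join(tempfile.mkdtemp(), "disk.img")
create_image(image, 8 * SECTOR_SIZE)
write_sector(image, 3, b"\xAA" * SECTOR_SIZE)
```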

Further, the addresses of all APIs in the export table are obtained by comparing the names in the dynamic library's export table with the names in an API table.

Further, the taint propagation calculation is: according to the instructions executed by the current process, determine the variables and registers affected by the tainted data packet and by the tainted data derived from it.

Further, the abnormal behavior includes: a function return address on the current process's stack being overwritten by the tainted data packet or by tainted data derived from it; and the sequence of API calls executed by the current process matching a predefined abnormal behavior sequence.

Further, whether abnormal behavior occurs in the state of the current process is determined as follows:

1) Define abnormal behavior sequences; the hardware emulator reads them and maintains them in memory as a singly linked list;

2) Create a taint record in the structure corresponding to the current process; at the same time, build a taint behavior list that records the instructions and API calls with which the current process operates on the tainted data packet or on tainted data derived from it;

3) Starting from the moment the current process's network-receive function returns, analyze every instruction the process subsequently executes:

If the instruction operates on tainted data, add the result of the instruction to the taint record as taint derived from the tainted data packet, and add the instruction to the taint behavior list. If it is an API call whose parameters contain the tainted data packet or tainted data derived from it, add the API call to the taint behavior list;

4) Determine whether the destination address of an operation on tainted data overwrites a function's return address; if so, the state of the current process is judged abnormal. Using the taint behavior list, determine whether the API calls executed by the process match an abnormal behavior sequence held in memory; if so, the state of the current process is judged abnormal.
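The return-address check in item 4) can be sketched as an interval-overlap test. The sketch assumes the emulator keeps a set of stack addresses that currently hold return addresses (for example, recorded when a call instruction executes) and assumes 4-byte slots on a 32-bit guest. Neither the bookkeeping nor the function name is specified by the patent.

```python
from typing import Optional, Set


def overwritten_return(write_addr: int, write_len: int, src_tainted: bool,
                       ret_slots: Set[int], slot_size: int = 4) -> Optional[int]:
    """Return the return-address slot hit by a tainted write, or None.

    ret_slots is a hypothetical shadow set of stack addresses that
    currently hold function return addresses.
    """
    if not src_tainted:
        return None
    for slot in sorted(ret_slots):
        # Two half-open ranges overlap when each starts before the other ends.
        if write_addr < slot + slot_size and slot < write_addr + write_len:
            return slot
    return None


slots = {0x12FF00, 0x12FF40}
# A 32-byte tainted copy starting 16 bytes below the first slot reaches it.
hit = overwritten_return(0x12FEF0, 0x20, True, slots)
```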

Further, the malicious code image is extracted as follows: trace back through the taint behavior list and, from the APIs called and the instructions executed by the process that behaved abnormally, find the tainted data on which the process's behavior sequence depends; this is the captured malicious code. At the same time, extract the offset, length, and content, within the whole tainted data packet, of the data executed during the attack.

Further, an abnormal behavior sequence is a series of consecutive API operations.

The method for capturing malicious code is used as follows:

1. Configure the image path, the emulated memory size of the hardware emulator, and the type of the emulated CPU; the hardware emulator loads the operating system image to start the target system.

2. Use the hardware emulator to read the current system memory and parse its contents, identifying all processes executing in the current system and the export tables of the dynamic libraries loaded by each process. Compare the names in each dynamic library's export table with the names in the API table, obtain the addresses of all APIs in the export table, and intercept the predefined network API functions WSARecv, recv, and RecvFrom (the network-receive APIs).

3. According to the return value of the network-receive API function intercepted in step 2, obtain the data packet the process received from the network and mark it as tainted data.

4. While the current process executes, the hardware emulator disassembles and analyzes each instruction it executes. If the address and offset of the instruction's source operand belong to the tainted source data marked in step 3, taint propagation calculation is performed on the instruction. If the instruction is an API function call, the API name is looked up from the API address and the address and offset of its input parameters are checked against the tainted data; if they belong to it, taint propagation calculation is performed on the API call. Taint propagation calculation means determining, from the instructions executed by the current process, the variables and registers affected by the tainted data packet and by the tainted data derived from it. To perform the propagation, we parse the opcode, source operands, and destination operand of each CPU instruction; for every instruction, if a source operand references tainted data, the destination operand is also marked as tainted. The purpose of taint propagation is to extract more accurately how the process handles its network input while it runs. During taint propagation, the state of the current process is evaluated, for example whether a return address on the process's stack has been overwritten by tainted data, or whether a consecutive sequence of API calls has been executed that forwards data.

5. Define abnormal behavior: during program execution, determine whether a function return address on the current process's stack has been overwritten by the tainted data packet or by tainted data derived from it, or whether the sequence of API calls executed by the current process matches a predefined abnormal behavior sequence. Abnormal behavior is judged by these two tests. If abnormal behavior is found, trace back through the whole taint propagation calculation, determine the different data ranges within the tainted data and the role each played in the attack, obtain their offset, length, content, and role within the taint source, and output this information as a log file.
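The propagation rule in step 4 (a destination becomes tainted when any source operand is tainted) can be sketched over a simplified instruction model. The `Insn` tuple and the location names such as `"eax"` or `"mem:0x1000"` are illustrative assumptions. Clearing taint when a destination is overwritten with clean data is a common refinement that the patent does not state, so it is marked as an assumption in the code.

```python
from typing import NamedTuple, Set, Tuple


class Insn(NamedTuple):
    op: str                 # mnemonic, kept only for logging
    dst: str                # destination location (register or memory cell)
    srcs: Tuple[str, ...]   # source locations read by the instruction


def propagate(insn: Insn, tainted: Set[str]) -> bool:
    """Apply the propagation rule; return True if insn touched tainted data."""
    touched = any(src in tainted for src in insn.srcs)
    if touched:
        tainted.add(insn.dst)       # rule from the text: tainted source taints dst
    else:
        tainted.discard(insn.dst)   # assumption: a clean overwrite clears taint
    return touched


tainted = {"mem:0x1000"}            # a byte of the received packet
trace = [
    Insn("mov", "eax", ("mem:0x1000",)),
    Insn("add", "ebx", ("ebx", "eax")),
    Insn("mov", "eax", ("ecx",)),
]
behavior_list = [insn for insn in trace if propagate(insn, tainted)]
```

Instructions for which `propagate` returns `True` are the ones that would be appended to the taint behavior list.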

Further, if an illegal change of process state is detected in step 4 above, a sample is extracted automatically. The procedure is: while monitoring the instructions a process executes, the monitoring program integrated in the hardware emulator obtains the process's load address from the process's EPROCESS structure and, starting from that address, reads the code in physical memory. It parses the PE structure of the process's executable to determine the range the file occupies in memory, uses the memory page tables to locate the corresponding pages in the emulated physical memory, and then, from the address offset and the in-memory length of the file, reads out the code image in a single pass.
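The "parse the PE structure to determine the range in memory" step can be sketched with the standard PE header layout: `e_lfanew` at offset 0x3C of the DOS header, followed at that offset by the `PE\0\0` signature, a 20-byte COFF header, and the optional header, whose `SizeOfImage` field sits at offset 56. The `read_mem` callback is a hypothetical stand-in for the emulator's guest-memory reader, which would perform the page-table translation described above.

```python
import struct
from typing import Callable

ReadMem = Callable[[int, int], bytes]  # (guest address, length) -> bytes


def image_size(read_mem: ReadMem, base: int) -> int:
    """Return SizeOfImage for the PE module mapped at `base`."""
    if read_mem(base, 2) != b"MZ":
        raise ValueError("no DOS header at load address")
    (e_lfanew,) = struct.unpack("<I", read_mem(base + 0x3C, 4))
    nt_headers = base + e_lfanew
    if read_mem(nt_headers, 4) != b"PE\0\0":
        raise ValueError("no PE signature")
    optional_header = nt_headers + 4 + 20   # skip signature and COFF header
    (size,) = struct.unpack("<I", read_mem(optional_header + 56, 4))
    return size


def dump_image(read_mem: ReadMem, base: int) -> bytes:
    """Read the whole mapped image in one pass."""
    return read_mem(base, image_size(read_mem, base))
```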

Compared with the prior art, the advantages and positive effects of the present invention are as follows:

1. Because data acquisition is implemented through hardware emulation rather than by executing the malicious code on a real CPU, the malicious code cannot perceive whether it is running in a virtual environment, nor can it tell whether it is being traced, so the analysis is fully transparent to the malicious code.

2. All virtual CPU instructions and hardware operations of the emulated hardware are translated and then executed in emulation, rather than being run directly as code fragments on the real machine. The running time of each instruction can therefore be calculated precisely as it executes, which preserves the transparency of the virtual environment.

3. The invention analyzes malicious code based on taint propagation and judges malicious operations and attack behavior by identifying the operations performed on tainted data, which gives higher efficiency and accuracy.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the malicious code capture method based on a hardware emulator and taint propagation.

Figure 2 is a flowchart of the malicious code capture method based on a hardware emulator and taint propagation.

Detailed Description

The technical scheme of the present invention is described in detail below with reference to the drawings:

As shown in Figure 1, a method and system for capturing malicious code based on a hardware emulator and taint propagation includes the following steps:

1. Create the operating system image required to run the target file

The invention uses linear addressing to create a virtual image file. This file serves as the virtual hard disk, and the operating system is installed on it on the virtualized analysis platform.

2. Configure and start the hardware emulator

Configure the image path of the operating system so that the emulator knows where the operating system image to be run is located. Configure the emulated physical memory size, the system boot time, and the emulated CPU type. The hardware emulator allocates a memory region of the configured size to serve as the emulated physical memory, sets its system clock from the configured boot time, and selects the instruction decoding engine that matches the emulated CPU type so that source instructions are translated into target instructions and executed. Once this is done, the hardware emulator reads the boot code from the operating system image and sets EIP to the boot code to start the operating system.

The emulated memory is obtained by allocating memory of the corresponding size directly on the real machine. The emulated memory size is the basis on which the guest operating system runs: the larger the emulated memory, the faster the guest operating system runs. In this embodiment the emulated memory size is configured between 216M and 1G.

The type of the emulated CPU is applied through the emulator's decoding module, which converts the instructions of the emulated CPU into instructions of the local CPU before they run, so that the operating system running on the virtual machine executes its instructions correctly. The invention can emulate several kinds of CPU. For example, if the current image was read from a P4 machine, the emulated CPU type must be configured as P4 and not as another type such as ARM or MIPS; otherwise the operating system cannot run correctly. If the real CPU is an Intel P4 and the emulated CPU is ARM, the decoding module must convert each ARM instruction into one or more Intel P4 instructions.
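The configuration described in this step can be captured in a small validated structure. The sketch below uses the memory bounds given in this embodiment (216M to 1G) and the CPU types named in the text; the field names, the class name, and the idea of recording the CPU the image was taken from are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime

MIN_MEMORY_MB = 216      # lower bound stated in the embodiment
MAX_MEMORY_MB = 1024     # upper bound stated in the embodiment (1G)
SUPPORTED_CPUS = {"p4", "arm", "mips"}   # CPU types mentioned in the text


@dataclass(frozen=True)
class EmulatorConfig:
    image_path: str      # location of the operating system image
    memory_mb: int       # emulated physical memory size
    cpu_type: str        # emulated CPU; selects the decoding engine
    image_cpu: str       # CPU of the machine the image was taken from
    boot_time: datetime  # initial value of the emulated system clock

    def __post_init__(self) -> None:
        if not MIN_MEMORY_MB <= self.memory_mb <= MAX_MEMORY_MB:
            raise ValueError("memory_mb outside the configured range")
        if self.cpu_type not in SUPPORTED_CPUS:
            raise ValueError("unsupported emulated CPU type")
        if self.cpu_type != self.image_cpu:
            # The guest cannot run if the emulated CPU differs from the
            # CPU the image was installed on.
            raise ValueError("emulated CPU must match the image's CPU")


config = EmulatorConfig("guest.img", 512, "p4", "p4", datetime(2011, 1, 26))
```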

3. The hardware emulator executes the instructions of the target operating system and identifies each process's system calls and executed instructions

Application-layer programs access the operating system through APIs. This embodiment obtains system calls by address comparison. Before a process is scheduled for execution, its code has not yet run, but its executable file and the dynamic libraries it needs have already been mapped into memory. Therefore, after the process is loaded and before its code executes, the invention reads the process's memory through the emulator's memory management module and parses the export tables of the dynamic libraries the process has loaded. An export table contains API names and API addresses. Using string comparison, the invention compares the API names in the export table with the names in the API table, obtains the addresses of all APIs in the export table, and adds these addresses to the API table. The API table holds each API's name, address, parameters, and return value. While a hidden process executes, its EIP value is compared one by one against the function addresses in the API table.

If the EIP value matches the first instruction of a function in the API table, a parsing function is called. It reads the stack and the general-purpose registers of the current CPU to obtain the function's parameters and return address; when execution reaches the return address, it reads the eax register to obtain the function's return value. The data acquisition module in the hardware emulator records the instruction and the data it operated on. This data includes the files the instruction opened, the ports it opened, the data sent through a given port, the files accessed, the processes and threads created, the services created or terminated, the operating system synchronization objects and mutexes created or used, the content of network send operations, the file names of file creation operations, and similar information.
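The address-comparison hook described above can be sketched as follows. It assumes a 32-bit x86 guest whose API functions take their arguments on the stack, so that at the first instruction of the callee `[esp]` holds the return address and `[esp+4]` onward hold the arguments. The table layout, the names `build_api_table` and `on_instruction`, the `read_mem` callback, and the addresses in the usage example are all illustrative assumptions.

```python
import struct
from typing import Callable, Dict, List, NamedTuple, Optional


class ApiEntry(NamedTuple):
    name: str
    nargs: int            # number of 32-bit stack arguments


class ApiCall(NamedTuple):
    name: str
    return_addr: int
    args: List[int]


def build_api_table(exports: Dict[str, int],
                    known: Dict[str, int]) -> Dict[int, ApiEntry]:
    """Match export-table names against the API table by string comparison.

    exports: name -> address parsed from a module's export table
    known:   name -> argument count from the predefined API table
    """
    return {addr: ApiEntry(name, known[name])
            for name, addr in exports.items() if name in known}


def on_instruction(eip: int, esp: int, table: Dict[int, ApiEntry],
                   read_mem: Callable[[int, int], bytes]) -> Optional[ApiCall]:
    """If EIP is the first instruction of a known API, decode the call."""
    entry = table.get(eip)
    if entry is None:
        return None
    count = entry.nargs + 1                      # return address + arguments
    words = struct.unpack("<%dI" % count, read_mem(esp, 4 * count))
    return ApiCall(entry.name, words[0], list(words[1:]))


# Usage with a fake stack; all addresses are made up for illustration.
table = build_api_table({"recv": 0x71AB615A}, {"recv": 4})
ESP = 0x12FF00
stack = struct.pack("<5I", 0x401020, 0x54, 0x12F000, 512, 0)
call = on_instruction(0x71AB615A, ESP, table,
                      lambda addr, n: stack[addr - ESP:addr - ESP + n])
```

The return value would be read from `eax` once EIP reaches `call.return_addr`, as the text describes.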

4. Determine abnormal program execution and illegal behavior

Taint propagation analysis is performed on the execution of each process. The rules for marking network taint sources are set manually, and every data packet received from the network is marked as a taint source; the hardware emulator applies the rules automatically. Abnormal program execution and illegal behavior are determined as follows:

1) To identify a process's malicious operations, we first define abnormal behavior sequences. Each sequence represents a series of consecutive API operations. The set of sequences is stored in a configuration file, which the hardware emulator reads; the emulator maintains the sequences in memory as a singly linked list.

2) When a process in the target system receives network data, the network packet it obtains is marked as a tainted data packet, a corresponding taint record is created in the structure for that process, and a taint behavior list is built to record the instructions and API calls that operate on the tainted data packet or on tainted data derived from it.

3) Starting from the moment the network-receive function returns, every instruction the process subsequently executes is analyzed. If the instruction operates on tainted data, the result of the instruction is added to the taint record as taint derived from the tainted data packet, and the instruction is added to the taint behavior list. If it is an API call whose parameters contain the tainted data packet or tainted data derived from it, the API call is likewise added to the taint behavior list. While the process executes, the emulator checks whether the destination address of an operation on tainted data overwrites a function's return address, and at the same time matches the API calls executed by the process against the abnormal behavior sequences in memory. Matching works as follows: if the name and parameters of the current API call are the same as those of the first API in an abnormal sequence, the subsequent records of that sequence are compared in the same way; if they differ, the first API of each other predefined abnormal sequence is looked up and matched in the same way. If the taint propagation path matches a predefined abnormal operation sequence, the process currently executing is judged to be malicious code.
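The sequence matching in item 3) can be sketched as a search for any predefined sequence appearing as a consecutive run in the list of tainted API calls. For brevity the sketch compares API names only, whereas the text also compares parameters, and it uses plain Python lists in place of the singly linked list. The example patterns are hypothetical and are not taken from the patent's configuration file.

```python
from typing import Optional, Sequence, Tuple


def find_abnormal(calls: Sequence[str],
                  patterns: Sequence[Sequence[str]]) -> Optional[Tuple[int, int]]:
    """Return (pattern index, start position) of the first match, else None.

    A pattern matches when all of its API names appear consecutively,
    in order, in `calls`.
    """
    for start in range(len(calls)):
        for index, pattern in enumerate(patterns):
            window = calls[start:start + len(pattern)]
            if pattern and list(window) == list(pattern):
                return index, start
    return None


# Hypothetical abnormal sequences: drop-and-execute, and data forwarding.
patterns = [
    ["CreateFileA", "WriteFile", "CreateProcessA"],
    ["recv", "send"],
]
tainted_calls = ["recv", "CreateFileA", "WriteFile", "CreateProcessA"]
match = find_abnormal(tainted_calls, patterns)
```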

5. Collect and analyze data

Once malicious code has been identified, a sample of the malicious code and a sample of the attack data must be extracted. The malicious code sample is extracted by reading the file directly from the target operating system image loaded by the hardware emulator; the attack data sample is obtained by dumping the tainted data packets the process received. To extract the malicious code more precisely, the invention traces back through the taint behavior list produced during analysis and, from the APIs called and the instructions executed by the process that behaved abnormally, finds the tainted data on which the process's behavior sequence depends and extracts the malicious code image. The tainted data is described by its offset, length, and content within the whole tainted data packet. The data analysis module receives and stores the data collected by the data acquisition module and returns it to the user.
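The final reporting step, describing the tainted data by offset, length, and content within the packet, can be sketched as merging the packet offsets that the backtracked behavior depended on into contiguous ranges. How those offsets are collected from the taint behavior list is outside this sketch, and the function name is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def ranges_from_offsets(offsets: Iterable[int],
                        packet: bytes) -> List[Tuple[int, int, bytes]]:
    """Merge packet offsets into (offset, length, content) runs."""
    runs: List[Tuple[int, int, bytes]] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for off in sorted(set(offsets)):
        if prev is not None and off == prev + 1:
            prev = off                 # extend the current run
            continue
        if start is not None:          # close the previous run
            runs.append((start, prev - start + 1, packet[start:prev + 1]))
        start = prev = off
    if start is not None:
        runs.append((start, prev - start + 1, packet[start:prev + 1]))
    return runs


packet = b"AAAABBBBCCCC"
report = ranges_from_offsets([4, 5, 6, 7, 10], packet)
# report == [(4, 4, b"BBBB"), (10, 1, b"C")]
```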

With the malicious code capture method based on a hardware emulator and taint propagation proposed by the present invention, those skilled in the art can configure the environment as needed and design their own analysis and capture methods, and thereby comprehensively analyze malicious code in hidden processes.

Although specific embodiments and drawings of the present invention are disclosed for purposes of illustration, to help the reader understand the invention and put it into practice, those skilled in the art will understand that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to what is disclosed in the preferred embodiment and the drawings; the scope of protection is defined by the claims.

Claims

1. A method for capturing malicious code, comprising the steps of:

1) configuring a hardware emulator, the hardware emulator loading and starting a target operating system;

2) the hardware emulator reading the virtual memory of the target operating system, identifying all processes running in the current system and the export tables of the dynamic libraries loaded by those processes, obtaining the addresses of all APIs in the export tables, and intercepting the network receiving API function;

3) according to the return value of the network receiving API function, marking the data packet that the process receives from the network as a tainted data packet;

4) the hardware emulator disassembling the instruction executed by the current process; if the offset address and length of the source operand of the instruction fall within the range of the tainted data packet, performing taint propagation calculation on the instruction; if the instruction is an API function call instruction, obtaining the name of the API from the API address and checking whether the offset and length of its incoming parameters fall within the tainted data range, and if so, performing taint propagation calculation on the API call;

5) judging whether abnormal behavior occurs in the state of the current process during taint propagation, and if abnormal behavior occurs, judging the current process to be malicious code and extracting the malicious code image from the memory of the target operating system; wherein the method for judging whether abnormal behavior occurs in the state of the current process is:

51) setting an abnormal behavior sequence, the hardware emulator reading the sequence and maintaining in memory a singly linked list data structure that holds it;

52) creating a taint record in the structure corresponding to the current process, and at the same time establishing a taint behavior list that records the instructions and API calls with which the current process operates on the tainted data packet or on tainted data derived from the tainted data packet;

53) from the moment the network data receiving function of the current process returns, analyzing each instruction subsequently executed by the process: if the instruction operates on tainted data, adding the result obtained by the instruction to the taint record as taint derived from the tainted data packet, and adding the instruction to the taint behavior list; if it is an API call whose parameters contain the tainted data packet or tainted data derived from the tainted data packet, adding the API call to the taint behavior list;

54) judging whether the destination address of a tainted data operation overwrites the return address of a function, and if so, judging that abnormal behavior has occurred in the state of the current process; and judging, according to the taint behavior list, whether the API calls executed by the process match the abnormal behavior sequence in memory, and if so, judging that abnormal behavior has occurred in the state of the current process.

2. The method as claimed in claim 1, characterized in that configuring the hardware emulator comprises: configuring the emulated memory size, the emulated CPU type, and the virtual hard disk of the hardware emulator.

3. The method as claimed in claim 2, characterized in that the virtual hard disk of the hardware emulator is configured by creating a virtual image file using linear addressing and using the created virtual image file as the virtual hard disk.

4. The method as claimed in claim 3, characterized in that the addresses of all APIs in the export table are obtained by comparing the names in the export table of the dynamic library with the names in an API table.

5. The method as claimed in claim 1, 2, 3, or 4, characterized in that the taint propagation calculation is: determining, according to the instruction executed by the current process, the variables and registers affected by the tainted data derived from the tainted data packet.

6. The method as claimed in claim 5, characterized in that the abnormal behavior comprises: a function return address in the stack of the current process being overwritten by the tainted data packet or by tainted data derived from the tainted data packet; and the API call sequence executed by the current process matching a predefined abnormal behavior sequence.

7. The method as claimed in claim 1, characterized in that the malicious code image is extracted by: backtracking through the taint behavior list and, according to the APIs called and the instructions executed by the process exhibiting abnormal behavior, finding the tainted data on which the behavior sequence of that process depends, which is the captured malicious code; and at the same time extracting the offset, length, and content, within the whole tainted data packet, of the data used to carry out the attack.

8. The method as claimed in claim 1, characterized in that the abnormal behavior sequence is a series of consecutive API operations.
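The taint propagation and abnormal behavior checks recited in claims 1, 5, 6, and 8 can be illustrated with a minimal Python sketch. It is a simplified model under stated assumptions, not the claimed implementation: the `TaintTracker` class and its method names are hypothetical, instructions are reduced to one destination and a list of sources, the abnormal behavior sequence is held in a tuple rather than the singly linked list of step 51, and clearing taint when a location is overwritten with clean data is a common convention that the source does not state.

```python
class TaintTracker:
    """Per-process taint record and taint behavior list (hypothetical)."""

    def __init__(self, abnormal_sequence):
        self.tainted = set()   # taint record: tainted addresses and registers
        self.behavior = []     # taint behavior list: ("insn" | "api", name)
        self.abnormal_sequence = tuple(abnormal_sequence)
        self.abnormal = False

    def mark_packet(self, base, length):
        """Mark every byte of a received network buffer as tainted."""
        self.tainted.update(range(base, base + length))

    def on_instruction(self, mnemonic, dst, srcs, return_slots=()):
        """Propagate taint from the sources to the destination."""
        if any(s in self.tainted for s in srcs):
            self.tainted.add(dst)
            self.behavior.append(("insn", mnemonic))
            # Tainted data written over a saved return address is abnormal.
            if dst in return_slots:
                self.abnormal = True
        else:
            # Assumption: overwriting with clean data removes the taint.
            self.tainted.discard(dst)

    def on_api_call(self, name, arg_locations):
        """Record an API call whose arguments touch tainted data."""
        if any(a in self.tainted for a in arg_locations):
            self.behavior.append(("api", name))
            if self._matches_abnormal_sequence():
                self.abnormal = True

    def _matches_abnormal_sequence(self):
        """True if the latest tainted API calls equal the abnormal sequence."""
        calls = [n for kind, n in self.behavior if kind == "api"]
        k = len(self.abnormal_sequence)
        return k > 0 and tuple(calls[-k:]) == self.abnormal_sequence
```

In use, `mark_packet` would be called when the intercepted network receiving API returns, and `on_instruction` or `on_api_call` for each instruction the emulator disassembles afterwards. Any API names passed in as the abnormal sequence are the analyst's choice; the patent does not list specific sequences.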