Binary program analysis system based on process simulation
Technical field
The present invention relates to a system in the technical field of electronic data monitoring, specifically a binary program analysis system based on process simulation.
Background technology
In the field of computer security, reverse analysis of software of all kinds, and of malicious programs in particular, is the basic task of program security analysis. Because the relevant semantic information is missing, reverse analysis of binary programs is often very difficult and consumes a great deal of manpower and resources. Automated program analysis methods and analysis platforms have therefore emerged to assist analysts in reverse analysis.
To automate program analysis, the instruction flow, control flow and data flow of a running program must be monitored at fine granularity, and related information such as the processor and memory state of the running program must be obtained at the same time. At present, run-time information is obtained mainly through process debugging, whole-system simulation and binary instrumentation. Each of these techniques has problems. Process debugging relies on the debugging API of the operating system and is often helpless against the anti-debugging methods commonly used by current malicious programs. Whole-system simulation simulates the entire computer platform, so a large number of instructions irrelevant to the analysis, such as those of the operating system kernel, take up most of the simulation time and analysis efficiency is very low. Binary instrumentation changes the instruction flow and control flow of the program, so protected programs, such as packed or obfuscated programs, often cannot be analyzed. Current analysis schemes therefore often fail to meet analysis requirements when faced with today's increasingly complex programs.
A search of the prior art found Chinese patent document CN101814053, published 2010-08-25, which describes a binary code vulnerability discovery method based on a functional model. First, a code functional model is built by a static reverse analysis system, and an initial test case set is constructed from that model. Second, a dynamic test and backtracking analysis system loads the test case set onto a dynamic test platform under coverage control and routing policies, adjusts the test case set using dynamic path constraint optimization, constraint solving and a generational path traversal algorithm, and performs fine-grained exception analysis and vulnerability location on the basis of the recovery analysis. Third, both the static reverse analysis system and the dynamic test and backtracking analysis system store the program attributes they obtain in the functional model, and each uses the program attributes in the functional model to guide its own analysis and test work. Compared with the present invention, this technique has the following defects and deficiencies. First, it depends on static reverse analysis, and the widespread use of software protection techniques means that analysts often cannot carry out effective static analysis, so it is difficult to extract static analysis results from many current programs, malicious programs in particular. Second, its analysis method can only detect abnormal running states of a program and analyze possible vulnerabilities; it cannot detect potential malicious attack behavior in the program. In particular, as attack methods such as ROP (Return-Oriented Programming) emerge one after another, this technique cannot detect vulnerabilities in a program that such attacks can exploit. Third, dynamic code execution is now widely used, for example in plug-ins (Plugin), user scripting (User Scripting) and just-in-time compilation (Just-in-time Compilation), and this technique cannot effectively analyze such dynamically generated or loaded code, which further limits its scope of analysis.
Summary of the invention
To address the above shortcomings of the prior art, the present invention provides a binary program analysis system based on process simulation. It simulates the running environment of a program from the bottom level of the program, at the level of the system hardware architecture and the operating system, without interfering with the normal running of the program, and monitors the running process of the program, such as its data flow. The invention does not depend on static reverse analysis of the program; it uses fully dynamic analysis and so avoids the effects of most program protection techniques. By extending the analysis system with user-defined attack descriptions, the invention can detect and intercept an attack before the attack code is executed. By introducing techniques such as dynamic taint analysis, it can analyze and track the flow of sensitive data and so avoid leakage of data and privacy. Furthermore, because the invention does not depend on static analysis results, it can analyze dynamically generated code completely.
The present invention is achieved by the following technical solution. The invention comprises: a simulator engine module, a memory management module, a process manager module, a system call interface, a thread management module, a central processing module, and an analysis component interface that provides an application programming interface, wherein: the simulator engine module is connected to the memory management module, the process manager module, the system call interface and the analysis component interface, and exchanges with them, respectively, running state information and run instructions; process management and thread scheduling information; system API call data; and debugging information and analysis component events; it controls and coordinates the modules and reduces the coupling between them; the process manager module is connected to the central processing module, the memory management module and the system call interface, and exchanges with them, respectively, processor scheduling information and running state information; memory management data; and the conversion and encapsulation of system call parameters; the thread management module is connected to the process manager module, the memory management module and the central processing module, and exchanges with them, respectively, thread running state and scheduling information; thread memory access data; and processor running state;
The simulator engine module provides unified coordination and control for every component and drives the components to load, initialize, run and clean up the simulated process. It comprises a driver unit, an operating system hook unit and a debugging unit, wherein: the driver unit is connected to the memory management module and the process manager module, receives running state information and sends run instructions; the operating system hook unit is connected to the process manager module, receives system API calls from the process manager module, passes them to the underlying operating system and returns the API call results; the debugging unit is connected to the system call interface and the analysis component interface and performs application debugging;
The memory management module comprises a virtual memory management unit, a heap management unit and a stack management unit, wherein: the virtual memory management unit transmits memory access data to the thread management module and is connected to the driver unit of the simulator engine module to transmit running state information; the heap management unit receives heap management instructions from the process manager module and manages the heap memory of the process; the stack management unit receives thread running state from the thread management module and manages the stack memory of all threads.
The virtual memory management unit manages 4 GB of virtual memory using a paging scheme; it also simulates the virtual memory management behavior of Windows and performs memory allocation, reclamation and access permission control at the operating system level.
The heap management unit and the stack management unit transmit, respectively, the management information of the heap in the process and the management of the stack memory of each thread in the process, together with the overall allocation and reclamation of virtual memory pages.
The management information of the heap in the process comprises: creation and destruction of heaps and memory allocation; heap memory permission settings; and adjustment of heap capacity and reallocation.
The process manager module comprises a thread scheduling management unit, a state driving unit and a system API encapsulation unit, wherein: the thread scheduling management unit is connected to the central processing module, receives the scheduling information of all threads in the process, and schedules, creates and destroys threads; the state driving unit is connected to the simulator engine module, receives run instructions from the simulator engine module, drives the running of the main thread and the other threads, and transmits running state information; the system API encapsulation unit is connected to the system call interface, receives system API calls from the thread management module, encapsulates their parameters and passes them to the simulator engine module to be called.
The process manager module drives the complete execution flow of the process; maintains all threads of the process and their scheduling; maintains system handles (Handle) and allocates memory addresses within the process; and creates and maintains data structures such as the process PEB (Process Environment Block).
The thread management module comprises an environment information simulation unit, a thread driving unit and an execution unit, wherein: the environment information simulation unit is connected to the process manager module, receives thread run-time state information and sets up the simulated run-time environment for the thread; the thread driving unit is connected to the process manager module, receives thread run instructions and transmits thread running state; the execution unit is connected to the thread state unit of the central processing module, completes the processor instruction loop and terminates the thread when the loop ends.
The simulated run-time environment that the environment information simulation unit sets up for a thread comprises: entry point, parameters, flag bits, stack address and size, and the TEB (Thread Environment Block);
The thread management module maintains the environment information of a single thread in the simulated process, drives the execution of the thread from its entry point, evaluates the termination condition and terminates the thread; it also loads and initializes other modules (DLLs) that the thread loads dynamically;
The central processing module has built-in register and state units that are connected to the process manager module and the thread management module respectively, and that transmit processor scheduling information and thread running state and scheduling information respectively.
The processor scheduling information and the thread running state and scheduling information comprise: processor flag bits (eflags) and running state information. The central processing module also provides interpretive simulation of the x86 instruction set, the x87 FPU instruction set, the MMX instruction set and the SSE instruction set, and thereby simulates the complete processor function.
The central processing module is provided with an exception handler for transmitting processor exception information and exception handling results; this exception handler sets up the environment information of the exception and executes the exception handling function.
The analysis component interface provides an API for the system, so that analysts can easily write analysis components to carry out automated program analysis;
The simulator engine module is connected to a loader, a disassembly engine and a debugging interface, the debugging interface being connected to an additional debug component for application debugging, wherein: the loader parses the executable PE file of the process to be analyzed and loads the parsed result into the memory management module through the simulator engine module; the disassembly engine disassembles a single x86 instruction and parses out information such as its opcode, source operand and destination operand; the debugging interface transmits debugging information and can be used to debug both the application and the simulator itself;
The executable PE file of the process to be analyzed means the following: executable programs in Windows exist in PE format, and the simulator loads and parses the PE file so that it can be executed by simulation;
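As an illustration of the parsing the loader performs, the following minimal sketch reads a few header fields from a 32-bit PE image. The function name `parse_pe_header` and the returned dictionary keys are hypothetical; the field offsets follow the published PE32 layout. It is a sketch under those assumptions and not the loader of the invention, which would also map sections and resolve imports.

```python
import struct


def parse_pe_header(image: bytes) -> dict:
    """Read a few PE32 header fields a loader needs before mapping sections."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header follows the 4-byte signature: Machine, NumberOfSections.
    machine, section_count = struct.unpack_from("<HH", image, pe_offset + 4)
    # Optional header starts after the signature (4) and COFF header (20).
    optional = pe_offset + 24
    (magic,) = struct.unpack_from("<H", image, optional)
    if magic != 0x10B:
        raise ValueError("only PE32 images are handled in this sketch")
    (entry_rva,) = struct.unpack_from("<I", image, optional + 16)
    (image_base,) = struct.unpack_from("<I", image, optional + 28)
    return {
        "machine": machine,
        "sections": section_count,
        "entry_rva": entry_rva,
        "image_base": image_base,
    }
```

The entry point address used to start simulation would then be `image_base + entry_rva`, adjusted for the base address actually assigned at load time.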
The present invention also relates to a process optimization method based on the above system, comprising the following steps:
Step one: perform lightweight x86 instruction set simulation on the process to be analyzed, that is, simulate the x86 instruction set processor and the virtual memory environment on the premise that the loss of program running efficiency stays within two orders of magnitude.
Step one specifically comprises:
1.1 Use a heuristic iterative disassembly algorithm to provide static disassembly information for every processor instruction;
The heuristic iterative disassembly algorithm comprises the following steps:
1.1.1 Apply the following steps to each instruction in the PE file.
1.1.2 Locate the entry point instruction E. If E is a valid instruction, disassemble it and go to step 1.1.3; otherwise skip it and repeat step 1.1.2.
1.1.3 If the disassembled instruction E is a jump instruction, disassemble the jump target of E.
1.1.4 Append the length information of the instruction to instruction E and return to step 1.1.2 to process the next instruction; once all instructions have been processed, return the instruction set S obtained by disassembly.
1.2 Use the disassembly information to simulate the execution flow of each instruction, including the values of registers, memory data and flag bits.
1.3 Extract memory access data and register value change information for program analysis.
The memory access data comprise: memory addresses and memory data, register values, flag bit change information and exception information.
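The heuristic iterative disassembly of step 1.1 can be sketched as follows. This is an illustrative reading of steps 1.1.1 to 1.1.4 and not the implementation of the invention: the decoder knows only five real x86 opcodes, chosen to keep the example short, and the names `OPCODES`, `decode` and `disassemble` are hypothetical.

```python
# Tiny illustrative subset of x86: opcode byte -> (mnemonic, length in bytes).
OPCODES = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xEB: ("jmp", 2),             # jmp rel8
    0x74: ("jz", 2),              # jz rel8
    0xB8: ("mov eax, imm32", 5),
}
JUMPS = {0xEB, 0x74}


def decode(code, offset):
    """Return (mnemonic, length, jump_target or None), or None if invalid."""
    if not 0 <= offset < len(code):
        return None
    entry = OPCODES.get(code[offset])
    if entry is None:
        return None
    mnemonic, length = entry
    if offset + length > len(code):
        return None               # instruction would run past the buffer
    target = None
    if code[offset] in JUMPS:
        rel = code[offset + 1]
        if rel >= 0x80:
            rel -= 0x100          # sign-extend the 8-bit displacement
        target = offset + length + rel
    return mnemonic, length, target


def disassemble(code, entry_point=0):
    """Return {offset: (mnemonic, length)} for every instruction reached."""
    result = {}
    worklist = [entry_point]
    while worklist:
        offset = worklist.pop()
        while 0 <= offset < len(code) and offset not in result:
            decoded = decode(code, offset)
            if decoded is None:
                offset += 1       # step 1.1.2: skip an invalid byte and retry
                continue
            mnemonic, length, target = decoded
            result[offset] = (mnemonic, length)   # step 1.1.4: record length
            if target is not None and 0 <= target < len(code):
                worklist.append(target)           # step 1.1.3: follow the jump
            offset += length
    return dict(sorted(result.items()))
```

For the byte sequence `EB 01 FF B8 01 00 00 00 C3`, the sketch decodes the jump at offset 0, skips the invalid byte at offset 2, and then decodes the `mov` at offset 3 and the `ret` at offset 8.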
Step two: simulate certain operating system behaviors to ensure that the process to be analyzed runs in a controlled environment, specifically comprising:
2.1 Process initialization: the simulated program is loaded and initialized, the memory layout of each section in each module of the process is determined, and the entry point and termination condition are determined.
2.2 Memory management: a paged memory management mechanism is used, comprising allocation, reclamation and access permission control of virtual memory in units of pages.
2.3 Thread management: the thread management module maintains thread creation, destruction and scheduling in multithreaded programs, so that a multithreaded program executes normally under the shared-memory model.
2.4 Exception handling: the exception handling mechanism of the operating system is taken over when a processor exception occurs, so that the exception handlers of the program run in the simulated environment.
Step three: encapsulate the operating system API calls in the process and hand them to the operating system for direct execution;
The operating system API means the application programming interface that the operating system provides to user processes so that user programs can use operating system functions.
3.1 Intercept all API calls of the simulated process; core APIs are simulated directly by the simulator engine module, and other APIs are sent to the operating system for execution;
The core APIs comprise: memory management APIs, thread management APIs, debugging APIs, operating system parameter query APIs, and so on.
3.2 Perform parameter conversion for the API call, including mapping between emulated memory addresses and real addresses.
3.3 After the operating system finishes executing the API, control returns to the simulator engine module, which processes the execution result.
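The routing decision of step 3.1 can be sketched as a small dispatcher. The class `ApiDispatcher`, its methods and the `forward_to_host` callback are hypothetical names introduced for illustration; the sketch only shows how core APIs could be answered by the simulator while all other calls are forwarded, and it does not call any real operating system API.

```python
class ApiDispatcher:
    """Routes intercepted API calls to a simulated handler or to the host."""

    def __init__(self, forward_to_host):
        self._core = {}                   # API name -> simulated handler
        self._forward = forward_to_host   # callable(name, args) for the rest
        self.log = []                     # (name, route) pairs for analysis

    def register_core(self, name, handler):
        """Mark an API as core so that the simulator answers it itself."""
        self._core[name] = handler

    def call(self, name, *args):
        if name in self._core:
            route, result = "simulated", self._core[name](*args)
        else:
            route, result = "forwarded", self._forward(name, args)
        self.log.append((name, route))
        return result
```

In use, a memory allocation API would be registered with `register_core` and served from the emulated address space, while a networking call would fall through to `forward_to_host` after its parameters have been converted as in step 3.2.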
Step four: provide application programming interfaces for the dynamic run-time information of the simulated execution, encapsulate the execution of each simulator component as events, and provide the application programming interfaces in the form of event handling, so that analysis programs can use this information for program optimization.
The dynamic run-time information comprises: instruction flow, data flow and control flow.
The events comprise: instruction execution events, memory access events, operating system API call events and thread scheduling events.
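One way to expose the four event types to analysis programs is a publish/subscribe interface. The sketch below is an assumption about the shape of such an interface: the class `EventBus`, the event names and the keyword arguments passed to handlers are all hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Delivers simulator events to handlers registered by analysis code."""

    EVENTS = ("instruction", "memory_access", "api_call", "thread_schedule")

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event, handler):
        if event not in self.EVENTS:
            raise ValueError(f"unknown event type: {event}")
        self._handlers[event].append(handler)

    def emit(self, event, **info):
        """Called by a simulator component; each handler receives a dict."""
        for handler in self._handlers[event]:
            handler(info)
```

An analysis component would subscribe a handler to, say, `"memory_access"` and receive one dictionary per access, which keeps the analysis code decoupled from the simulator components that raise the events.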
Technical effects
1) The analyzed program is executed by instruction-level simulation, and a lightweight x86 instruction set interpreter provides fine-grained run-time information;
2) Content of the process to be analyzed that is irrelevant to program analysis, such as the execution of system calls, is wrapped or simulated and handed to the underlying operating system for execution, which preserves analysis efficiency;
3) Certain operating system behaviors, such as memory management, thread management and exception handling, are simulated to ensure that the analyzed program runs in a controlled environment;
4) A good interface is provided for automated program analysis, so that simulated execution and analysis of the program proceed synchronously and efficiently.
Compared with existing analysis schemes, the present invention is hardly affected by anti-debugging methods and does not modify the original instructions or data of the process to be analyzed; its running efficiency is one to two orders of magnitude higher than that of whole-system simulation, and its stability and compatibility are considerably better than those of binary instrumentation. On the basis of this analysis system, many kinds of automated analysis can be carried out efficiently, such as algorithm and protocol analysis, vulnerability mining and detection, program performance analysis, memory debugging, program behavior analysis and malicious program detection, providing reliable support for program security analysis.
Description of the drawings
Fig. 1 is a structural diagram of the system of the present invention;
Fig. 2 is a structural diagram of the process manager module;
Fig. 3 is a structural diagram of the thread management module;
Fig. 4 is an operational flow diagram of the present invention.
Specific embodiments
The embodiments of the present invention are described in detail below. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementations and concrete operating procedures are given, but the scope of protection of the present invention is not limited to the following embodiments.
Embodiment 1
The concrete implementation process is described using the commonly used multithreaded network communication program curl.exe (http://curl.haxx.se) as an example.
As shown in Figure 1, the present embodiment comprises: a simulator engine module, a memory management module, a process manager module, a system call interface, a thread management module, a central processing module, and an analysis component interface that provides an application programming interface, wherein: the simulator engine module is connected to the memory management module, the process manager module, the system call interface and the analysis component interface, and exchanges with them, respectively, memory access data; thread scheduling and processor access data; system call parameters and their encapsulation; and simulator events and environment information; it controls and coordinates the modules and reduces the coupling between them; the process manager module is connected to the memory management module, the central processing module and the system call interface, and exchanges with them, respectively, memory management data such as memory allocation and reclamation; processor scheduling information; and system call parameter conversion and encapsulation; the thread management module is connected to the memory management module, the process manager module and the central processing module, and exchanges with them, respectively, the layout of thread data in memory; thread running state and scheduling information; and processor running state;
The simulator engine module comprises a public system interface, provides unified coordination and control for every component, and drives the components to load, initialize, run and clean up the simulated process;
The memory management module is connected to the heap and the stack respectively, and transmits, respectively, the management information of the heap in the process, such as heap creation, destruction and memory allocation, and the management of the stack memory of each thread in the process. The memory management module comprises a virtual memory management unit, which manages 4 GB of virtual memory using a paging scheme, simulates the virtual memory management behavior of Windows, and performs memory allocation, reclamation and access permission control at the operating system level;
Memory management module: simulates the paged memory management mechanism of Windows for memory allocation, reclamation and permission management of the process to be analyzed, provides the underlying simulated implementation of the system's memory management APIs, and maintains the heap and stack of the process to be analyzed;
The process manager module is used to drive the complete execution flow of the process; maintain all threads of the process and their scheduling; maintain system handles (Handle) and allocate memory addresses within the process; and create and maintain data structures such as the process PEB (Process Environment Block);
Process manager module: maintains the context information related to the process to be analyzed and manages all threads of the process to be analyzed;
The thread management module maintains the environment information of a single thread in the simulated process, such as entry point, parameters, flag bits, stack address and size, and the TEB; drives the execution of the thread from its entry point, evaluates the termination condition and terminates the thread; and loads and initializes other modules (DLLs) that the thread loads dynamically;
The central processing module provides a complete central processor simulation environment, comprising registers, processor flag bits (eflags), running state information, and so on; it provides interpretive simulation of the x86 instruction set, the x87 FPU instruction set, the MMX instruction set and the SSE instruction set, and thereby simulates the complete processor function; and it includes an exception handler submodule, wherein the exception handler is connected to the processor module and transmits processor exception information and exception handling results;
Thread management module and central processing module: an independent central processing environment is maintained for each thread, comprising registers, processor flag bits and other run-time data; the central processing module interprets and simulates each instruction of the process to be analyzed;
System call interface: takes over the API calls in the process to be analyzed; core APIs, such as memory-related ones, are executed by simulation, while other APIs are handed to the operating system to run directly, which preserves running efficiency;
Analysis component interface: provides an API for the system, so that analysts can easily write analysis components to carry out automated program analysis;
The simulator engine module is connected to a loader, a disassembly engine and a debugging interface, the debugging interface being connected to an additional debug component for application debugging, wherein: the loader parses the executable PE file of the process to be analyzed and loads the parsed result into the memory management module through the simulator engine module; the disassembly engine disassembles a single x86 instruction and parses out information such as its opcode, source operand and destination operand; the debugging interface transmits debugging information and can be used to debug both the application and the simulator itself;
The executable PE file of the process to be analyzed means the following: executable programs in Windows exist in PE format, and the simulator loads and parses the PE file so that it can be executed by simulation;
Embodiment 2
As shown in Figure 2, the system of the present invention goes through loading, initialization, run-time analysis and termination when it runs.
Step one: load the PE file of the process to be analyzed and the dynamic link libraries it depends on, and set up a complete virtual Windows x86 run-time environment;
The run-time environment comprises the linear memory address space, the central processing module environment and the relevant operating system functions;
Step two: execute the process to be analyzed by instruction simulation, using a lightweight x86 instruction set simulator to provide fine-grained run-time information for subsequent analysis; the concrete steps comprise:
2.1 Use the heuristic iterative disassembly algorithm to attempt to disassemble all instructions;
2.2 Construct the processor simulation environment and use the disassembly information to simulate the execution flow of each instruction accurately;
2.3 Extract information such as memory access data and register values for program analysis;
The lightweight x86 instruction set simulator means a high-performance, low-overhead x86 instruction set simulator that can simulate execution of the x86 instruction set without significantly affecting the normal execution of the program;
Fine-grained means specifically: accurate to the finest granularity visible to the operating system, namely the level of instructions and registers, rather than the level of basic blocks or functions commonly used by schemes such as binary instrumentation;
The run-time information specifically means: memory access information such as memory addresses and memory data, register values, flag bit change information, and any exception information that may be produced;
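To make the instruction-level granularity of steps 2.2 and 2.3 concrete, the sketch below interprets three instructions on a toy processor state and records register and flag changes after every instruction. The class `Cpu`, its `trace` list and the tuple layout are hypothetical; only the ZF, SF and CF flags are modeled, and the flag rules follow standard x86 behavior (`mov` leaves flags unchanged, `add` and `sub` update them).

```python
MASK32 = 0xFFFFFFFF


class Cpu:
    """Toy 32-bit processor state with a per-instruction change trace."""

    def __init__(self):
        self.regs = {"eax": 0, "ebx": 0}
        self.flags = {"ZF": 0, "SF": 0, "CF": 0}
        self.trace = []   # (op, reg, old value, new value, flags after)

    def execute(self, op, reg, imm):
        imm &= MASK32
        before = self.regs[reg]
        if op == "mov":
            result, carry = imm, None        # mov does not touch the flags
        elif op == "add":
            full = before + imm
            result, carry = full & MASK32, int(full > MASK32)
        elif op == "sub":
            result = (before - imm) & MASK32
            carry = int(imm > before)        # CF is the unsigned borrow
        else:
            raise ValueError(f"unsupported operation: {op}")
        self.regs[reg] = result
        if carry is not None:
            self.flags = {"ZF": int(result == 0), "SF": result >> 31, "CF": carry}
        # Run-time information an analysis component could consume.
        self.trace.append((op, reg, before, result, dict(self.flags)))
```

Each trace entry corresponds to one instruction execution event, so an analysis program sees every register and flag change rather than a summary per basic block or function.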
Step three: simulate certain operating system behaviors to ensure that the process to be analyzed runs in a controlled environment; the concrete steps comprise:
3.1 Load the main program of curl.exe and the system modules (DLLs) it depends on, use address space layout randomization (ASLR) to assign the base address of each component, and determine the memory map; also determine the program entry point and termination condition;
3.2 Manage the virtual memory space of the process by paging, with a page size of 4 KB; maintain for each page its state (free, reserved or committed) and its access permissions (readable, writable, executable), with the virtual memory management unit managing the allocation and reclamation of memory pages uniformly;
3.3 The thread management module manages all threads in the process, including the main thread at program initialization and the creation and destruction of threads during execution;
3.4 The exception handler takes over the operating system's exception handling when a processor exception occurs, and runs the program's exception handlers in the simulator environment;
The operating system behaviors specifically mean: process initialization, memory management, thread management, exception handling, and so on;
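The page bookkeeping of step 3.2 can be sketched as follows. The class `PagedMemory`, the exception `AccessViolation` and the single-letter permission codes are hypothetical; the sketch keeps one record per 4 KB page with a state and a permission set, and treats a page with no record as free.

```python
PAGE_SIZE = 4096  # 4 KB pages, as in step 3.2


class AccessViolation(Exception):
    """Raised when an access hits an uncommitted page or lacks permission."""


class PagedMemory:
    def __init__(self):
        self.pages = {}   # page number -> {"state", "perms", "data"}

    def _page_range(self, address, size):
        return range(address // PAGE_SIZE, (address + size - 1) // PAGE_SIZE + 1)

    def reserve(self, address, size):
        for number in self._page_range(address, size):
            self.pages[number] = {"state": "reserved", "perms": set(), "data": None}

    def commit(self, address, size, perms):
        for number in self._page_range(address, size):
            self.pages[number] = {"state": "committed", "perms": set(perms),
                                  "data": bytearray(PAGE_SIZE)}

    def free(self, address, size):
        for number in self._page_range(address, size):
            self.pages.pop(number, None)   # absent pages count as free

    def _byte_ref(self, address, perm):
        page = self.pages.get(address // PAGE_SIZE)
        if page is None or page["state"] != "committed" or perm not in page["perms"]:
            raise AccessViolation(f"{perm!r} access denied at {address:#x}")
        return page["data"], address % PAGE_SIZE

    def write(self, address, payload):
        for i, value in enumerate(payload):
            data, offset = self._byte_ref(address + i, "w")
            data[offset] = value

    def read(self, address, size):
        out = bytearray()
        for i in range(size):
            data, offset = self._byte_ref(address + i, "r")
            out.append(data[offset])
        return bytes(out)
```

Checking permissions on every byte is slow but keeps the sketch short and makes accesses that cross a page boundary behave correctly; a practical simulator would check once per page touched.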
Step four: encapsulate all operating system APIs in the process and hand some of them to the operating system for direct execution; the concrete steps comprise:
4.1 Intercept all API calls of the simulated process; core APIs (such as the memory allocation API VirtualAlloc()) are simulated directly by the simulator, and other APIs (such as the network API socket()) are sent to the operating system for execution;
4.2 Perform parameter conversion for the API call: pointer-type parameters are mapped between emulated memory addresses and real addresses, and complex structures and newly allocated heap data are deep-copied and mapped into the emulated memory space;
4.3 After the API finishes executing, control returns to the simulator, which cleans up the stack and parses the API return value;
The operating system API means the application programming interface that the operating system provides to user processes so that user programs can use operating system functions.
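The pointer conversion of step 4.2 can be illustrated with a copy-in, copy-out wrapper around a forwarded call. All names here (`EmulatedMemory`, `forward_call`, `fake_recv`) are hypothetical, and `fake_recv` is a stand-in for a host API that fills a caller-supplied buffer; the sketch shows only the address mapping and buffer copy, not a real operating system call.

```python
class EmulatedMemory:
    """Flat emulated address range starting at `base`."""

    def __init__(self, base, size):
        self.base = base
        self.data = bytearray(size)

    def _index(self, address, size):
        start = address - self.base
        if start < 0 or start + size > len(self.data):
            raise ValueError(f"address {address:#x} is outside emulated memory")
        return start

    def read(self, address, size):
        start = self._index(address, size)
        return bytes(self.data[start:start + size])

    def write(self, address, payload):
        start = self._index(address, len(payload))
        self.data[start:start + len(payload)] = payload


def forward_call(memory, host_function, buffer_address, buffer_size):
    """Forward a call whose pointer parameter lives in emulated memory."""
    # Copy in: the host sees a real buffer, never an emulated address.
    host_buffer = bytearray(memory.read(buffer_address, buffer_size))
    result = host_function(host_buffer)
    # Copy out: results become visible to the simulated process.
    memory.write(buffer_address, bytes(host_buffer))
    return result


def fake_recv(buffer):
    """Hypothetical host API: fills the buffer and returns the byte count."""
    payload = b"pong"
    buffer[:len(payload)] = payload
    return len(payload)
```

Calling `forward_call(memory, fake_recv, address, 8)` leaves the received bytes at the emulated address, so the simulated program reads them as if the API had written into its own memory.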
Step five: provide application programming interfaces for the dynamic run-time information of the simulated execution, such as instruction flow, data flow and control flow, so that analysis programs can use this information for program analysis. Specifically, the execution of the simulator is encapsulated as events, and application programming interfaces are provided in the form of event handling; the events mainly comprise instruction execution events, memory access events, operating system API call events, thread scheduling events, and so on. For the execution flow of curl.exe, all of its instruction execution events can be obtained, including parameters such as register and flag bit changes; network API call events, for example, can be obtained and the network data parsed from them; and the run-time environment of the program can be reconstructed to describe program behavior.