Disclosure of Invention
To remedy the deficiencies of the prior art, the invention provides a mobile application third-party library isolation method based on a user-mode sandbox, in which the isolation of the third-party library is implemented at the user layer.
The invention is realized by the following technical scheme:
A mobile application third-party library isolation method based on a user-mode sandbox comprises the following steps:
(1) rewriting the calling code of sensitive APIs in the third-party library:
(1.1) initializing input and output:
(1.11) initializing the directory where the JAR package is located, and generating a tmp_class folder;
(1.12) initializing the JAR package output directory, and generating an out folder;
(1.13) loading predefined sensitive API information;
(1.14) adding custom interface classes corresponding to the sensitive APIs to the tmp_class folder;
(1.2) using a bytecode rewriting toolkit to traverse the JAR package to be rewritten and obtain an information list of all classes in the JAR package;
(1.3) traversing and rewriting each class in the JAR package, and replacing the sensitive API calling code in it with calling code of the corresponding custom interface;
(1.4) packaging the rewritten class files and generating a new JAR package in the tmp_class folder.
Preferably, step (1.13), loading the predefined sensitive API information, specifically comprises:
(1.131) the developer defines, as needed, the sensitive API information to be rewritten, comprising a class name, a method name and a method signature, and stores the sensitive API information in a file;
(1.132) reading the sensitive API information file, and parsing the class name, the method name and the method signature.
Preferably, step (1.14) specifically comprises:
(1.141) defining a corresponding custom interface according to the sensitive API information, the custom interface returning a false value or directly blocking the original operation, and generating a custom interface class;
(1.142) adding the custom interface class to the tmp_class folder, which is the directory of rewritten class files.
Preferably, in step (1.2), the bytecode rewriting toolkit is Javassist.
Preferably, step (1.3) specifically comprises:
(1.31) loading the bytecode of each class from the JAR package;
(1.32) traversing the bytecode and searching for calling code of the sensitive APIs according to the loaded sensitive API information;
(1.33) rewriting the calling code of the sensitive APIs, replacing it with calling code of the corresponding custom interfaces;
(1.34) writing the rewritten bytecode back to a class file, and placing the class file in the tmp_class folder, which is the directory of rewritten class files.
Preferably, the method further comprises the following steps:
(2) rewriting the code dynamically loaded by the third-party library:
(2.1) initializing the input and output directories:
(2.11) initializing a directory for receiving dynamically loaded files, and generating an in folder;
(2.12) initializing a directory for storing smali files, and generating a tmp_smali folder;
(2.13) initializing the output directory for rewritten dynamic loading code, and generating an out folder;
(2.2) when the client dynamically loads code at run time, uploading the dynamically loaded code to the server for rewriting;
(2.3) the server receives the dynamic loading code uploaded by the client;
(2.4) the server preprocesses the dynamic loading code and extracts the dex file;
(2.5) decompiling the dex file with the smali/baksmali tool chain to generate the smali files of its classes, and storing the smali files in the tmp_smali folder;
(2.6) adding the custom interface classes corresponding to the sensitive APIs to the tmp_smali folder, which is the directory of rewritten smali files;
(2.7) traversing each smali file, searching for calling code of the sensitive APIs and replacing it with calling code of the corresponding custom interfaces;
(2.8) compiling the tmp_smali folder of rewritten smali files with the smali/baksmali tool chain to generate a new dex file, and converting the dex file back into dynamic loading code in the original format;
(2.9) the server notifies the client to download the rewritten dynamic loading code and to load and execute it.
Further, step (2.4) specifically comprises:
(2.41) if the dynamic loading code is an apk file, extracting the dex file from it;
(2.42) if the dynamic loading code is a jar file, extracting the dex file from it;
(2.43) if the dynamic loading code is a dex file, proceeding directly to the next step.
Further, step (2.7) specifically comprises:
(2.71) reading the contents of the smali file;
(2.72) traversing the smali code and searching for calling code of the sensitive APIs;
(2.73) rewriting the calling code of the sensitive APIs, replacing it with calling code of the corresponding custom interfaces;
(2.74) writing the rewritten smali code back to the smali file, and placing the file in the tmp_smali folder, which is the directory of rewritten smali files.
Compared with the prior art, the invention has the following beneficial technical effects:
The method is based on code rewriting: the code of an existing third-party library is analyzed and rewritten, and the calling code of sensitive APIs is replaced. Through this rewriting, the privacy-sensitive behaviors in the third-party library code are all confined to the user-mode sandbox and the permissions of the third-party library are restricted, so the method can be used to protect private user data on mobile devices from theft by malicious third-party libraries. As a loosely coupled framework, the invention is readily extensible: the third-party library is rewritten according to a configuration file provided by the developer, so calling code of a new sensitive API can be rewritten as soon as the information of that API and its corresponding custom interface function are defined. The method compensates for a deficiency of the current Android permission model by restricting the permissions of a third-party library independently of the host application, which prevents the third-party library from abusing the permissions of the host application and harming the privacy and security of users and applications. The invention is implemented mainly at the user layer and has clear advantages over existing schemes. Compared with traditional system-level solutions, it requires neither modification of the system source code nor root privileges and can be used as a stand-alone tool, so it is easy to use and to deploy widely. Compared with existing bytecode rewriting schemes, it works mainly at the development stage and processes the third-party library directly, so it does not need to deal with code obfuscation in existing applications.
Furthermore, the invention monitors dynamic code loading by the third-party library: when the third-party library dynamically loads code, that code is uploaded to the server in real time for rewriting. This handles dynamic code loading, which existing code rewriting schemes cannot process, further prevents the third-party library from abusing the permissions of the host application, and protects the privacy and security of users and applications more effectively.
Detailed Description
The present invention will now be described in further detail with reference to specific examples, which are illustrative rather than limiting. Referring to FIG. 1 and FIG. 2, the invention comprises two parts: a third-party library rewriting module and a dynamic loading code rewriting module. The dynamic loading code rewriting module is used mainly while the third-party library is running, to rewrite the code that the third-party library loads dynamically.
Part one: rewriting of third-party library sensitive API calling code
Referring to fig. 1, the detailed implementation of this section is as follows:
Step 1, initializing input and output;
This part mainly rewrites the JAR file of the third-party library, so the system first initializes the directory that stores the class files produced by rewriting the JAR file, and then the output directory for the rewritten JAR package.
(1.1) initializing the directory where the JAR package is located, and generating a tmp_class folder;
(1.2) initializing the JAR package output directory, and generating an out folder;
(1.3) loading the predefined target sensitive API information;
The method isolates the code of the third-party library according to the needs of the developer and restricts specified behaviors of the third-party library. The developer therefore specifies, in the format provided by the invention, the list of sensitive API calls of the third-party library that are to be restricted; the system rewrites the bytecode of the third-party library according to this sensitive API information and thereby controls calls to the specified sensitive APIs in the third-party library.
(1.31) the developer defines, as needed, the sensitive API information to be rewritten, comprising a class name, a method name and a method signature, and stores it in a file;
Each sensitive API entry consists mainly of a class name, a method name and a method signature, separated by commas, with one entry per line. The developer can put all the sensitive API information into a txt file, which the system reads and parses according to these rules.
(1.32) reading the sensitive API information file, and parsing the class name, the method name and the method signature;
The system reads the sensitive API information from the file provided by the developer, parses it and stores it in the system for use in the subsequent code rewriting.
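The parsing described in (1.31) and (1.32) can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the class names SensitiveApi and SensitiveApiParser are hypothetical, and the sketch assumes that the method signature is written as a JVM descriptor, which contains no commas.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one entry: class name, method name, method signature. */
final class SensitiveApi {
    final String className;
    final String methodName;
    final String signature;

    SensitiveApi(String className, String methodName, String signature) {
        this.className = className;
        this.methodName = methodName;
        this.signature = signature;
    }
}

/** Hypothetical parser for the developer-provided sensitive API list. */
final class SensitiveApiParser {
    /**
     * Parses lines of the form "className,methodName,signature", one entry per line.
     * Blank lines and lines without three fields are skipped (an assumption of this sketch).
     */
    static List<SensitiveApi> parse(List<String> lines) {
        List<SensitiveApi> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(",", 3);
            if (parts.length != 3) {
                continue;
            }
            result.add(new SensitiveApi(parts[0].trim(), parts[1].trim(), parts[2].trim()));
        }
        return result;
    }
}
```

The lines of the txt file could be obtained with java.nio.file.Files.readAllLines and passed to SensitiveApiParser.parse; the file name is left to the developer.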
(1.4) adding custom interface classes corresponding to the sensitive APIs to the tmp_class folder;
The custom interface classes are used by the rewritten third-party library in place of the system sensitive APIs, so that the original privacy-sensitive operations of the third-party library are blocked.
(1.41) defining a corresponding custom interface according to the sensitive API information, the custom interface returning a false value or directly blocking the original operation;
The custom interface is mainly used to block the original operation of the third-party library. Different system privacy APIs have different return values, so the method handles different system privacy APIs in different ways. For example, the custom interface corresponding to the short message sending API returns directly with a null value and performs no operation, whereas the custom interface corresponding to the API for obtaining the device ID may return a false device ID to deceive the third-party library. In this way the private data of users and applications cannot be stolen by the third-party library, and the third-party library does not run abnormally or crash because an operation was blocked.
(1.42) writing the custom interface into a new class, as an independent class for use by the rewritten code;
(1.43) adding the custom interface class to the tmp_class folder, which is the directory of rewritten class files;
Since the custom interface class is ultimately called by the rewritten third-party library, its class file must be added to the rewritten third-party library.
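A custom interface class of the kind described in (1.41) might look like the following sketch. The class name SandboxApi, the method names and the false device ID value are hypothetical. The receiver and intent parameters are typed as Object only so that the sketch compiles without the Android SDK; in practice they would presumably carry the Android types of the replaced API.

```java
/** Hypothetical custom interface class; all names and values are illustrative assumptions. */
final class SandboxApi {
    /** False value handed to the third-party library instead of the real device ID. */
    static final String FAKE_DEVICE_ID = "000000000000000";

    /**
     * Replacement for a device-ID query. The original receiver object is accepted
     * but ignored, and a false device ID is returned.
     */
    static String getDeviceId(Object telephonyManager) {
        return FAKE_DEVICE_ID;
    }

    /**
     * Replacement for a short-message-sending call. The body is intentionally empty:
     * the original operation is blocked and nothing is sent.
     */
    static void sendTextMessage(Object smsManager, String destination, String serviceCenter,
                                String text, Object sentIntent, Object deliveryIntent) {
        // No operation: the call returns normally so the caller does not crash.
    }
}
```

Returning a plausible value rather than throwing an exception matches the stated goal that the third-party library should not run abnormally or crash when an operation is blocked.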
Step 2, using the bytecode rewriting toolkit Javassist to traverse the JAR package to be rewritten and obtain a list of all classes in the JAR package, the list comprising package names and class names;
This part uses the Javassist tool to rewrite the bytecode of the third-party library. Javassist supports extracting specified classes from a JAR file for rewriting, so the list of all class files in the JAR package is obtained first, and the classes are then rewritten one by one according to that list.
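Obtaining the class list of step 2 can be sketched with the standard java.util.jar API alone. This sketch covers only the enumeration and does not use Javassist; the class name JarClassLister is hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper that lists the fully qualified class names in a JAR package. */
final class JarClassLister {
    /**
     * Converts an entry path such as "com/example/Foo.class" to "com.example.Foo".
     * Returns null for entries that are not class files.
     */
    static String toClassName(String entryName) {
        String suffix = ".class";
        if (!entryName.endsWith(suffix)) {
            return null;
        }
        String withoutSuffix = entryName.substring(0, entryName.length() - suffix.length());
        return withoutSuffix.replace('/', '.');
    }

    /** Walks every entry of the JAR file and collects the names of its classes. */
    static List<String> listClasses(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = toClassName(entries.nextElement().getName());
                if (name != null) {
                    names.add(name);
                }
            }
        }
        return names;
    }
}
```

Each returned name combines the package name and the class name, which is the form in which a class would then be requested from the bytecode rewriting toolkit.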
Step 3, traversing and rewriting each class in the JAR package, and replacing the sensitive API calling code in it;
The main working principle of the invention is to rewrite the code of the third-party library: the calling code of system sensitive APIs in the third-party library is rewritten and replaced with calling code of the corresponding custom interfaces. The custom interface returns a false value or a null value to block the original operation of the third-party library, which ensures that system sensitive APIs cannot be used by the third-party library at will and protects the privacy-sensitive data of users and applications.
(3.1) loading the bytecode of each class from the JAR package;
This part uses the Javassist tool to rewrite the bytecode of the classes in the third-party JAR package. Javassist supports loading a JAR package directly, taking a specified class file out of it and rewriting its bytecode. The bytecode of each class is therefore taken from the JAR package one by one, following the class file list obtained in step 2, and rewritten in turn.
(3.2) traversing the bytecode and searching for calling code of the sensitive APIs according to the loaded sensitive API information;
According to the sensitive API information provided by the developer, the system traverses the bytecode of each class, finds the calling code of the sensitive APIs, and passes it to the next step for rewriting.
(3.3) rewriting the calling code of the sensitive APIs, replacing it with calling code of the custom interfaces, and blocking the related operations in the custom interfaces;
For each calling code of a sensitive API that is found, the system calls the Javassist tool to rewrite the original code into calling code of the custom interface, and the original operation is blocked inside the custom interface.
(3.4) writing the rewritten bytecode back to a class file, and placing the class file in the tmp_class folder, which is the directory of rewritten class files;
After the bytecode of a specified class has been rewritten, it is written to a class file at the path given by the package name and class name. The system then rewrites the next class in the class file list of the JAR package, until the bytecode of all class files in the JAR package has been rewritten and stored in the temporary directory tmp_class under the paths given by their package names and class names.
Step 4, packaging the rewritten class files, and generating a new JAR package in the tmp_class folder;
The tmp_class directory of rewritten class bytecode is packaged to generate a new JAR file. At this point the JAR file has been fully processed and the calling code of the sensitive APIs has been rewritten, so the behavior of the third-party library can be controlled.
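Packaging the rewritten class files can be sketched with the standard java.util.jar API. The class name JarPacker is hypothetical, and the sketch makes simplifying assumptions: no manifest is written, and the location of the output JAR is chosen by the caller.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper that packs a directory of class files into a JAR file. */
final class JarPacker {
    static void pack(Path classDir, Path jarFile) throws IOException {
        // Collect the regular files first so the walk is finished before the JAR is written.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out)) {
            for (Path file : files) {
                // JAR entry names always use '/' as the separator, whatever the platform.
                String entryName = classDir.relativize(file).toString().replace('\\', '/');
                jar.putNextEntry(new JarEntry(entryName));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
    }
}
```

Because each entry name is the path relative to the class directory, a class stored under the path given by its package name and class name ends up at the location a class loader expects.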
Part two: rewriting of dynamically loaded code
The second part of the invention is the rewriting of dynamically loaded code, which is needed because the third-party library can load code dynamically at run time. This part is deployed mainly on a remote server. When a third-party library rewritten by the first part dynamically loads code at run time, the code to be loaded is uploaded to the remote server in real time, and the dynamic loading code rewriting module deployed on the server receives the code uploaded by the client and rewrites it. After rewriting is complete, the client downloads the rewritten code and then loads and executes it locally. In this way the dynamic code loading behavior of the third-party library is controlled.
Referring to fig. 2, the specific implementation of this section is as follows:
Step 1, initializing the input and output directories;
The system mainly rewrites the code files dynamically loaded by the third-party library, so it first initializes the directory in which received code files are stored, then the directory in which the decompiled code files are stored, and finally the output directory for the rewritten dynamic loading code.
(1.1) initializing a directory for receiving dynamically loaded files, and generating an in folder;
(1.2) initializing a directory for storing smali files, and generating a tmp_smali folder;
(1.3) initializing the output directory for rewritten dynamic loading code, and generating an out folder;
Step 2, when the client dynamically loads code at run time, uploading the dynamically loaded code to the server for rewriting;
Because the JAR code of the third-party library on the client has already been rewritten by the third-party library rewriting module of the first part, the third-party library uploads the code file to be loaded to the server in real time whenever it dynamically loads code at run time, and the server rewrites the dynamically loaded code.
Step 3, the server receives the dynamic loading code uploaded by the client;
Once deployed, the server listens for file upload requests from clients in real time, accepts the dynamic loading code files they upload, and rewrites the code.
Step 4, the server preprocesses the dynamic loading code in preparation for rewriting;
The third-party library dynamically loads code in three formats, namely dex, apk and JAR files, so different file formats require different handling. A JAR package used in dynamic code loading is a compressed package containing a dex file; when it is loaded, the dex file is first unpacked from it, and the classes in the dex file are then parsed and loaded. When the system dynamically loads apk code, it is likewise mainly the dex file in the apk that is loaded. Therefore, for JAR packages and apk files, the system extracts the dex file, which then serves as the dex file to be rewritten in the next step.
(4.1) if the file is an apk file, extracting the dex file from it;
(4.2) if the file is a jar file, extracting the dex file from it;
(4.3) if the file is a dex file, proceeding directly to the next step.
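The preprocessing of step 4 can be sketched as follows. The sketch relies on the fact that apk and jar files are zip archives; the class name DexExtractor is hypothetical, and the entry name classes.dex is an assumption (it is the conventional name of the primary dex file, and archives with several dex files are not handled).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper that obtains the dex file from uploaded dynamic loading code. */
final class DexExtractor {
    /** Returns true for apk and jar files, whose dex file must be unpacked first. */
    static boolean needsExtraction(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".apk") || lower.endsWith(".jar");
    }

    /**
     * Reads a zip-format archive and returns the bytes of its "classes.dex" entry,
     * or null if the archive has no such entry.
     */
    static byte[] extractDex(InputStream archive) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("classes.dex")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

A file for which needsExtraction returns false is treated as a dex file and passed to the next step unchanged.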
Step 5, decompiling the dex file with the smali/baksmali tool chain to generate the smali files of its classes, and storing the smali files in the tmp_smali folder;
To rewrite the dex file, the invention decompiles it with the smali/baksmali tool chain to obtain smali files, which are then traversed and rewritten in the following steps. A smali file is a code file written in the smali language, a file format specific to the Android platform. The smali language is a register-based language for the Dalvik virtual machine, with features similar to assembly language. Although the main programming language of the Android platform is Java, the compiled Java code is not shipped as class files; it is converted into a dex file that is packaged in the apk file. Decompiled smali code consists of assembly-like instructions, but it largely retains the structural and logical features of the original Java code, such as classes and methods, and is therefore easier to understand than conventional assembly language.
Step 6, adding the custom interface classes;
The custom interface classes are used by the rewritten dex file in place of the original system sensitive APIs, so that the original privacy-sensitive operations of the code dynamically loaded by the third-party library are blocked.
(6.1) defining a corresponding custom interface according to the sensitive API information, the custom interface returning a false value or directly blocking the original operation;
The custom interface is mainly used to block the original operation of the code dynamically loaded by the third-party library. Different system sensitive APIs have different return values, so the invention handles different system sensitive APIs in different ways. For example, the custom interface corresponding to the short message sending API returns directly with a null value and performs no operation, whereas the custom interface corresponding to the API for obtaining the device ID may return a false device ID to deceive the third-party library. In this way the private data of users and applications cannot be stolen by the dynamically loaded code, and the dynamically loaded code does not run abnormally or crash because an operation was blocked.
(6.2) writing the custom interface into a new smali file, as an independent class for use by the rewritten code;
(6.3) adding the custom interface class to the tmp_smali folder, which is the directory of rewritten smali files;
Since the custom interface class is ultimately called by the rewritten dynamic loading code, its class file must be added to the rewritten dynamic loading code.
Step 7, traversing each smali file, searching for calling code of the sensitive APIs, and rewriting it into calling code of the corresponding custom interfaces;
The smali code is traversed to find the calling code of sensitive APIs, which is then rewritten directly as text. As long as the rewriting does not introduce new parameters, add registers or change the logical structure of the original code, the running logic of the original code is preserved and no compilation errors are introduced. Through this rewriting, the privacy-sensitive behavior of the dynamically loaded code can be effectively controlled.
(7.1) reading the contents of the smali file;
(7.2) traversing the smali code and searching for calling code of the sensitive APIs;
Calling code in the smali language also contains the call type, class name, method name and method signature, so the calling code of system sensitive APIs can easily be found by parsing the smali code.
(7.3) rewriting the calling code of the sensitive APIs, replacing it with calling code of the corresponding custom interfaces, and blocking the related operations in the custom interfaces;
According to the calling code of the sensitive API that was found and the format of the smali language, the code is replaced directly as text: the original code is rewritten into calling code of the custom interface, and the original operation is blocked inside the custom interface.
(7.4) writing the rewritten smali code back to the smali file, and placing the file in the tmp_smali folder, which is the directory of rewritten smali files;
After the smali file of a specified class has been rewritten, the rewritten smali code is written to a smali file at the path given by the package name and class name. The next class is then rewritten, until the smali code of all classes in the dex file has been rewritten and stored in the temporary directory tmp_smali under the paths given by their package names and class names.
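The text replacement of step 7 can be sketched for a single call instruction. All names here are illustrative assumptions: the class SmaliRewriter is hypothetical, the target is a device-ID query, and the replacement is a hypothetical static method that takes the original receiver as its first parameter. Under that assumption the register list of the instruction stays the same, which satisfies the condition that no new parameters or registers are introduced.

```java
/** Hypothetical text-level rewriter for one sensitive API call in smali code. */
final class SmaliRewriter {
    // Example target; in practice the targets come from the sensitive API information.
    static final String TARGET =
            "Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;";
    // Hypothetical static replacement that receives the original receiver as a parameter.
    static final String REPLACEMENT =
            "Lcom/example/sandbox/SandboxApi;->getDeviceId(Landroid/telephony/TelephonyManager;)Ljava/lang/String;";

    /**
     * Rewrites one line of smali code. Lines that are not an invoke-virtual call of
     * the target method are returned unchanged.
     */
    static String rewriteLine(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("invoke-virtual") || !trimmed.endsWith(TARGET)) {
            return line;
        }
        // Changing only the opcode prefix also covers the "/range" form of the instruction.
        return line.replace("invoke-virtual", "invoke-static").replace(TARGET, REPLACEMENT);
    }
}
```

Because the return type of the replacement equals that of the original method, a following move-result-object instruction needs no change.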
Step 8, compiling the tmp_smali folder of rewritten smali files with the smali/baksmali tool chain to generate a new dex file, and converting the dex file back into dynamic loading code in the original format;
The tmp_smali directory of rewritten smali files is recompiled with the smali/baksmali tool chain to generate a dex file. At this point the dex file has been fully processed and the calling code of the sensitive APIs has been rewritten, so the behavior of the dynamically loaded code can be controlled.
Step 9, the server notifies the client to download the rewritten dynamic loading code;
After rewriting is complete, the server notifies the client to download the rewritten dynamic loading code.
Step 10, the client downloads the rewritten dynamic loading code and loads and executes it;
After receiving the notification from the server, the client downloads the rewritten dynamic loading code and dynamically loads and executes it. The code loaded at this point has been rewritten by the server and its privacy-sensitive behavior is restricted, so the private data of the user and the host application is protected.
The performance of the present invention can be further illustrated by the following experiments:
1) Experimental conditions
The third-party library rewriting module runs as a stand-alone Java program that rewrites the JAR package of a third-party library; the dynamic loading code rewriting module is deployed directly on an Apache Tomcat server and is accessed by the rewritten third-party library at run time. The hardware platform is an ordinary PC and an LG Nexus 5 mobile phone flashed with the stock Android 6.0 operating system.
2) Experimental content
(1) 20 popular third-party libraries, domestic and foreign, were downloaded and each was integrated into a test application. Once a third-party library ran normally in the application, its JAR package was rewritten with the third-party library rewriting tool of the invention, and the rewritten library was put back into the test application and run, so that the behavior before and after rewriting could be compared.
(2) A simulated third-party library was developed that performs various privacy-sensitive operations and dynamic code loading. The simulated library was rewritten, and the restrictions imposed by the invention on its own code and on its dynamically loaded code were observed.
(3) Various privacy-sensitive operations were performed with the simulated third-party library, and the time consumed by each operation was measured before and after rewriting. Each operation was executed 100 times and the average time was calculated, so that the performance load before and after rewriting could be compared and the performance loss caused by the method measured.
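The averaging used in the timing measurement can be sketched as follows. The class name AverageTimer is hypothetical, and the sketch shows only the arithmetic of averaging repeated runs; it is not the test harness used in the experiment and ignores effects such as JIT warm-up.

```java
/** Hypothetical helper that averages the running time of an operation over several runs. */
final class AverageTimer {
    /** Executes the operation the given number of times and returns the mean time in nanoseconds. */
    static double averageNanos(Runnable operation, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operation.run();
            total += System.nanoTime() - start;
        }
        return (double) total / runs;
    }
}
```

Calling averageNanos with runs set to 100, once for the original operation and once for the rewritten one, gives the two averages to be compared.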
3) Analysis of results
As can be seen from Table 1, the present invention can effectively control the code of real third-party libraries, isolating it in a user-mode sandbox and preventing the third-party libraries from freely accessing the private data of the user and the host application.
Table 1. Functional test results of the present invention on real third-party libraries
As can be seen from Table 2, the present invention can effectively control the simulated third-party library and its dynamically loaded code, isolating them in a user-mode sandbox and preventing them from arbitrarily accessing the private data of the user and the host application.
Table 2. Functional test results of the present invention on the simulated third-party library
As can be seen from Table 3, rewriting by the present invention adds essentially no extra time overhead to the third-party library; because the operations in the third-party library are blocked after rewriting, the time required for most operations is in fact reduced.
Table 3. Performance test results of the present invention on the simulated third-party library