1 Introduction
Fuzzing is a popular program testing technique that has gathered much momentum in both academia and industry due to its effectiveness and scalability. Since the number of bugs in a project usually grows with the amount of code, operating system kernels such as Linux, which comprise several million lines of code or more, have no shortage of critical vulnerabilities. As kernels execute in the processor’s privileged mode, triggering any of their bugs can lead to catastrophic results, including loss of data, exposure of sensitive information, unauthorized code execution, and so on. For instance, a recent vulnerability labelled CVE-2021-42008 [Hutchings 2021] is classified as a slab-out-of-bounds access that can result in kernel memory corruption and, when properly exploited, ultimately lead to privilege escalation. Such a vulnerability is not an exception: numerous kernel vulnerabilities [Google 2021a; Deucher 2021; Lantz 2021] have been found and exploited by malicious parties over the past few years. Therefore, it is of paramount importance to extensively study and develop kernel fuzzing techniques to improve the safety of operating systems and to protect users and the public.
Fundamentally, fuzzers test programs by repeatedly feeding the target program different inputs, such as raw data bytes, structured objects and network packets, and observing the program’s execution for any exceptional or faulty behavior. State-of-the-art fuzzers generally employ techniques such as coverage feedback, input mutation and grammar-based generation to effectively explore the state space and find bugs in a wide variety of programs.
Kernel fuzzers generally follow the testing paradigms of userland program fuzzers. Using
Syzkaller [Vyukov
2016] as an example, kernel fuzzers generate
system call sequences to feed as input into the kernel.
Syzkaller is feedback-driven and employs mutation-based and grammar-based input generation, allowing it to generate sophisticated system call sequences. While userland fuzzers run on the same machine as their target programs, kernel fuzzers differ in design by running the target kernels in an emulated environment, such as QEMU [Bellard 2005], since fully testing kernels requires allowing them to run in the processor’s privileged mode. Therefore, in order to run multiple fuzzing instances at once and prevent bugs from crashing the fuzzer itself, kernel fuzzers are generally split into two components, specifically (1) a fuzzing manager running on the host machine with a stable kernel, and (2) fuzzer instances running on guest virtual machines with target kernels. The manager synchronizes global fuzzing statistics and information between all guest fuzzing instances and bookkeeps global data, including overall coverage and new test cases. The fuzzer instances generate (or mutate) and execute test cases, monitor the target kernel’s execution status during test case execution, and synchronize information to and from the manager process. The test cases consist of system call sequences, which are organized as highly structured data when transferred between the host process and guest instances. Many kernel fuzzers have followed a similar design path to
Syzkaller’s to transfer sophisticated coverage data or test case inputs to and from the guest instance, such as Moonshine, HEALER, RtKaller, Tardis, and the like. kAFL [Schumilo et al.
2017] is another popular kernel fuzzer optimized towards fuzzing kernels for the x64 architecture. Similar to Syzkaller, kAFL is designed with the guest kernel running in a QEMU/KVM instance and the manager process residing in the host operating system, thus it also performs data transfers from the host process to guest instances. What is different, however, is its kernel fuzzing technique, which is similar to that of AFL [lcamtuf
2013], where it feeds generated inputs as linear bytes into certain kernel buffers and tries to trigger bugs within the kernel itself. Thus, kAFL’s data transfers are mainly serialized buffers rather than highly structured data, which are easier to handle than those of Syzkaller-like kernel fuzzers.
These host-guest data transfer characteristics in kernel fuzzers result in the need for data transfer mechanisms, which can be inefficient depending on their design and implementation. For instance,
Syzkaller uses
Remote Procedure Calls (RPCs) [Birrell and Nelson
1984] to invoke specific functions in the guest fuzzers. Specifically,
Syzkaller’s fuzzer and manager send data into RPC stubs, which in turn convert the data into RPC payloads to transfer over TCP/IP networks. This method is used for a broad range of functions within Syzkaller, including the fuzzer sending new inputs to the manager and the manager synchronizing inputs and coverage information to the individual fuzzer instances. In our preliminary survey, these procedures take more than one-third of the total execution time of Syzkaller’s fuzzing campaign, which is significant given that the time spent actually executing system call sequences to test the underlying kernel accounts for only roughly half of the campaign.
We observe that the guest virtual machine’s memory pages physically reside in the host’s memory. However, the host is barred from accessing the data stored within the guest’s memory pages due to the process isolation mechanisms in modern systems. Therefore, if we allow the fuzzer and manager instances to communicate and transfer data through directly accessing data in another instance’s memory space, we can devise methods to transfer non-trivial highly-structured data efficiently by leveraging the ability to directly copy the information to and from exact memory locations in either instance, thus greatly reducing the data transfer overheads.
We use this insight to design Horus, a kernel fuzzing data transfer mechanism that introduces such memory access procedures into kernel fuzzing, thus providing efficient and consistent data transfers between the two instances through direct access to the other instance’s memory space. We explain our design using Syzkaller to illustrate our approach, since the relevant mechanisms require some modifications to the fuzzer itself. Horus facilitates efficient data transfers using the following procedures. During the kernel fuzzer’s initialization process, Horus creates fixed stub structures in the guest fuzzer instance. It generates memory layout descriptions for the stub structures and registers these structures with the host manager process over RPC. When the fuzzing campaign is underway, whenever the fuzzer or manager wishes to send highly-structured, non-trivial data structures to the other side, they transfer the data through the stub structures using algorithms designed to conduct efficient and consistent transfers. Briefly speaking, Horus intercepts the original RPC calls and offloads the actual data to its transfer stubs. The stubs copy the data structure’s metadata and actual data, including all referenced data, into the stub structures. Stubs in the destination instance then retrieve the data from these stub structures using the corresponding layout descriptions.
We implemented prototypes of Horus on three popular kernel fuzzers, Syzkaller, Moonshine and kAFL, and evaluated Horus’s effectiveness in reducing data transfer latency and improving execution throughput on recent and major versions of the Linux kernel. For Syzkaller and Moonshine, which conduct structured data transfers during fuzzing, integrating Horus decreased their data transfer latencies by 84.5% and 85.8% on average and increased their execution throughputs by 31.07% and 30.62% on average, respectively. Furthermore, Syzkaller’s and Moonshine’s coverage statistics achieved speedups of \(1.6\times\) and \(1.6\times\), and improved by \(6.9\%\) and \(8.2\%\) over 12 hours, respectively. In addition, Syzkaller and Moonshine were able to find more bugs in a limited amount of time when using Horus, of which five were previously unknown and confirmed by the kernel maintainers. For kAFL, Horus increased its Redqueen component’s execution speed by \(19.4\%\), but does not present a significant overall advantage over vanilla kAFL, which is within expectations, as kAFL mostly performs linear buffer transfers during execution. Regardless, Horus still demonstrates its effectiveness in improving fuzzing efficiency for kernel fuzzers, as many kernel fuzzers transfer non-trivial data structures in ways similar to Syzkaller.
In summary, this paper makes the following contributions.
•
We identify that host-guest communication and data transfers in state-of-the-art kernel fuzzers incur significant overheads when transferring highly structured data, resulting in reduced performance during kernel fuzzing.
•
We observe that the efficiency of kernel fuzzers can be improved by exposing the memory located within a virtual machine instance to the host, thus allowing direct memory access to transfer the relevant data.
•
We design and implement Horus, a kernel fuzzing data transfer mechanism that provides efficient and consistent host-guest data transfers for improved kernel fuzzing performance.
•
We demonstrate that Horus can significantly improve kernel fuzzers’ execution throughput, speed of coverage growth, and bug detection, showing that these data transfer procedures can benefit a kernel fuzzer’s effectiveness.
3 Motivation
Kernel fuzzers generally rely on traditional remote transmission procedures or virtual machine management techniques to communicate and transfer data with the guest instances, thus potentially incurring significant overheads. For example,
Syzkaller and
Moonshine use
Remote Procedure Calls (RPC) over TCP/IP to communicate and send data between guest fuzzer instances and the host manager to synchronize fuzzing-relevant information, including new inputs, crash information, runtime statistics, and so on. Using
Syzkaller as an example, we visualize the process of sending a test case from the fuzzer instance running in the VM to the manager instance running in the host machine in Figure
1. We observe that sending structured data from the guest fuzzer requires invoking computation and memory-use intensive operations, such as serialization and deserialization, for the data to be transferred using common methods such as TCP/IP and reach the host manager. Intuitively, using RPC to transmit data requires highly structured data structures to be encoded and decoded, incurring significant overheads for non-trivial workloads. The data that kernel fuzzers transmit to synchronize relevant information is highly-structured, i.e., containing many layers of structures, and
non-trivial, i.e., containing references to external data. Therefore, using RPC will only exacerbate the problem. Furthermore, the fuzzer instance runs in an emulator, thus incurring even more overhead due to the instruction translation and address conversion processes during its execution. Thus, the fewer instructions executed during such a data transfer on the guest side, the more efficient the fuzzing process becomes. This issue also affects kernel fuzzers with similar data transfer designs, where highly-structured
non-trivial data is sent between host and guest instances.
To quantitatively understand the severity of this problem, we broke down
Syzkaller’s performance metrics using
pprof [Google
2021b].
pprof profiles and reports each component’s proportions of the entire execution time. As shown in both Figures
2(a) and
2(b), a significant proportion of
Syzkaller’s fuzzing manager’s and guest fuzzer’s execution time is spent on RPC calls, thus justifying our concerns. A more detailed investigation reveals that encoding and decoding highly structured and non-trivial data such as new inputs and coverage are the root cause of this overhead.
Fuzzers can circumvent these mechanisms by making the kernel fuzzing-aware, i.e., modifying the kernel and emulator to expose an interface that facilitates direct transfers between the guest fuzzer and the host manager. However, a general rule of thumb in fuzzing is to avoid modifying the test target where possible, lest new bugs be introduced, thus rendering this approach inadvisable. Additionally, these fuzzers could leverage OS-provided facilities such as the network stack, hardware buses and character devices to perform direct memory accesses. However, not all operating systems provide such facilities, and adapting such techniques to the respective operating systems requires a non-trivial amount of human labor.
In reality, however, kernel fuzzers are merely processing and moving memory objects from one location to another, while the guest VM’s pages already reside in the host’s memory space, isolated only by the host operating system’s virtual memory mechanisms. Therefore, if we can expose the guest VM’s memory to the host manager, we can perform data transfers between the guest fuzzer and host manager by directly accessing the other instance’s memory space, thus offloading data movement from RPC calls.
To transfer data efficiently between the host manager and the guest fuzzers, we come across the following challenges:
1. Accessing data structures in another instance’s memory space correctly. We need some form of description to locate the relevant fields of structured data in a foreign memory space, so that the host manager can access specific parts of the guest’s memory to retrieve data from and send data to the fuzzer instances. First, we need to expose the guest instances’ memory into the host’s memory space. Then, the host manager needs to know where in the guest instance’s memory the data being transmitted resides, so that it can read and process the relevant data.
2. Transmitting the data efficiently and consistently to another memory space. Inter-memory-space data movement poses significant challenges towards maintaining the consistency of the target data structures. Specifically, retrieving data from a foreign memory space requires us to correctly reconstruct all structure entries and externally referenced data, while transferring data demands that we keep structural pointers and metadata intact while filling the actual contents. If not addressed adequately, this may result in situations such as loss of data, invalid pointers, incorrect amount of data transferred, and more. In addition, performance is also a significant factor in our considerations, since inefficient transfer methods will reduce the benefits of such a method.
4 Horus Design
We design Horus, a kernel fuzzing data transfer mechanism that facilitates efficient and consistent passage of highly structured data between the host manager and guest fuzzers, and improves overall kernel fuzzing throughput. Horus addresses the aforementioned challenges by removing the memory-isolation barriers between the host manager and guest fuzzers and providing resources and methods for efficient memory transfers. Specifically: (1) we perform modifications to existing tools and propose new primitives that allow the host manager to correctly find target data structures within the guest’s memory space; and (2) we offload the data transfer functionality of RPC calls to Horus’s stubs by designing algorithms to efficiently and consistently transfer data between the host manager and guest fuzzers.
We illustrate
Horus’s design when implemented for
Syzkaller in Figure
3. As shown in the figure,
Horus’s design consists of manager-side and fuzzer-side
stubs that transparently transfer the data to and from the relevant memory regions of
fixed stub structures. These stub structures are created during
Syzkaller’s initialization process and persist throughout the entire lifecycle of the virtual machine. As their locations are constant, the fuzzer sends the relevant descriptions of their layout and locations to the manager process (Section
4.1). Their purpose is to hold the data structures being transmitted between the fuzzer and manager during fuzzing, so that the manager knows exactly where to find the data it intends to receive or where to place the data it wishes to transfer. When the fuzzing campaign is underway, if the fuzzer wishes to pass data to the manager, for example when it sends the manager process new system call sequences that trigger new kernel behavior,
Horus stores these system calls and their relevant data within the corresponding stub structures.
Horus will then send the same RPC call minus the actual data to the manager process to notify the manager that the transfer of data is ready to commence.
Horus’s stubs on the manager side intercept the RPC calls and transfer the data from the guest instance’s memory space into the manager process’s memory space using the descriptions sent by the fuzzer instance during initialization. After this is completed, the reconstructed structures are returned to the manager for further processing, whereas in the fuzzer instance the RPC call finalizes, allowing it to continue its fuzzing operations. If the manager wishes to pass information to a fuzzer instance, instead of the fuzzer process first placing the data within the stub structures, the RPC call is first initiated, the manager process places the data to be transferred within the guest’s memory space, and the RPC call then finalizes with the fuzzer instance retrieving the target data from the relevant stub structures (Section
4.2). Though our description is
Syzkaller-oriented,
Horus is not restricted to one single fuzzer.
Horus can be designed and implemented on kernel fuzzers that separate their fuzzing logic into distinct parts that run in both the host and guest operating systems and require transferring highly structured data between the respective instances, such as Moonshine and Healer. Adapting
Horus to other fuzzers will follow a similar approach, including identifying data transfer entities, using
Horus to create stub structures, and devising transfer routines based on the overall data structure.
4.1 Correctly Finding Inter-Memory-Space Data Structures
Syzkaller’s model of communication between fuzzer and manager instances only consists of data transfer pathways. Therefore, we can expose each guest virtual machine’s memory space to the manager process, allowing for direct access to each fuzzer instance’s memory to transmit and receive data requested and sent from the respective fuzzers.
Since QEMU is widely used to run the fuzzer instances, we modify QEMU’s guest machine RAM allocation scheme to expose the guest’s memory space to the host machine. Specifically, when QEMU initializes the guest machine’s memory, instead of allocating a chunk of memory backed by anonymous pages, we redirect the allocation to shared memory backed by a file under the system’s shared memory directory. During Syzkaller’s initialization process, after successfully booting the guest virtual machine, Syzkaller’s manager maps the guest’s RAM section, backed by shared memory, into its own memory space.
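To make this concrete, the following sketch shows how a host-side component could map such a shared-memory-backed guest RAM region into its own address space. It assumes the modified QEMU exposes guest RAM as a file under the system’s shared memory directory; the path and helper name are illustrative rather than Horus’s actual code.

```go
package horus

import (
	"fmt"
	"os"
	"syscall"
)

// mapGuestRAM maps the guest VM's RAM, exposed by the modified QEMU as a
// shared-memory file, into the manager's address space. The returned byte
// slice is indexed by guest physical address.
func mapGuestRAM(shmPath string) ([]byte, error) {
	f, err := os.OpenFile(shmPath, os.O_RDWR, 0)
	if err != nil {
		return nil, fmt.Errorf("open guest RAM file: %w", err)
	}
	defer f.Close() // the mapping stays valid after the descriptor is closed

	info, err := f.Stat()
	if err != nil {
		return nil, err
	}

	// Map the whole guest RAM region shared, so reads and writes are
	// immediately visible to both the manager and the guest.
	return syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
}
```

The manager would invoke this once per virtual machine after the guest boots; the exact file naming scheme depends on the QEMU modification.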
Now that the host manager is able to access the guest fuzzer’s physical memory, we need to inform the host manager of the source or target data structure’s locations in the guest’s physical memory. The data structures used in Syzkaller’s synchronization process are highly-structured and non-trivial. We denote highly-structured to be data structures with multiple levels of hierarchies, while non-trivial data structures are those with external references, such as pointers pointing to buffers in another memory location. The memory layout description of these data structures needs to address the following issues.
First, for highly structured non-trivial data structures, transferring between the fuzzer and manager instances requires moving and reconstructing not only the structure or array itself, but also all references that the member values may point to, in addition to fixing the references in the
non-trivial data structures for the target memory space. For instance, the
Input struct, defined in the module
syzkaller/rpctype, is frequently used in RPC call transfers to represent a test case. We show its structure in Figure
4, where we see that the member variables reference other buffers which contain the actual data. Clearly, copying the
Input struct itself is insufficient: we also need to copy the buffers that the fields
Name,
Prog and
Cover point to. Furthermore, we need to fix the data pointer values so that they are valid in the target address space. Therefore, the description needs to encompass the memory information of the structure itself and all referenced data structures.
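For reference, a simplified Go sketch of the shape of this structure is shown below; the exact field set and types in syzkaller/rpctype may differ, and Serial is a placeholder standing in for signal.Serial. The later sketches in this section are assumed to live in the same illustrative package as the mapping sketch above.

```go
// Serial is a placeholder for signal.Serial, which itself references an
// external buffer.
type Serial struct {
	Elems []uint32
}

// Input is a simplified sketch of the structure in Figure 4. Name, Prog and
// Cover are headers that point to separate heap buffers, so copying the
// struct body alone would leave dangling references in the destination
// address space.
type Input struct {
	Name   string   // string header -> name buffer
	Prog   []byte   // slice header  -> serialized program buffer
	Signal Serial   // nested structure with its own referenced data
	Cover  []uint32 // slice header  -> coverage buffer
}
```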
Second, virtually contiguous memory addresses may fragment when translated to physical memory addresses as a result of the state of the page tables during the instances’ execution. Thus, when constructing memory layout descriptions for virtually contiguous memory areas, such as buffers, we need to determine, at each page boundary, the corresponding physical pages and describe the entire area using a sequence of physical pages.
Therefore, we design the following primitives and use them to describe the memory layout information of any data structure. First, we define the ContiguousArea structure to describe a contiguous physical memory area. It consists of two members, a base physical address and the length of the area. A contiguous physical memory area spanning multiple physical pages can be represented by a single ContiguousArea instance. This is the most basic building block for more complex structures. Next, we describe a virtually-contiguous memory chunk as a VirtualChunk. A VirtualChunk consists of a sequence of ContiguousArea instances, thus representing virtual memory chunks that correspond to one or multiple physical memory areas. Now, we can describe the memory layout of structures and arrays. Arrays can be represented simply as a VirtualChunk, since they only consist of a contiguous memory chunk. Self-contained structures, i.e., structures that have no additional references, can also be represented using a VirtualChunk instance. Finally, we can describe non-trivial data structures using combinations of the aforementioned description primitives. Specifically, the description first contains a VirtualChunk that describes the memory layout of the structure body itself. All referenced data structures can be represented as instances of the aforementioned primitives, such as member arrays using VirtualChunk instances.
Using Syzkaller’s Input structure as an example, we briefly cover the components of its memory layout description. First, a VirtualChunk instance StructMem describes the memory location of the struct body. Next, we use three VirtualChunk instances to describe the memory locations of the individual data buffers of the member variables Name, Prog and Cover, respectively. Finally, the description for Input contains the description for the signal.Serial member structure Signal.
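The following Go sketch summarizes these primitives and how a layout description for the Input stub could be composed from them; the exact field names in Horus may differ.

```go
// ContiguousArea describes one physically contiguous region of guest memory.
type ContiguousArea struct {
	Base uint64 // guest physical address of the first byte
	Len  uint64 // length of the region in bytes
}

// VirtualChunk describes a virtually contiguous chunk that the guest's page
// tables may have split across several physical regions.
type VirtualChunk struct {
	Areas []ContiguousArea
}

// InputDescription sketches the layout description of the fixed Input stub:
// the struct body plus one chunk per referenced buffer.
type InputDescription struct {
	StructMem VirtualChunk // memory holding the Input struct body
	Name      VirtualChunk // buffer referenced by the Name string header
	Prog      VirtualChunk // buffer referenced by the Prog slice header
	Cover     VirtualChunk // buffer referenced by the Cover slice header
	Signal    VirtualChunk // buffer referenced by the nested Signal structure
}
```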
Given the complexity of generating such a description, it is not viable to transfer arbitrary structures on demand. Doing so would require the fuzzer instance to iterate through the structure itself and all referenced data, find the corresponding physical memory areas, and send the description to the manager over RPC calls each time the fuzzer and manager instances need to transfer data. This would significantly diminish the overhead reduction gained from direct memory copying, even before considering the complexity of the design itself.
Thus, we propose to use fixed stub structures to facilitate data transfers between the fuzzer and manager instances. These structures are statically allocated during the fuzzer’s initialization process and contain the same data as the corresponding structures transferred through RPC. This allows the transmitting and receiving ends to know the locations to and from which to move data. Therefore, we generate the memory layout descriptions of these stub structures and notify the manager with their respective descriptions during the fuzzer instance’s initialization process.
In practice, we find that Syzkaller most frequently uses the Input and the PollRes structures during RPC data transfers. The former is described above and the latter is used to transfer synchronized inputs and coverage information from the host manager to the individual guest fuzzers. Therefore, we create fixed stub structure instances for both structures and implement their respective memory layout descriptions.
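A sketch of this one-time registration step is given below; RegisterStubsArgs and the RPC method name are hypothetical, and the descriptions are assumed to have already been built by walking the stubs and resolving their physical pages.

```go
import "net/rpc"

// RegisterStubsArgs carries the layout descriptions of the fixed stubs; the
// PollRes description is simplified to a list of chunks in this sketch.
type RegisterStubsArgs struct {
	Input   InputDescription
	PollRes []VirtualChunk
}

// registerStubs sends the stub descriptions to the manager once during the
// fuzzer instance's initialization; afterwards the manager can locate the
// stubs directly in the mapped guest RAM.
func registerStubs(client *rpc.Client, inputDesc InputDescription, pollDesc []VirtualChunk) error {
	args := &RegisterStubsArgs{Input: inputDesc, PollRes: pollDesc}
	var reply int
	return client.Call("Manager.RegisterStubs", args, &reply)
}
```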
4.2 Efficiently and Consistently Transferring Data Structures
During the fuzzer’s execution process, Horus intercepts the RPC calls used to transfer and synchronize data between the manager and fuzzer instances and offloads the data into Horus’s transfer stubs. These transfer stubs are present on both fuzzer and manager instances to facilitate the movement of data to and from the respective stub structures. Since the fixed stub structures reside in the guest’s memory space, the manager needs to map their data to physical memory locations in order to facilitate transfers. Naturally, the methods for transferring to and from the stub structures differ significantly between the fuzzer and manager instances. Therefore, we propose the following algorithms to transfer data structures across memory space boundaries efficiently and consistently.
Transferring Data to the Manager: When the fuzzer instance transfers data to the manager instance, for instance when the NewInput() RPC call is invoked to send a new input to the manager, it performs the following procedure.
The transfer stub on the fuzzer side first fills the corresponding fixed stub structure with the data it wishes to transfer in a top-down manner. Specifically, the stub first assigns all
self-contained member variables and structures without external references, such as integers, boolean values and nested structure bodies, with the desired values. Then, for member variables with external references, such as strings, byte arrays and slices of structures, the fuzzer copies the data in the original target buffers into the
fixed stub structure’s corresponding data buffers. Finally, the fuzzer modifies the length metadata of the stub buffer. Specifically, Go’s slices and strings are represented at runtime using a
*Header structure from the
reflect module. For instance, headers for string variables are named
StringHeader. This is also present in Figure
4, where the headers for strings and slices both contain pointers to the actual data buffer, the length of the buffer, and in the case of slices, the capacity of the buffer. We simply change the length value to modify its metadata. For composite structures or arrays of non-trivial data structures, the stub will then recursively perform the data transfer on the corresponding structures.
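A condensed sketch of this fuzzer-side fill routine, building on the Input sketch above, is shown below. The fixed buffer capacity and the routine name are illustrative; the key point is that only the length metadata changes while the stub’s Data pointers stay fixed.

```go
import (
	"reflect"
	"unsafe"
)

// nameCap is an assumed capacity for the fixed buffer backing the stub's
// Name string; the real stub sizes in Horus may differ.
const nameCap = 4096

// fillInputStub copies src into the fixed Input stub in a top-down manner
// without disturbing the stub's data pointers, which must keep pointing at
// the pre-registered buffers. A fuzzer-side sketch, not Horus's exact code.
func fillInputStub(stub, src *Input) {
	// Strings are immutable in Go, so the stub's Name buffer is written
	// through its runtime header; only the Len field is updated afterwards.
	hdr := (*reflect.StringHeader)(unsafe.Pointer(&stub.Name))
	buf := unsafe.Slice((*byte)(unsafe.Pointer(hdr.Data)), nameCap)
	hdr.Len = copy(buf, src.Name)

	// Reslicing keeps the slice headers' Data pointers (the fixed buffers)
	// intact and only adjusts their Len metadata.
	stub.Prog = stub.Prog[:copy(stub.Prog[:cap(stub.Prog)], src.Prog)]
	stub.Cover = stub.Cover[:copy(stub.Cover[:cap(stub.Cover)], src.Cover)]

	// Nested structures with their own references are filled recursively;
	// shown here only for the sketched Serial placeholder.
	stub.Signal.Elems = stub.Signal.Elems[:copy(stub.Signal.Elems[:cap(stub.Signal.Elems)], src.Signal.Elems)]
}
```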
We illustrate this process in the upper half of Figure
5. To fill the
fixed stub structure for
Input structures, since there are no
self-contained member variables, the stub first sets all corresponding metadata values, specifically the length values for the string
Name and slices
Prog and
Cover. Next, it copies all relevant buffers to the stub’s corresponding buffers. This concludes the procedures on the fuzzer’s side.
For the manager to recover the structure data, the manager-side stub uses the memory layout descriptions of the stub structure to identify the locations to read from in a top-down manner. Specifically, the manager first retrieves the body of the data structure from the location indicated by its memory layout description, then proceeds to recursively recreate all its references. The lengths of the slices and strings can be recovered through reading the header structures in the structure body. When a virtually contiguous memory section in the guest’s memory space is physically discrete, the manager performs multiple reads from locations indicated by the sequence of ContiguousArea instances in the corresponding VirtualChunk description.
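The gather step can be sketched as follows, assuming guestRAM is the shared mapping of the guest’s physical memory obtained during initialization.

```go
// readChunk reconstructs a virtually contiguous guest buffer from the mapped
// guest RAM by gathering the physically discrete pieces described by the
// VirtualChunk. A sketch; guest physical addresses are assumed to index
// directly into guestRAM.
func readChunk(guestRAM []byte, chunk VirtualChunk) []byte {
	var total uint64
	for _, area := range chunk.Areas {
		total += area.Len
	}
	out := make([]byte, 0, total)
	for _, area := range chunk.Areas {
		out = append(out, guestRAM[area.Base:area.Base+area.Len]...)
	}
	return out
}
```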
This process is illustrated in the lower half of Figure
5. To retrieve the
Input structure transferred from the fuzzer instance, the manager first reads the physical memory chunks that contain the body of the
Input stub structure. Then it allocates buffers to place the contents of the stub’s
Name,
Prog and
Cover buffers. Next, it copies the contents of the stub buffers into the corresponding buffers allocated in the previous step. Finally, the manager fixes the data pointers in the member variable’s slice and string headers to produce the target structure.
Transferring Data to the Fuzzer: When the manager sends data to a fuzzer instance, for example when the manager returns the result of a Poll() RPC request initiated by a fuzzer instance, which contains inputs and coverage synchronized from other fuzzer instances, the manager performs the following procedure:
The manager fills the stub structures with the corresponding data, mirroring the inverse of the process described above. However, since the stub structure is in the guest VM’s virtual memory space, the manager needs to avoid modifying the Data pointers in the header structures of strings and slices, which would invalidate the stub structures. Therefore, the manager performs data transfers recursively using the following procedure. First, the manager copies the stub structure’s body into its own memory space. Next, it modifies all self-contained variables to the corresponding values. Then, it sets the length metadata of all member slices and strings by modifying their respective headers. Finally, it copies the body back to the stub memory locations. The manager then recursively conducts this procedure on the referenced data structures.
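The manager-side write path can be sketched as follows, reusing readChunk from above. The header offsets are hypothetical placeholders for the actual offsets of the Len fields in the stub’s struct layout, and the coverage payload is assumed to arrive already serialized as bytes.

```go
import "encoding/binary"

// Hypothetical byte offsets of the Len fields inside the slice headers of
// the stub structure's body; the real offsets depend on the Go struct layout.
const (
	progLenOffset  = 24
	coverLenOffset = 64
)

// writeToStub sketches the manager-side half of a Poll() transfer: the stub
// body is read out of guest memory, its length metadata is patched without
// touching the Data pointers, and the body plus referenced buffers are
// written back. Illustrative only; Horus's actual routine may differ.
func writeToStub(guestRAM []byte, desc InputDescription, prog, cover []byte) {
	// 1. Read-modify-write the struct body: only length fields change.
	body := readChunk(guestRAM, desc.StructMem)
	binary.LittleEndian.PutUint64(body[progLenOffset:], uint64(len(prog)))
	// The Cover slice header counts uint32 elements, hence the division.
	binary.LittleEndian.PutUint64(body[coverLenOffset:], uint64(len(cover)/4))
	writeChunk(guestRAM, desc.StructMem, body)

	// 2. Scatter the payload into the stub's fixed buffers, which may span
	//    several physically discrete areas.
	writeChunk(guestRAM, desc.Prog, prog)
	writeChunk(guestRAM, desc.Cover, cover)
}

// writeChunk writes data across the physical areas of a VirtualChunk.
func writeChunk(guestRAM []byte, chunk VirtualChunk, data []byte) {
	for _, area := range chunk.Areas {
		n := copy(guestRAM[area.Base:area.Base+area.Len], data)
		if data = data[n:]; len(data) == 0 {
			break
		}
	}
}
```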
To retrieve the data at the fuzzer side, the fuzzer performs a deep copy of the stub structure, which copies all structure variables and their referenced objects. This procedure is much more straightforward since the manager has set the length metadata of slices and strings properly while maintaining the integrity of the pointers of the actual buffers.
This procedure is also demonstrated in Figure
5 with the flow of data reversed. Specifically, the manager first assigns the length metadata of all string and slice headers. Then, the manager copies the buffers of the
Name,
Prog and
Cover variables to the respective physical memory chunks of the stub buffers. Copying a buffer may require multiple memory copies, since, as explained before, a contiguous virtual memory chunk in the guest VM may not be physically contiguous. This completes the operations on the manager side. On the fuzzer side, the fuzzer simply performs a deep copy of the stub’s contents, producing the transferred
Input structure.
5 Implementation
Here, we discuss the implementation details of adapting Horus to Syzkaller, Moonshine and kAFL, including the data transfer primitives used, the transfer routine implementations, and fuzzer-specific details.
5.1 Syzkaller and Moonshine
We implemented Horus for Syzkaller and Moonshine. As Moonshine re-uses most of the RPC primitives from Syzkaller, we implement Horus for both fuzzers using the methods introduced in the previous section. We modified the manager and fuzzer’s RPC mechanisms to conduct transfers using Horus’s transfer mechanisms.
Specifically, as aforementioned, Horus intercepts the data transfer process of the NewInput and PollRes structures, as these are the most used during fuzzing. To facilitate Horus’s transfer mechanisms, we inserted interception routines before the actual RPC invocations in both the manager and fuzzer processes. Specifically, we replaced the relevant RPC calls that contain data sections with ones that exclude the relevant fields. When the fuzzer or manager process wishes to send an RPC call, Horus fills the fixed stub structures with the relevant data and invokes the relevant calls. When the receiving side processes this request, instead of extracting the data from the RPC request itself, it calls the Horus routines to recover the relevant data from the specified locations.
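As an illustration, the fuzzer-side interception for a new input could look like the sketch below; NewInputStrippedArgs, the RPC method name and the inputStub variable are hypothetical stand-ins for Horus’s actual hooks.

```go
import "net/rpc"

// NewInputStrippedArgs stands in for the RPC argument type with its data
// fields removed; only lightweight metadata travels over the RPC channel.
type NewInputStrippedArgs struct {
	Name string
}

var inputStub *Input // fixed stub allocated during initialization

// sendNewInput offloads the payload into the fixed stub structure and then
// issues the stripped RPC call, which acts only as a notification; the
// manager-side stub pulls the payload out of guest memory using the
// registered layout description.
func sendNewInput(client *rpc.Client, inp *Input) error {
	fillInputStub(inputStub, inp)
	args := &NewInputStrippedArgs{Name: inp.Name}
	var reply int
	return client.Call("Manager.NewInput", args, &reply)
}
```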
When fixed stub structures are allocated, Go’s memory allocator lazily allocates the pages that contain them, i.e., the pages are not allocated immediately and fully, but only when they are first accessed. Therefore, during the initialization process, Horus traverses all allocated pages to ensure that they are properly backed, allowing both the host and guest to use the memory pages properly.
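A minimal sketch of this page-touching pass is given below, assuming a 4 KiB page size; Horus’s actual traversal covers every page backing the stub structures and their buffers.

```go
// touchPages forces Go's lazily allocated backing pages to be materialized
// by touching one byte per page, so the pages are physically backed before
// their addresses are resolved and registered with the host.
func touchPages(buf []byte) {
	const pageSize = 4096 // assumed page size
	for off := 0; off < len(buf); off += pageSize {
		v := buf[off]
		buf[off] = v // write the value back to fault the page in
	}
	if n := len(buf); n > 0 {
		buf[n-1] = buf[n-1] // also touch the final, possibly partial page
	}
}
```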
5.2 kAFL
As aforementioned, kAFL mainly transfers linear data buffers to kernel buffers in the guest instance and thus may not incur overheads comparable to Syzkaller’s; nevertheless, we also implemented Horus for kAFL to understand its effects on such kernel fuzzers. kAFL mainly performs host-VM data transfers in the following two scenarios. First, the kAFL worker, which is the equivalent of the fuzzer component in Syzkaller, transfers inputs, which are linear byte buffers, to the guest instance’s predetermined byte buffers upon each execution cycle. In contrast, Syzkaller and Moonshine’s inputs consist of non-trivial, highly structured data, which encapsulates an entire system call sequence with all argument types and contents. Second, during the Redqueen stage of a given input seed’s fuzzing process, the worker retrieves comparison information extracted by hooking each comparison instruction on the execution trace. It then identifies possible input positions that can influence the result of a comparison expression and guides input generation towards reversing the comparison result.
In contrast to Syzkaller and Moonshine, where the fuzzer invokes manager routines through the use of RPC calls, in kAFL, the agent, which executes the payload on behalf of the worker in the target kernel, performs such duties through the use of hypercalls. Therefore, when adapting Horus to kAFL, the corresponding interception is performed before and after such hypercalls.
We implement Horus for the aforementioned two scenarios. For the former scenario, where the worker transfers generated inputs to the guest instance, Horus intercepts the vanilla transfer routines and directly transfers the intercepted payload data to the location specified by the agent during initialization. For the latter scenario, when kAFL runs the Redqueen routines once for each new input before conducting traditional mutation operations, the addresses of each comparison operation are recorded into a shared memory buffer through hooks inserted into QEMU’s execution stream. When the agent finishes its execution, Horus on the worker side fetches the operands of each comparison operation using methods similar to those discussed in the previous section and returns the values for further use in Redqueen’s logic.
6 Evaluation
We evaluate
Horus’s effectiveness when adapted to kernel fuzzers
Syzkaller,
Moonshine and kAFL. To analyze and understand
Horus’s performance improvements over the original approaches, we propose and strive to answer the following research questions:
•
RQ1: Can Horus transfer fuzzing-relevant data faster than Syzkaller’s and Moonshine’s RPC mechanisms and kAFL’s Nyx-based transfer mechanisms?
•
RQ2: Can Horus achieve an empirically significant execution throughput improvement compared to using RPC in Syzkaller and Moonshine and using Nyx in kAFL?
•
RQ3: Does Horus help kernel fuzzers achieve the same coverage as their non-Horus versions faster?
•
RQ4: How does Horus affect kernel fuzzers’ ability to find kernel bugs?
To answer these research questions, we designed the following experiments.
For Syzkaller and Moonshine, we first measure the round-trip time (RTT) of sending a new input from the fuzzer to the manager when using Horus’s mechanisms and Syzkaller’s original RPC systems; then, we profile the processor execution time proportions for each component in both the fuzzer and manager instances during fuzzing to quantify the reduction in data transfer overhead through using Horus and examine whether Horus can improve the fuzzer’s execution throughput; next, we run Syzkaller, Moonshine, Syzkaller+Horus and Moonshine+Horus for 12 hours to compare their coverage statistics; finally, we examine the number of kernel bugs found through using Horus.
For kAFL, we first measure the average time for an input to be transferred to the agent under Horus and kAFL’s Nyx backend as well as the average time for Horus and Nyx to recover Redqueen’s required operands; we then collect the coverage statistics of kAFL with Horus and with Nyx over a period of 12 hours.
6.1 Experiment Setup
We conducted our experiments with Syzkaller and Moonshine on a server with an AMD EPYC 7742 64-core processor and 512 GiB of memory, running 64-bit Ubuntu 20.04.2 LTS. The tested kernels are the mainline, stable and most recent long-term versions, which, at the time of writing, are 5.16, 5.15 and 5.10, respectively. The kernels are compiled with the Kernel Address SANitizer (KASAN) enabled to detect any kernel crashes. The fuzzers obtain kernel coverage using KCOV. We augment Syzkaller and Moonshine with Horus, which we refer to as Syzkaller+Horus and Moonshine+Horus, or Syz+Hor and Ms+Hor for short. For the comparison experiments, we configure the virtual machine instances for the four fuzzers with identical parameters. Specifically, each fuzzer instance has two virtual machine instances with 4 GiB of memory and two processor cores each. Each set of experiments is repeated five times, where each individual experiment is executed for 12 hours on a dedicated core.
Our experiments on kAFL are conducted on a server with an Intel Xeon Silver 4210R 20-core processor and 32 GiB of memory, running 64-bit Ubuntu 20.04.2 LTS. The host system’s kernel is replaced with kernel 5.10.73-kafl+, the patched version provided by kAFL, which integrates Intel PT support for KVM. The kernel under test is Linux 5.15-4 with the kAFL agent installed as /arch/x86/kernel/kafl-agent.c. The kernel is compiled with KASAN support to detect any address violations triggered by kAFL. Each experiment is run with identical parameters and allocated the same amount of resources. Specifically, each kAFL instance is allotted 1 GiB of memory and one dedicated CPU thread. Each set of experiments is run for 12 hours. A total of five experiments were executed.
All statistics are then verified through the Mann-Whitney U test, as per fuzzing guidelines laid down by Klees et al. [
2018], to determine whether statistically significant differences exist between the sets of data. The relevant statistics are shown in Table
4, where we analyze the values for each entry in their respective sections.
6.2 Data Transfer Speed
To answer RQ1, we measure the time it takes for kernel fuzzers to complete a data transfer operation.
6.2.1 Syzkaller and Moonshine.
We measure the round trip time statistics for data transfers between the manager and fuzzer instances in order to understand the data transfer speedups of Horus. The data transferred can be of different sizes, since system call sequences can vary in the number of calls and the length of their arguments; therefore, we also need to take into account the size of the transferred data in relation to its round trip time.
We measure these statistics by inserting timer probes before initiating and after the conclusion of the relevant RPC calls, including the relevant calls to Horus’s stubs. Since an RPC call itself has considerable overhead, we also implemented an empty RPC call in the manager’s RPC server and measured its round trip time when called from the fuzzer instances.
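The probes amount to wrapping each measured call with timestamps, as in the sketch below; the helper name is illustrative.

```go
import (
	"net/rpc"
	"time"
)

// measureRTT wraps an RPC invocation with timer probes, mirroring how round
// trip times are collected in this evaluation. callName is whichever call is
// being measured, e.g., the empty baseline call or a Horus-backed transfer.
func measureRTT(client *rpc.Client, callName string, args, reply interface{}) (time.Duration, error) {
	start := time.Now()
	err := client.Call(callName, args, reply)
	return time.Since(start), err
}
```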
The relevant results are shown in Figure
6. As we see in the graph, the regression lines have significantly different slopes, indicating that
Horus allows both
Syzkaller and
Moonshine to achieve a significant speedup in round trip time statistics. On average,
Syzkaller and
Moonshine achieve a 34.0% and 33.5% speedup in round trip time statistics when using
Horus. However, these numbers include the time for an RPC call itself, thus, when the amount of data transferred is relatively small, the benefits of using
Horus are overshadowed by the RPC call’s overhead itself. To obtain a clear picture of the actual speedup of
Horus, we take the base time of an RPC call into consideration. Considering that
Horus’s round trip times for data transfers under 15 KiB differ negligibly from those of the empty RPC call, we filter out transfers smaller than 15 KiB to assess the actual speed of data transfers. In this case,
Horus achieves a speedup of 84.5% and 85.8%.
As shown in Table
4, we have tested these statistics using the Mann-Whitney U test and found that the
p values for
Syzkaller and
Moonshine are approximately 0.007 and 0.006, both less than 0.05, thus indicating that
Horus’s performance is statistically significantly better than that of vanilla
Syzkaller and
Moonshine. Thus, we can conclude that, for
Syzkaller and
Moonshine,
Horus is able to transfer data significantly faster than the RPC mechanisms provided by traditional fuzzers.
6.2.2 kAFL.
We measure the average delay of using
Horus and Nyx for performing data transfers both during executing a new input and when fetching comparison operands for Redqueen. The results are shown in Table
1.
As is evident in the table, Horus unfortunately does not exhibit a significant difference when transferring input data to the agent during fuzzing. We believe that this is because kAFL’s inputs are linear in nature, unlike the structured inputs in Syzkaller, so Horus’s design, which facilitates efficient structured data transfers, does not offer any observable advantage. Our statistical tests tell the same story, with the p value of the Mann-Whitney U test being significantly greater than 0.05.
However, for the Redqueen scenario, Horus exhibits an average latency of 138.6 while kAFL/Nyx exhibits 172.1, thus reducing latency compared to vanilla kAFL by \(19.4\%\). We believe that this demonstrates Horus’s effectiveness when handling structured data between the host and guest instances, as a set of offsets into a memory image is similar to a simple structure. The statistical tests show a p value of around 0.042, which indicates statistically significant differences between the data.
6.3 Execution Throughput
To answer RQ2, we measure the overall execution throughputs of each fuzzer configuration and determine whether Horus delivers an uplift to their fuzzing performances.
6.3.1 Syzkaller and Moonshine.
We first evaluate and compare the execution throughput of
Syzkaller and
Moonshine when using
Horus or RPC to perform data transfers. To this end, we gathered the number of executions of each kernel fuzzing trial over a period of 12 hours. For each fuzzer, we averaged over the total number of executions. The relevant results are shown in Table
2. As listed in the table,
Horus improves the execution throughput of
Syzkaller and
Moonshine by 31% and 30%, respectively. Statistical significance is established, as shown in Table
4, where the
p-values of both
Syzkaller and
Moonshine evaluations are below 0.05.
To verify that
Horus is indeed the source of the respective performance improvements, we then profile the runtime composition of various functional components during
Syzkaller and
Moonshine’s fuzzing campaign. Similar to the preliminary analysis in Section
3, we use
pprof to break down each fuzzer instance’s runtime performance. In particular, we focus our attention on the execution time for the RPC systems and data transfer mechanisms. A significant decrease in their execution time proportions will indicate that
Horus can effectively increase kernel fuzzer’s execution throughput by lowering the overhead of data transfer mechanisms.
The relevant results are shown in Figure
7(a) for the manager instance and Figure
7(b) for the fuzzer instance. Since
Moonshine’s runtime composition is very similar to that of
Syzkaller’s, we omitted the information from the plots for greater clarity. In this graph,
RPC represents the execution time proportion of the RPC system itself, such as encoding call arguments, sending and receiving on network sockets, and so on;
Data represents the execution time proportions for
Syzkaller to perform data transfers, using either
Horus or the RPC systems for the
Horus-improved and original versions, respectively. The statistics obtained during the preliminary analysis are also included for comparison. Evidently, the execution time proportions of the RPC system and data transfer mechanisms both decreased significantly with the use of
Horus, demonstrating that it is
Horus that lowered
Syzkaller and
Moonshine’s overhead, thus increasing their overall execution throughput statistics.
We further conducted an experiment to evaluate the individual effectiveness of each component in
Horus, i.e., exposing the guest’s memory space into the host and conducting transfers using
Horus’s transfer techniques. The results are shown in Figure
8. As is evident in the graph, using VMI to transfer the data blobs directly results in a significant reduction in overhead compared to using RPC calls to transfer the data blobs. However, the overhead still grows significantly as the size of the blob grows. In comparison, using full Horus with Syzkaller/Moonshine is significantly faster than transferring data blobs, either through RPC or VMI. The fluctuation in Horus’s transfer latency is due to the variable time it takes for the empty RPC call to reach its receiver. The actual time used during transferring data is negligible in comparison. Thus, we demonstrate that Horus’s transfer mechanisms deliver significant improvements in comparison to either using RPC or VMI to directly transfer the serialized or deserialized blobs.
Thus, for Syzkaller and Moonshine, we can answer RQ2 by demonstrating that Horus, through reducing the overhead for data transfers, helps state-of-the-art kernel fuzzers achieve better execution throughput.
6.3.2 kAFL.
We also measured the execution throughput statistics of kAFL with
Horus compared with vanilla kAFL. The statistics are given as follows: the average execution throughput of kAFL with
Horus is 102.3 executions per second, while that of vanilla kAFL is 100.2 executions per second. The statistical testing, as shown in Table
4, presents a p value greater than 0.05; therefore, their performances are within error margins of each other. Unfortunately,
Horus does not exhibit an impressive improvement over kAFL with Nyx. We believe that the reason still lies in the results of the previous evaluation, where the inputs kAFL transfers to the guest instance are mainly linear buffers, which do not fully demonstrate
Horus’s transfer techniques’ capabilities. In addition, while Redqueen’s transfer speed is increased, Redqueen is only used when processing new seeds, which are sparsely scattered across the 12-hour fuzzing period, whereas for the most common data transfer scenario, which is transferring the input buffer,
Horus does not show significant improvement. This result is within expectations, as
Horus mainly targets the transfer of structured data, while kAFL’s data transfers, as aforementioned, consist of linear data buffers, thus not demonstrating
Horus’s full potential, as reflected in
Syzkaller and
Moonshine’s statistics.
6.4 Coverage Statistics
We address RQ3 by examining the fuzzers’ respective coverage growth and determining whether using Horus allows them to achieve the same coverage in a shorter amount of time.
6.4.1 Syzkaller and Moonshine.
We wish to evaluate whether the execution throughput speedup delivered through
Horus can assist
Syzkaller and
Moonshine in achieving better code coverage. Hence, we conduct fuzzing campaigns for
Syzkaller,
Moonshine,
Syzkaller+
Horus and
Moonshine+
Horus over a period of 12 hours on the three kernel versions. Each fuzzer’s campaign is repeated five times. We sample their coverage statistics every 10 seconds during their respective campaigns and take the average values. The overall results are presented in Table
3, and we show the plots of coverage over the duration of their respective fuzzing campaigns in Figure
9.
As listed in Table
4, we performed statistical testing on the results, and found that the kernels 5.10 on
Syzkaller, 5.10 on
Moonshine and 5.16 on
Moonshine have a
p value of over 0.05, while the other three do not. Upon further examination, we find that the
p-values are borderline. As the acceleration effect tends to concentrate in fuzzing segments that contain frequent data transfers, we deduce that the effect is more observable in the first half of the fuzzing run. Therefore, for these entries, we calculated their respective p-values considering only the coverage statistics of the first 6 hours of the run. With this adjustment, all six data sets yield
p values of less than 0.05, demonstrating a statistically significant improvement over vanilla
Syzkaller and
Moonshine.
The statistics show that: for Syzkaller, Horus assists the fuzzer in increasing coverage statistics for the Linux kernel versions 5.16, 5.15 and 5.10 by \(7.22\%\), \(11.71\%\) and \(8.39\%\), respectively, with an average of \(9.09\%\), at the end of 12 hours; for Moonshine, Horus increases the fuzzer’s coverage statistics by \(7.72\%\), \(8.93\%\) and \(7.24\%\), with an average of \(7.97\%\), respectively. However, comparing coverage statistics at the end of the campaign does not reveal the entire picture. As shown in the plot, Horus accelerates Syzkaller and Moonshine’s coverage growth significantly before the 4-hour mark. Furthermore, Syzkaller+Horus and Moonshine+Horus were able to achieve the same coverage as Syzkaller and Moonshine did over 12 hours significantly faster, leading to a speedup of \(1.6\times\) and \(1.6\times\) for Syzkaller and Moonshine, respectively. Intuitively, kernel fuzzers benefit from Horus’s design more when they frequently find test cases that trigger new kernel behavior, and thus transfer newly found inputs more often to the manager. When new interesting test cases become rare, the coverage statistics of kernel fuzzers will converge to similar points with and without Horus. We believe that Horus’s mechanisms will benefit fuzzers more as they become capable of generating increasingly high-quality inputs, allowing them to explore the kernel’s state space faster and reap the benefits of using Horus even further.
Therefore, for Syzkaller and Moonshine, we can answer RQ3 that Horus is able to increase the speed at which kernel fuzzers explore kernel state, allowing those fuzzers to cover kernel code more efficiently.
6.4.2 kAFL.
We also examined the coverage statistics of using
Horus on kAFL when compared to vanilla kAFL. The results are shown in Figure
10. As expected, since no significant execution throughput increase is observable when adapting
Horus to kAFL, we do not see a significant improvement of the coverage statistics over a duration of 12 hours, which is evidenced through our statistical testing, as shown in Table
4, where the datasets yielded a
p value greater than 0.05. This is indeed as expected, as the previous two evaluations show that
Horus does not deliver a statistically significant speedup when compared to vanilla kAFL. We will delve into the relevant details in Section
7.
6.5 Bug Detection
Though Horus does not aim to improve kernel fuzzers’ bug detection capabilities, we are interested in evaluating the effects of increased execution throughput on the number of bugs found under a given time constraint. Thus, we collected the bug report data generated by Syzkaller, Moonshine, Syzkaller+Horus, and Moonshine+Horus over their respective fuzzing campaigns. We manually analyzed the bug reports to remove false positives and duplicates. The results are as follows. Syzkaller and Syzkaller+Horus found 8 unique bugs in total, of which Syzkaller found 5 and Syzkaller+Horus found all 8. Moonshine and Moonshine+Horus found 11 unique bugs in total, of which Moonshine found 7 and Moonshine+Horus found 10.
Thus, we conclude that
Horus can help
Syzkaller and
Moonshine find more kernel bugs under a given time constraint, which adequately answers
RQ4. We also performed kernel fuzzing using
Horus for an extended period of time. Of the bugs found, we submitted reports and received confirmation for 5 previously unknown bugs, as listed in Table
5.
8 Conclusion
In this paper, we identify that data transfer overheads in kernel fuzzers affect their effectiveness in covering kernel code and detecting potential vulnerabilities. We propose
Horus, a kernel fuzzing data transfer mechanism that mitigates the synchronization overheads present in kernel fuzzers such as
Syzkaller and
Moonshine by circumventing their original data transfer mechanisms over remote procedure calls. In doing so, our approach provides a more efficient solution for transferring highly-structured data between the host and guest instances. Specifically,
Horus exposes the guest fuzzer’s memory space to the host manager and sets up fixed stub structures in the fuzzer instances during their initialization processes.
Horus then registers the stub structures with the manager to allow for efficient transfers. When conducting transfers,
Horus’s stubs in both the manager and fuzzer instances use the stub structures to pass highly-structured and non-trivial data consistently and efficiently. Our evaluation shows that
Horus improves data transfer speeds by 84.5% and 85.8%, as well as execution throughput by 31.07% and 30.62% for
Syzkaller and
Moonshine, respectively. In addition,
Horus allows
Syzkaller and
Moonshine to achieve a speedup of
\(1.6\times\) and
\(1.6\times\) and increases their coverage statistics by
\(6.9\%\) and
\(8.2\%\) over a period of 12 hours, respectively. On kAFL,
Horus decreases the data transfer latency of its Redqueen component by
\(19.4\%\). To facilitate open research, we have open-sourced
Horus on Github (
https://github.com/Wingtecher-OSLab/Horus).