Abstract
Current memory architectures do not practice the principle of least privilege, resulting in the exposure of memory to data corruption and usurpation threats, collectively called memory based threats. The most famous of such threats is probably the buffer overflow. It has received considerable attention due to its relative ease of exploitation and the ease of exposure as seen by the sheer number of vulnerabilities.
The first great computer infection in 1988, the Morris Worm, made use of a buffer overflow exploit in the fingerd application as one of its methods of propagation [4]. Given the vastness and damage of the worm, one would think that in a short amount of time a solution would be developed that would inoculate the world's computers from such a sickness. However, in 2001, thirteen years after the Morris Worm, the Code-Red and CodeRedII worms would use another buffer overrun that would cost $2.6 billion [2]. This is a testimony to the ineffectiveness of the proposed solutions during that thirteen year period and the overall effectiveness of this attack scheme.
The problem that we still face is not the lack of solutions, but of a pervasive one that can be enabled and adapted easily by large populations and that actually solves the underlying problem.
For analysis purposes, we define memory based vulnerabilities as vulnerabilities that arise from the way that system memory is organized and used. They can then be binned into four large groups, uninitialized data, memory leaks, data usurpation and data corruption. Uninitialized data problems occur when an application fails to initialize objects to fail-safe defaults. Memory leaks occur when applications allocate memory but lose access to them, although the operating system can retrieve the memory after the application ends. Data usurpation problems occur when memory that should not be readable by an application is. Data corruption vulnerabilities are the ones where memory data is overwritten, and therefore corrupted, when it was not part of the application design to do so. For the rest of the paper, we will consider data corruption and focus on buffer overflows since it is by far the most common memory based vulnerability and poses the highest risk.
Data usurpation and corruption vulnerabilities exist because the CPU has ubiquitous access to all of memory, i.e. memory is world read, write, and executable by the CPU, which we consider a hardware issue since the CPU has instructions to read, write, and execute memory data. We consider uninitialized data and memory leak software issues because processors do not have built in constructs to allocate memory on the object level; this is all done at the application or software level. The common problem amongst the four of them is that the task of managing and protecting memory is left to the operating system, through the memory management unit which operates on pages, and the application level programmer, while actual memory accesses and controls are done by the CPU that operates on the native memory addressable size, which is accessible by all levels of applications.
This imbalance in the unit of protection in the operating system and the CPU and the reliance on the operating system to offer access control for memory explains many of the memory based vulnerabilities. For example, buffer overflows are successful because an application can overwrite all in-band data as long as the data is within the page and even across pages if the pages are contiguous, even though the write should be restricted to only within the buffer. Unfortunately, operating systems typically can only control access page by page.
Although buffer overflows are only one of many types of memory data corruption threats, the prevalence of buffer overflow vulnerabilities and the plethora of solutions make it a prime vehicle to investigate hardware based memory access controls. Hence, we propose adding metadata bits to memory for the purpose of providing access control to memory structures while maintaining backwards compatibility.
Two motivating factors are considered. First, current memory architectures, including Intel's new LaGrande Technology, rely on the operating system or a similar piece of software to ensure that processes can only access their own memory pages and on the application programmers to properly manage their memory space. Under such a scheme the operating system grants a process unlimited read and write privileges to a page, including the control flow data that only the processor should have access to. Furthermore, the operating system grants execute permissions by the page, leaving executable slack space that could be exploited. Second, a majority of solutions to memory corruption threats have been proposed as singular solutions and require too much user and end-user expertise, therefore limiting their adoption. For example, a majority of buffer overflow solutions do not address the memory corruption problem. They allow an overflow to occur, but attempt to prevent the intended result, such as control flow redirection.
One example of this situation is AMD's No eXecute bit (NX), used to mark pages of memory as non-executable. Intel's version of AMD's NX bit is called the Execute Disable Bit [1]. The capability of the processor to take advantage of this sort of functionality can be queried by the operating system that is running. When activated by setting the bit IA32_EFER.NXE, memory pages can be marked as not being executable. This is done by adjusting bit 63 in the corresponding page table entry for that page of memory. If the protection is running and an instruction fetch to a linear address that translates to a physical address in a memory page that has the execute disable bit set, a page fault exception will be generated. It can be seen from this example that a buffer overflow can allow arbitrary code to be written to memory, but no security breach occurs because the page is marked as non-executable. This sort of protection is very close to what is desired in protecting from memory based vulnerabilities: there is no effort required of the user, it incurs very little memory or performance overhead, and is backwards compatible with existing code.
To allow for that backwards compatibility, Intel decided to give the host OS the ability to turn this protection on or off. Windows XP Service Pack 2 and Windows 2003 Service Pack 1 contain patches to take advantage of this hardware feature by using what Microsoft calls Data Execution Prevention (DEP). Shortly after their debut, exploits began to be posted that easily sidestepped the mechanism [3]. In order to bypass DEP, a ret-libC style attack can be used to jump to a section of system code marked as executable that can then further be exploited to disable DEP and return into shell code stored in the original buffer. If a process could not disable NX support at runtime, this exploit would not work. It is this fact that makes it seem likely that a hardware based mandatory access control mechanism would be successful. Protection schemes at the hardware level should not be allowed to be circumvented by the OS. While it is understandable that some security tradeoffs may need to be made to allow for backwards compatibility, some basic security components should not. It is this idea that is at the heart of our concept.
In recent work, the idea of marking memory as non-executable has been extended to a finer grained approach. Secure Bit2 extends every memory location by one bit to add semantic meaning to each word of memory [5]. This bit is moved along with its associated word by memory manipulation instructions. Words in buffers between processes get their secure bit set while all others mark the secure bit at the destination register or memory location. Call, return, and jump instructions check the secure bit and if set, generate an interrupt or fault signal. Modifications are required at the kernel level to set the secure bit when passing a buffer across domains. Since the address validation is done in hardware, there is no performance overhead.
DIFT (Dynamic Information Flow Tracking) is another hardware-based tainting mechanism [6]. It offers two different security policies depending on the amount of protection needed and overhead willing to be incurred. Data from I/O channels is considered spurious and is marked as such. This data is stored in an extra bit and is propagated to other memory locations as it interacts with other pieces of data. DIFT makes additions to a standard processor to add the extra bits and proposes various schemes to minimize memory and performance overhead. The processor then checks the taint bits before executing an instruction or committing to a branch.
However, on some architectures, standard integer arithmetic instructions are indistinguishable from pointer arithmetic instructions. This poses a problem for taint bit propagation. Also, in the case of a properly coded program that first bounds checks a piece of data and then combines it with a trusted pointer, a perfectly safe piece of code could be marked as tainted. In this case, the operating system taint module could be invoked many times, resulting in a hefty overhead. There is also a format string attack that takes advantage of a little known printf feature that can bypass DIFT.
These hardware based solutions require the least user intervention and promise to be the most pervasive because changing the CPU or obtaining one through a new computer will mean that the solution is built in and it doesn't matter which operating system is used. The user can be the end-user, but that is only when a new system is purchased. For many people, this is done as often as every two years. For older systems, the user must be an administrator or technician since they must install and configure the new hardware.
The only required changes are to provide a tag for each byte of memory and change each processing instruction to enforce a simple policy based on each byte's tag. We concluded that tagging at the byte level is the necessary approach. This was chosen in favor of tagging each word, since memory architectures are byte addressable, even though most compilers use word boundaries. Although it does mean that there will be considerable storage overhead, it is compatible with virtual memory architectures. Given the desire to keep the solution as simple as possible, we determined that enforcing access control at the instruction level would be adequate.
Whereas these previously mentioned solutions stood up to many attacks that computer systems today cannot handle without extensive antivirus protections, they do not take into account the changing landscape of the industry. The difficulty in continuously increasing clock speed has caused CPU designers to resort to increasing the number of processor cores on a chip in an effort to maintain exponential growth in computing power. The result of this has been a programming and security nightmare. One processor core may lose track of what the others are doing and illegally access memory. It is a possibility for one core to be hijacked to operate maliciously. The challenge of merely maintaining consistency across memory has proved to be difficult. It is the purpose of this work to adapt byte level tainting to fit multicore architectures.
In order to catch input coming from an untrusted source (network, user, etc), modifications must be made to the OS in order to mark incoming data as tainted. The previous solutions vary in the granularity at which they taint data, as well as some of the specific policies they use to propagate this information and alert the user to possible security breaches, but these details are inconsequential when it comes to extending these ideas to multicore. The designers make a point to mention how these systems are orthogonal to memory and transparent to existing programs. All of the above solutions essentially add one or multiple bits to each memory location which is then moved around just as if the processor supported a wider word. No major conceptual changes are made to the memory hierarchy. If a particular multicore architecture is able to handle cache coherence issues, the hardware based solutions should work on it also. In other words, if a particular system is able to maintain data coherence across cores by adding another bit to each memory location, moving the taint bit around with its associated piece of memory (across caches and through memory), and maintaining a coherence protocol, then the multicore architecture should be able to take advantage of a hardware based protection scheme. A system that makes use of a cache coherence protocol would be able to move around data and its associated tag bits without even being aware that it was doing so (other than the odd length words).
In this paper, we explore the effectiveness of the various memory access control approaches for protecting applications executing on multicore architectures. Both single-threaded and multi-threaded application behaviors are considered, along with suggestions for how the memory access control approaches can be made relevant to the new multicore processors that are rapidly becoming ubiquitous.