Kernel development
Brief items
Kernel release status
The current 2.6 prepatch remains 2.6.12-rc5. Linus's git repository contains 200 or so patches; these are mostly fixes, but there is also a conversion of the IDE driver code to the device model, a new Broadcom bcm5706 gigabit driver, the removal of the Philips webcam decompression code, an IPv4 "alias promotion" feature (make a secondary interface address into the primary if the previous primary is deleted), and an updated CPU frequency subsystem.
The current -mm tree is 2.6.12-rc5-mm2. Recent changes to -mm include the pluggable congestion avoidance modules patch, some filesystem namespace patches, some scheduler tweaks, and lots of fixes.
The current stable 2.6 kernel is 2.6.11.11, released on May 27.
The current 2.4 kernel is 2.4.31, released by Marcelo on May 31. 2.4.31 contains quite a few fixes and some driver updates, but new features are no longer being added to 2.4.
Kernel development news
The ongoing Philips webcam driver saga
Linus has just merged a patch from Alan Cox removing some of the new decompression code from the Philips webcam driver. "The original pwc author raised some questions about the reverse engineering of the decompressor algorithms used in the pwc driver. Having done some detailed investigation it appears those concerns that clean room policy was not followed are reasonable." The hope, at this point, is to merge an improved version of the driver in 2.6.13 which will support (properly reverse-engineered) decompression modules in user space.
Time to remove LSM?
The first organized kernel summit, held in 2001, included a presentation on the NSA Security-Enhanced Linux project. Linus's response at the time was that there were several projects out there trying to find the best way to harden Linux, and that he did not want to have to choose between them. Instead, he asked for the creation of a generic framework which would allow an arbitrary security module to be plugged into the system. The result, some time later, was the Linux Security Module framework; LSM provides a long list of hooks into kernel operations which allow a security module to veto any action which violates the rules it is implementing.
The LSM patch ran into some difficulties on its way into the kernel, but it is now an established part of the internal API. So some developers were surprised recently when James Morris suggested that perhaps the time has come to remove the LSM framework. His arguments are simple: there is only one serious module using the LSM framework in the intended manner, while unrelated projects are trying to use it in inappropriate ways.
It's dead code, an unnecessary abstraction layer between its one real user, SELinux, and the core kernel.
James asks: rather than forcing SELinux to conform to a general-purpose API (of which it is the sole user), why not just wire SELinux directly into the kernel, get rid of LSM, and be done with it?
SELinux is not truly the only security module out there, of course. The kernel includes a couple of other modules: a reimplementation of the capabilities mechanism and "root plug," a module which prevents processes from running as root unless a specific USB device is plugged in. There are out-of-tree modules, such as the BSD securelevels patch and Trustees Linux. The Immunix (now Novell) AppArmor product includes a module which uses the LSM framework. AppArmor is a proprietary offering, but the security module portion of it is GPL-licensed (as is necessary, since the functions for loading security modules are exported GPL-only).
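The hook-and-veto design at the heart of this discussion can be sketched in a few lines of C. What follows is a user-space illustration only, not the kernel's actual LSM interface: every name starting with sketch_ is hypothetical, the hook table is reduced to two entries, and the toy module merely imitates the behavior the text attributes to "root plug."

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced hook table; a hook returns 0 to allow
 * the operation or a negative errno value to veto it. */
struct sketch_security_ops {
    int (*file_open)(int uid, const char *path);
    int (*task_setuid)(int old_uid, int new_uid);
};

/* Default hooks: allow everything. */
static int sketch_allow_open(int uid, const char *path)
{
    (void)uid; (void)path;
    return 0;
}

static int sketch_allow_setuid(int old_uid, int new_uid)
{
    (void)old_uid; (void)new_uid;
    return 0;
}

static const struct sketch_security_ops sketch_default_ops = {
    sketch_allow_open, sketch_allow_setuid
};
static const struct sketch_security_ops *sketch_active_ops = &sketch_default_ops;

/* Plug a module into the framework; this sketch allows only one at a time. */
int sketch_register_security(const struct sketch_security_ops *ops)
{
    assert(ops != NULL);
    if (sketch_active_ops != &sketch_default_ops)
        return -EBUSY;
    sketch_active_ops = ops;
    return 0;
}

/* "Core kernel" code: consult the hook first, act only if nobody objects. */
int sketch_setuid(int *cur_uid, int new_uid)
{
    int err;

    assert(cur_uid != NULL);
    err = sketch_active_ops->task_setuid(*cur_uid, new_uid);
    if (err)
        return err;
    *cur_uid = new_uid;
    return 0;
}

/* Toy module in the spirit of "root plug": refuse to become root unless
 * a (simulated) device is present. */
int sketch_key_present = 0;

static int sketch_plug_setuid(int old_uid, int new_uid)
{
    (void)old_uid;
    if (new_uid == 0 && !sketch_key_present)
        return -EPERM;
    return 0;
}

const struct sketch_security_ops sketch_plug_ops = {
    sketch_allow_open, sketch_plug_setuid
};
```

Registering sketch_plug_ops and then calling sketch_setuid() with a target uid of 0 fails with -EPERM until sketch_key_present is set. The indirect call through the ops table on every operation is the abstraction layer that the removal proposal objects to.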
There does not appear to be a groundswell of support for the idea of removing the LSM framework from the kernel at this time. That could change over time, however: increasingly, out-of-tree code is held to be irrelevant when decisions are made. If SELinux remains the only significant in-tree user of the LSM framework, LSM will look like useless baggage to more and more developers. If there are security modules out there which are reasonable alternatives to SELinux, their developers may want to think about getting them into the mainline sometime in the not-too-distant future.
Files with negative offsets
Every open file on a Linux system has an associated offset - the current read or write position within that file. The virtual filesystem code, when dealing with file positions, performs some basic checks, such as ensuring that the position is not negative. After all, what sense does it make to talk about a file position before the beginning of the file?
As it turns out, there is a situation where a negative file position makes sense. Special files (such as /dev/mem and /dev/kmem) provide a window into the system's main memory. The "position" within these files corresponds to the address of the memory of interest. The interesting thing is that, on the x86_64 platform, addresses can be negative numbers.
This comes about as follows: this architecture currently uses a 48-bit address space. The hardware sign-extends the uppermost bit, however, so any address with that bit set will turn into a negative number. The x86_64 Linux port uses the upper bit to mark kernel space, so kernel addresses are, in fact, negative. A quick look at /proc/kallsyms confirms this:
ffffffff80100000 T startup_32
ffffffff80100100 T startup_64
ffffffff801001a0 T initial_code
ffffffff801001a8 T init_rsp
ffffffff801001b0 T early_idt_handler
...
The end result is that using /dev/kmem on an x86_64 system is difficult; any attempt to seek into kernel space will yield an error.
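The arithmetic behind this can be checked with a small C sketch. The function names are hypothetical, and the position check is a stand-in for the kind of test the text describes, not the actual VFS code. The sketch sign-extends a 48-bit address the way the hardware is described as doing and shows that the result, read as a signed 64-bit file position, is negative.

```c
#include <stdint.h>

/* Sign-extend a 48-bit virtual address to 64 bits: if bit 47 is set,
 * bits 48-63 are set as well. */
int64_t sketch_canonical(uint64_t addr48)
{
    uint64_t addr = addr48 & 0x0000FFFFFFFFFFFFULL;

    if (addr & (1ULL << 47))
        addr |= 0xFFFF000000000000ULL;
    /* Reinterpret as signed; this assumes the usual two's-complement
     * conversion that mainstream compilers provide. */
    return (int64_t)addr;
}

/* Stand-in for a "position must not be negative" check. */
int sketch_position_ok(int64_t pos)
{
    return pos >= 0;
}
```

Passing in 0xffff80100000 (the low 48 bits of the startup_32 address shown above) yields 0xffffffff80100000 as a bit pattern, which is -0x7ff00000 as a signed value, so sketch_position_ok() rejects it. An address with bit 47 clear stays positive and passes.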
The clear fix is to modify the VFS layer to let negative file positions be passed through to the underlying filesystem or device driver. The problem with doing that in a general way, however, is that not all code (especially in drivers) is prepared to deal with a negative offset. Suddenly exposing that code to negative offsets could open up no end of bugs and security problems. So the real solution, as worked out by Al Viro and Linus Torvalds, is to add a new flag for the file structure called FMODE_ANY_OFFSET. This flag can only be set within the kernel; user space has no access to it. So the /dev/kmem driver will be able to set the flag and work with the full range of offsets, but, for the rest of the system, nothing will change.
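A minimal sketch of how such a per-file flag might gate the check follows. Only the name FMODE_ANY_OFFSET comes from the discussion; the flag's value, the structure layout, and the function are assumptions made for illustration, and the real patch may look quite different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value; only the flag's name appears in the text. */
#define SKETCH_FMODE_ANY_OFFSET 0x1u

/* Reduced stand-in for the kernel's file structure. */
struct sketch_file {
    unsigned int f_mode;  /* set by kernel code only, never by user space */
    int64_t f_pos;        /* current file position */
};

/* Set a new file position, rejecting negative values unless the file
 * has opted in to the full range of offsets. */
int sketch_set_pos(struct sketch_file *f, int64_t pos)
{
    assert(f != NULL);
    if (pos < 0 && !(f->f_mode & SKETCH_FMODE_ANY_OFFSET))
        return -EINVAL;
    f->f_pos = pos;
    return 0;
}
```

A file without the flag behaves exactly as before and refuses the negative position with -EINVAL. A file whose driver set the flag at open time, as the /dev/kmem driver would, accepts it. Because the opt-in is per file, no other driver is suddenly exposed to offsets it was never written to handle.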
The beginning of the realtime preemption debate
Merging Ingo Molnar's realtime preemption work was never going to be a quiet process. The noise has, in fact, begun even though Ingo has not yet proposed his work for inclusion. Now might be a good time to catch up with the debate as a way of seeing how the arguments might go in the future.
The realtime preemption patches attempt to provide a guaranteed maximum response time for high-priority user-space processes - just like a "real" realtime operating system would. This goal is achieved by making everything in the kernel preemptible. No matter what the kernel is doing on a given processor, if a higher-priority process becomes runnable, it will be scheduled immediately. Many changes are required to make the whole kernel preemptible; the core parts are:
- New locking primitives. The spinlocks used by the kernel can cause
any number of processors to stall while waiting for a lock to become
free. Code which holds a spinlock cannot be preempted, or a
deadlocked kernel could result. The realtime preemption patches
introduce a new mutual exclusion type (the rt_mutex) which does not
spin, and, thus, will not stall a processor. The spinlocks and
semaphores currently used in the kernel are all converted over to the
new rt_mutex type, and all code which runs with spinlocks held becomes
preemptible. The rt_mutex type also implements priority inheritance,
so that a low-priority process will not block a higher-priority
process (for long, at least) by losing the processor while holding an
important lock.
- Threaded interrupt handlers. Interrupt handlers can create latencies
by monopolizing the processor for long periods of time. The realtime
preemption patch moves interrupt handling into kernel threads, which
contend for the processor with all other processes in the system. If
a certain realtime task is more important than interrupt handling, its
priority can be set accordingly.
- Various other mutual exclusion mechanisms, including read-copy-update, per-CPU variables, and seqlocks, require that preemption be disabled. All of these mechanisms are changed for the realtime preemption mode, usually by making them look more like regular spinlocks.
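The priority inheritance idea mentioned in the first item above can be illustrated with a toy model in C. This is not the rt_mutex implementation: the types and functions are hypothetical, a larger number means a more important task (an assumption of this sketch), there is no wait queue, and each task is assumed to hold at most one lock.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_task {
    int base_prio;  /* priority the task was given */
    int prio;       /* effective priority, possibly boosted */
};

struct sketch_rt_mutex {
    struct sketch_task *owner;  /* NULL when the lock is free */
};

/* Try to take the lock. Returns 1 on success. Returns 0 when the lock
 * is held, meaning the caller would sleep rather than spin; in that
 * case the owner inherits the caller's priority if it is higher. */
int sketch_rt_mutex_lock(struct sketch_rt_mutex *m, struct sketch_task *t)
{
    assert(m != NULL && t != NULL);
    if (m->owner == NULL) {
        m->owner = t;
        return 1;
    }
    if (t->prio > m->owner->prio)
        m->owner->prio = t->prio;  /* priority inheritance */
    return 0;
}

/* Release the lock and drop any inherited boost. */
void sketch_rt_mutex_unlock(struct sketch_rt_mutex *m)
{
    assert(m != NULL && m->owner != NULL);
    m->owner->prio = m->owner->base_prio;
    m->owner = NULL;
}
```

If a task with priority 1 holds the lock and a task with priority 10 tries to take it, the holder runs at priority 10 until it unlocks, so a middle-priority task cannot keep it off the processor in the meantime. That is the "for long, at least" guarantee described in the list.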
The realtime preemption patch set (at version -RT-2.6.12-rc5-V0.7.47-10 as of this writing) is clearly large and intrusive - it would be hard to make fundamental changes like those listed above any other way. It should be noted that Ingo has gone out of his way to minimize this intrusiveness, however: the patch is written to minimize code changes, and the kernel functions as always if realtime preemption is not selected at configuration time. The merging of this patch set would not force the new preemption model on users.
According to Lee Revell, the realtime preemption patches are already seeing some serious use.
Certainly the discussions that inevitably follow the release of a new version of the patch set indicate that there is an active user community out there. Some members of the community are starting to wonder why the realtime preemption patches have not been merged, and when (if ever) that might change. The biggest reason is that Ingo has not yet requested that the patches be included - though many small pieces and fixes from the realtime patch set have found their way into the mainline. If and when Ingo does push for inclusion, however, there will be some opposition.
To some developers, the realtime patch seems like a set of questionable and widespread changes aimed at the needs of a very small user community. Changing spinlocks into mutexes and moving interrupt handlers into threads are fundamental changes to how the kernel does things with the potential for the creation of subtle bugs and performance problems. Reworking things and adding complexity at that level is not a task that should be undertaken without a strong need - and many developers do not see a sufficiently strong need.
There are some concerns about the performance impact of these changes. Acquiring an uncontended spinlock is a very fast operation; the rt_mutex type, with its wait queues and priority inheritance mechanisms, is bound to be slower. Anecdotal evidence suggests that realtime preemption carries a performance hit, but little in the way of real benchmarking has been done. In any case, the performance penalty should only affect users who have actually enabled the realtime preemption mode.
Finally, not everybody is convinced that the realtime preemption approach can solve the real problem: providing an ironclad guarantee that a realtime process will be scheduled within a given maximum latency. Ingo believes that this guarantee can be made by eliminating all code within the kernel which can delay a reschedule; others feel that, to make a guarantee that can truly be trusted, the entire kernel must be audited and verified. They have a point: how strong a guarantee would you want before running realtime Linux in your car's braking system?
Those who want true realtime guarantees, along with developers who simply do not want to clutter the kernel with realtime mechanisms, argue that a different approach should be taken. The most commonly suggested alternative is RTAI-Fusion, which works (at its core) by interposing a "nanokernel" between Linux and the bare hardware. The nanokernel guarantees latency by taking the lowest-level scheduling decisions out of the Linux kernel's hands; it is kept small and easy to verify. Another project taking a similar approach is Iguana, which is based on the L4 microkernel.
Since the realtime preemption patch is not being proposed for merging at this time, no decisions are likely to result from the current, lengthy discussion. If Ingo has his way, there may never be one big decision; instead, pieces of the patch will be merged if and when it makes sense.
There may be some interesting realtime-related sessions at next month's Kernel Summit in Ottawa, however. Meanwhile, should anybody wish to plow through the entire thread on linux-kernel, here is the starting point.
Page editor: Jonathan Corbet