Defending mounted filesystems from the root user
Gabriel Krisman Bertazi recently posted a patch series adding support for negative dentries on case-insensitive ext4 and F2FS filesystems. Negative dentries cache the results of lookups on files that do not exist, accelerating subsequent lookups. Since this kind of operation happens frequently (consider, for example, iterating through the directories in the PATH environment variable to find an executable), this is an important optimization. Currently, though, negative dentries do not work with case-insensitive filesystems; this patch series rectifies that problem.
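To get a sense of how many failed lookups even a simple PATH search can generate, consider this rough user-space sketch (the command name is an arbitrary placeholder); every miss it counts is exactly the kind of nonexistent-name lookup that a cached negative dentry lets the kernel answer without consulting the filesystem:

    /* Sketch: count the failed lookups a PATH search performs before it
     * finds a command.  Each miss is the kind of nonexistent-name lookup
     * that a negative dentry would satisfy from the dcache next time. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *cmd = "cc";         /* arbitrary placeholder command */
        const char *env = getenv("PATH");
        char *path = strdup(env ? env : "");
        int misses = 0, found = 0;

        /* Walk PATH the way a shell would, probing each directory. */
        for (char *dir = strtok(path, ":"); dir; dir = strtok(NULL, ":")) {
            char candidate[4096];

            snprintf(candidate, sizeof(candidate), "%s/%s", dir, cmd);
            if (access(candidate, X_OK) == 0) {
                printf("found %s after %d misses\n", candidate, misses);
                found = 1;
                break;
            }
            misses++;
        }
        if (!found)
            printf("%s not found; all %d lookups missed\n", cmd, misses);
        free(path);
        return 0;
    }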
In the review discussion for this series, Eric Biggers asked about a specific check for the case where an inode shows up with the case-insensitive flag set, even though the filesystem has not been mounted for case-insensitive operation. This check was added by Ted Ts'o in 2019 to fix a crash experienced while fuzzing the filesystem. Biggers wondered why the test was placed at the inode's point of use rather than when that inode is first read from the disk.
Ts'o answered that the inode can change after it has been read into memory, in certain conditions:
It's not enough to check it in ext4_iget, since the casefold flag can get set *after* the inode has been fetched, but before you try to use it. This can happen because syzbot has opened the block device for writing, and edits the superblock while it is mounted.
One might think that the case of writing to a mounted filesystem behind the implementation's back would be one of those "don't do that" situations. It is not an action that is going to lead to a satisfying conclusion. There is, however, disagreement over what should be done about this case; Ts'o continued:
One could say that this is an insane threat model, but the syzbot team thinks that this can be used to break out of a kernel lockdown after a UEFI secure boot. Which is fine, except I don't think I've been able to get any company (including Google) to pay for headcount to fix problems like this, and the unremitting stream of these sorts of syzbot reports have already caused one major file system developer to burn out and step down.
Biggers replied that fixing problems caused this way is the wrong approach:
Well, one thing that the kernel community can do to make things better is identify when a large number of bug reports are caused by a single issue ("userspace can write to mounted block devices"), and do something about that underlying issue instead of trying to "fix" large numbers of individual "bugs".
He pointed out that Jan Kara has posted a patch set that addresses that issue by adding a configuration option to prohibit writing to block devices that are currently mounted. Applying this series — and configuring the kernel appropriately — would simply close off that entire avenue of attack and, Biggers said, make a large number of syzbot-reported bugs go away.
There is a problem or two with this approach, though. One is that, as Kara describes in the cover letter, enabling this option breaks a number of things in both kernel and user space, including Btrfs mounting, loopback mounts, and filesystem resizing. Fixing these problems is seemingly not overly difficult, but one cannot just enable this option in the kernel until they have been fixed and those fixes have found their way onto deployed systems. That is a process that will take years.
Even then, this series will prevent writing to a mounted partition, but not to the device as a whole. If /dev/sda1 is mounted it cannot be written to, but /dev/sda (which covers the whole device, including the sda1 partition) is still fair game. And even if that were fixed, as Ts'o pointed out, there are other possible attacks, such as opening the SCSI-generic device and sending commands directly to the storage device. There is, it seems, always another way for a sufficiently privileged account to create mayhem.
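The gap is easy to demonstrate from user space. In the sketch below, the device names are purely illustrative; on a kernel with the proposed restriction enabled, only the open of the mounted partition would be expected to fail:

    /* Sketch: even if writes to a mounted partition are rejected, the
     * whole-disk node containing that partition may still be opened for
     * writing.  The device names are hypothetical; run as root, and only
     * on a machine you do not care about. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void try_open(const char *dev)
    {
        int fd = open(dev, O_WRONLY);

        if (fd < 0) {
            printf("%s: writable open refused (%s)\n", dev, strerror(errno));
        } else {
            printf("%s: writable open succeeded\n", dev);
            close(fd);
        }
    }

    int main(void)
    {
        try_open("/dev/sda1");  /* hypothetical mounted partition */
        try_open("/dev/sda");   /* the whole disk containing it */
        return 0;
    }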
Yet another problem is that, according to Ts'o, the syzbot developers are unwilling to turn on this configuration option unless disabling it would be hidden behind a new CONFIG_INSECURE option (to indicate that doing so would make the system insecure). Ts'o objected to that positioning "because that's presuming a threat model that we have not all agreed is valid".
So, even if Kara's series is applied to the kernel, it is a partial (albeit worthwhile) fix that cannot be enabled in deployed systems for years, and which will not be enabled by the people running the fuzzers. Filesystem developers will be limited to occasionally fixing symptoms of the problem as they appear while dealing with floods of fuzzing reports and questioning the basis on which these reports are made. It seems fair to say that this is not a great situation for anybody involved.
The real problem, arguably, is that there is no consensus within the community regarding the threats that the kernel should try to address. A threat model that includes defending the system against its own root user will require a huge hardening effort that many developers feel is both impossible and pointless and which, in any case, does not have the funding it would need to have a chance at succeeding. The subset of the community that is pushing for this threat model thus finds itself in conflict with the rest. Resolving that disagreement may turn out to be the hardest problem of all.
Index entries for this article
Kernel: Filesystems/Security
Kernel: Security
Posted Aug 21, 2023 17:51 UTC (Mon)
by dullfire (guest, #111432)
[Link] (9 responses)
In fact CONFIG_USB_CONFIGFS_F_FS could also be used as such (made "worse" by the fact that most distros will turn around and auto-mount your trojan "USB drive", without root even having to take that step).

In my humble opinion, this attempt is never going to work out, and there will always be glaring holes in attempts to "secure" a system that way.

It would be better (in that there's a possibility it might be achievable) to have a mode (sysctl/sysfs twiddle maybe) that prevents new processes with uid 0 (possibly enforced at exec time). Of course that would be hard. And probably abusive to the system owners as well.
Posted Aug 21, 2023 17:55 UTC (Mon)
by dullfire (guest, #111432)
[Link]
Posted Aug 21, 2023 18:29 UTC (Mon)
by intelfx (subscriber, #130118)
[Link]
I'd imagine that "being abusive to the system owners" is rather the point of all this commotion.
Posted Aug 22, 2023 10:39 UTC (Tue)
by epa (subscriber, #39769)
[Link] (6 responses)
Posted Aug 23, 2023 2:57 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (5 responses)
* Open any file regardless of permissions.
* Impersonate any user with setuid(2) (or some equivalent).
* Send any signal to any process, and make other adjustments to the process's state (such as renicing it).
* Mount and unmount filesystems.
* And probably a few other highly standardized actions (i.e. *not* Linux-specific things) I've forgotten about.
Problem is, traditionally, you couldn't actually design an OS where root could only do things like the above. You also needed an interface for doing more complicated stuff, and especially for doing things in kernelspace (loading modules, debugging, enabling realtime scheduling, etc.). There are a few ways around this, at least that I can think of:
* We could try to partition off the kernelspace-modifying actions into a separate user, as you suggest, or at least into a separate set of capabilities(7) or the like. The difficulty is that you'd probably have to break up CAP_SYS_ADMIN for this to work, so it would be a lot of code churn. Ultimately, I think the existing capabilities would have to be fundamentally redesigned for this to make sense. It is not enough to split off a permission here and a permission there - we have to think logically about the transitive closure of everything that a process with capability X can ever do, directly or indirectly, and the current design does not even attempt to do that. And then we have to think about all possible combinations of capabilities, or at least all combinations that can plausibly interact with each other to escalate privileges.
* We could say "if you want to modify your kernel, either don't enable secureboot, or reimage your kernel with the appropriate changes pre-configured." The effect would be to disable the kernelspace-modifying actions altogether, and maybe even patch out their codepaths entirely so that they can't be used as ROP gadgets, but only in secureboot-enabled kernels (so that people who "just want a normal kernel" and don't want to put up with this sort of thing can ignore it). The main difficulty here is that, to my understanding, much of the existing "pre-configure your system" tooling currently lives in userspace (e.g. systemd). You'd probably need to provide a rich set of kernelspace configuration options that can be set before the system is first booted, and I'm not sure how feasible that is.
* We could partition off all of the "dangerous" permissions into a series of daemons like systemd and polkit, and administer the system by asking those daemons nicely to do it for us. That would extend secureboot trust to a much wider array of system services, which is probably undesirable (now your systemd has to be secureboot-signed?). OTOH, it's not like Microsoft maintains a strong segregation between the Windows NT kernel and the modern Windows userspace. From their perspective, I imagine the trusted component is a pretty large subset of the operating system, and I doubt they draw the line exactly at ring 0.
Posted Aug 23, 2023 3:47 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Microsoft has a notion of "protected processes" that block every access to themselves, even from the Administrator user. Linux doesn't really have a similar thing. The root user can trivially ptrace any process.
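As an illustration of how low that bar is, the following sketch (with a placeholder target PID) is all it takes for a root-owned process to attach to another process with ptrace(), unless an LSM such as Yama intervenes:

    /* Sketch: a privileged process attaching to an arbitrary PID with
     * ptrace().  The default target PID is a placeholder; Yama or another
     * LSM may restrict this, but plain root on a default setup succeeds. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(int argc, char **argv)
    {
        pid_t pid = argc > 1 ? atoi(argv[1]) : 1;   /* placeholder target */

        if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
            fprintf(stderr, "attach to %d failed: %s\n", pid, strerror(errno));
            return 1;
        }
        waitpid(pid, NULL, 0);  /* the target is now stopped */
        printf("attached to %d; its memory and registers are ours\n", pid);
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return 0;
    }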
Posted Aug 23, 2023 6:36 UTC (Wed)
by epa (subscriber, #39769)
[Link]

The difficulty is that you'd probably have to break up CAP_SYS_ADMIN for this to work

Exactly right. CAP_SYS_ADMIN is the "big kernel lock" of permissions. Or it's fcntl(). Or any other design that started out as a reasonable idea but became more and more overloaded and treated as a receptacle for anything and everything.
Posted Aug 23, 2023 9:20 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
Going back to Pr1mos, the ONLY thing that was hard-coded into the OS (and even that could be patched out) was that user "system" could edit the root of the permissions tree. And not really even that - it simply set over-ride permissions, which I would often use when testing stuff ...
SPAC <system> wol:none
SPAC <data> wol:none

then I would run loads of stuff in testing that could cause carnage if I'd made a mistake, secure in the knowledge that the live system was not even visible to my program.

Cheers,
Wol
Posted Aug 27, 2023 14:04 UTC (Sun)
by Baughn (subscriber, #124425)
[Link]
I have a computer that doesn’t boot with Secureboot disabled. They seem to be getting more common.
At the moment, I’m still able to use it as a regular computer thanks to Linux not locking itself down hard enough to stop me modifying the kernel. If a rule like that was added, then I suppose it’s game over.
Posted Aug 28, 2023 14:03 UTC (Mon)
by jwarnica (subscriber, #27492)
[Link]
It's a weird mental model that "root is special, protect it". See: https://xkcd.com/1200/

In a more enterprisy sense: consider that some app team has full permissions to /var/lib/pgsql, but the OS team has root, so the app team needs to open a ticket to restart the server. Yah! I guess the app team isn't able to put a NIC in promiscuous mode, but who isn't using switches?
Presume that which ever human runs the kernel has access to everything; that is either tolerable trust or a massive breach depending on the organizational requirement. And then protect the kernels running, from each other. Harden the VM layer, harden the network layer, harden the APIs.
Posted Aug 21, 2023 19:56 UTC (Mon)
by leromarinvit (subscriber, #56850)
[Link] (7 responses)
I'm not sure I can see any workable solution for this other than carefully auditing all involved kernel code with that attack vector in mind. Even disabling USB or forbidding mounting removable media is not a 100% remedy, it only moves the bar up a bit (never mind such a setup not being terribly likely to be welcome for many use cases). A sufficiently motivated attacker might as well present this malicious device via a SATA or NVMe interface - maybe a bit more complicated and less ready-made hardware and software stacks to choose from, but not impossible.
Posted Aug 22, 2023 1:30 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
Posted Aug 22, 2023 7:45 UTC (Tue)
by geert (subscriber, #98403)
[Link]
I can easily imagine a small and cheap device with a USB host and a USB device connector, which sits between the computer and a USB memory stick, introducing (not so) random corruptions to data read from the memory stick to attack the host.
Posted Aug 22, 2023 8:53 UTC (Tue)
by pbonzini (subscriber, #60935)
[Link]
Posted Aug 22, 2023 16:03 UTC (Tue)
by zeno_kdab (guest, #165579)
[Link] (2 responses)
For external devices maybe an idea would be to just use unprivileged FUSE to mount? It seems rather unlikely to have a use case where you need maximum FS performance but at the same time can't trust your hardware...
Posted Aug 23, 2023 14:03 UTC (Wed)
by draco (subscriber, #1792)
[Link] (1 responses)
It's fair to say that in a scenario where you're computing in malicious environments you must be able to trust some of your hardware — if you can't trust the CPU itself, you're doomed, sure. But with a trusted computing core and IOMMU, you can (in principle) mitigate malicious I/O if you write the drivers defensively.
Posted Aug 23, 2023 17:21 UTC (Wed)
by zeno_kdab (guest, #165579)
[Link]
Imho either you trust your hardware, and don't want your FS drivers to be slowed down by being implemented super defensively, always rechecking everything, etc. Or you don't trust it, but then you should be fine taking the perf hit of using FUSE or a VM to isolate the hardware handling from your host kernel.

Having said that, I always dream about a new OS kernel that transcends the monolithic/micro dichotomy by easily allowing all kinds of drivers to be moved into userspace and back ;)
Posted Aug 22, 2023 20:06 UTC (Tue)
by zorg24 (subscriber, #138982)
[Link]
Posted Aug 21, 2023 20:47 UTC (Mon)
by amarao (subscriber, #87073)
[Link] (13 responses)
Can the same thing be done for filesystems? A storage layer that guarantees the correctness of data for the next layer.

Maybe a filesystem is like HTTP working with Ethernet frames...
Posted Aug 21, 2023 21:29 UTC (Mon)
by leromarinvit (subscriber, #56850)
[Link] (10 responses)
Network protocols are usually designed to be easy to validate (at least sane ones), and nothing terrible typically happens when you drop non-conforming packets sent from a buggy, non-malicious source. And yet, even in networking, the common approach used to be "conservative in what you send out, liberal in what you accept" until relatively recently.
File systems, OTOH, are usually designed with performance as the main goal, and malicious images aren't the top concern (and they certainly weren't when most of today's widely used file systems were designed). Also, resilience against corruption is a concern often diametrically opposed to strict validation - if corruption causes 10% of my files to contain garbage (or turns 10% of a single corrupted file into garbage), I'd much prefer my FS to give me the remaining 90% without fuss.
Posted Aug 22, 2023 11:57 UTC (Tue)
by pizza (subscriber, #46)
[Link] (9 responses)
And it's often only possible to tell if a given on-disk metadata structure is "conformant" after loading *every other* bit of metadata into memory and effectively doing a full consistency/fsck pass. Of course you're still vulnerable to stuff being written to disk behind your back, so the only way to handle that is to always keep the full metadata in memory, and never re-read anything from disk.
Posted Aug 22, 2023 13:39 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (8 responses)
Posted Aug 22, 2023 14:10 UTC (Tue)
by leromarinvit (subscriber, #56850)
[Link] (3 responses)
Posted Aug 22, 2023 16:08 UTC (Tue)
by DemiMarie (subscriber, #164188)
[Link]
Posted Aug 22, 2023 17:19 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
I don't know…the trust line has to go somewhere here. For example, Rust is not safe against `/proc/self/mem` editing. I'm not sure what one *could* do in the face of such power because the only thing you have is "my registers are not accessible" and "the program counter will keep moving".
Note that I am usually all about defensive programming and covering bases, but I also don't interface with hardware directly and have some baseline level of viable behavior. The tales I've heard here (and from linked blogs, etc.) make me happy about my course so far. I am extremely grateful for those that do that work, but I do not envy their jobs.
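As an aside, the /proc/self/mem point is easy to demonstrate; in this sketch the write lands in the read-only mapping of a string literal and succeeds anyway (exact behavior can vary with hardening or lockdown configurations):

    /* Sketch: writing through /proc/self/mem reaches a page that is mapped
     * read-only in this process.  This is long-standing Linux behavior,
     * though hardened kernels may differ. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    static const char msg[] = "original";

    int main(void)
    {
        int fd = open("/proc/self/mem", O_WRONLY);

        if (fd < 0) {
            perror("open /proc/self/mem");
            return 1;
        }
        /* The file offset is the virtual address to write to. */
        if (pwrite(fd, "patched!", 8, (off_t)(uintptr_t)msg) != 8) {
            perror("pwrite");
            return 1;
        }
        printf("%s\n", msg);    /* prints "patched!" despite the const */
        close(fd);
        return 0;
    }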
Posted Aug 23, 2023 17:13 UTC (Wed)
by leromarinvit (subscriber, #56850)
[Link]
I also should probably have qualified the "never crash" with "in a way that potentially allows privilege escalation". If removable media were by default mounted using something like lklfuse, that would IMHO be a big step in the right direction. But I think this should be mainlined, or decoupled from the actual driver code so much that it can use arbitrary kernel images or modules. Using different versions of the same fs driver (with a different set of features and bugs), potentially interchangeably on the same device, sounds like a recipe for compatibility issues.
Posted Aug 24, 2023 7:12 UTC (Thu)
by Karellen (subscriber, #67644)
[Link] (3 responses)

Anything less and you're just deferring discovering the bogus writes until the next mount time.

Why is that a problem?
Posted Aug 24, 2023 12:33 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Posted Aug 24, 2023 14:16 UTC (Thu)
by Karellen (subscriber, #67644)
[Link] (1 responses)

you're just deferring "something edited my FS" problems from "direct memory access" to "when I load from disk next time".

I get that. I just don't see why it's a problem. Surely checking for consistency and deciding what to do if there's a problem is easier at mount time than it is while the filesystem is in use?
Posted Aug 30, 2023 9:45 UTC (Wed)
by taladar (subscriber, #68407)
[Link]
Posted Aug 22, 2023 1:49 UTC (Tue)
by geofft (subscriber, #59789)
[Link] (1 responses)
The reason the attacks here are about data in the superblock and not e.g. within an inode is because you can reasonably cache a little bit of data in from the block device and then validate it once you've read it. Maybe you load a page worth of data, and then you validate its layout, and then you can use the validated page. For instance, maybe there's a uint32 that specifies how long the filename is, which is restricted by spec to something more reasonable like 1024 bytes. If you've already copied the data into memory you trust, you can check it and then have other functions use it directly without worrying about them doing a kmalloc(4G).
For a network protocol parser (at any layer), that's all it does! It's received some bytes from the network into RAM, and then the authoritative copy of the data is in your own trusted RAM for you to handle as you like. You can parse it and interpret it and pass it on, or you can drop it. Then you get more bytes from the network. Even if you're receiving a large amount of data, you're handling one packet at a time, and each packet becomes fully yours when you receive it.
For filesystems, you have terabytes of data that you're repeatedly going back to. There's a lot of structure of superblock to directories to inodes to data. Not all of those blocks stay in memory. So maybe you've read a superblock once, determined that it's valid, and then it changes and for whatever reason the superblock is no longer in memory. Then the next function down the line might not get the same bytes that you validated. You can't copy the entire filesystem into RAM up front because half the point of a filesystem is to be bigger than what you can fit in RAM. You can't parse things as you receive them because you're doing random access.
You _can_ revalidate data each time you need it, but the argument being made is that writing code this way is a very unnatural and unpleasant experience.
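For illustration, here is a sketch of the copy-then-validate pattern described in the first paragraph above; the record layout, field names, and limits are invented for the example rather than taken from any real on-disk format:

    /* Sketch of copy-then-validate: the record layout and the limit are
     * invented for illustration and assume a little-endian host; they are
     * not taken from any real filesystem. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_NAME_LEN 1024u

    struct disk_dirent {        /* as laid out in the on-disk block */
        uint32_t name_len;
        char     name[];
    };

    /* 'raw' has already been copied from the block device into memory we
     * own, so nothing can change it behind our back while we check it. */
    static bool parse_dirent(const void *raw, size_t raw_len,
                             char *out, size_t out_len)
    {
        struct disk_dirent hdr;

        if (raw_len < sizeof(hdr))
            return false;
        memcpy(&hdr, raw, sizeof(hdr));

        /* Validate once, against our private copy; afterwards callers can
         * trust name_len without re-reading anything from the device. */
        if (hdr.name_len > MAX_NAME_LEN ||
            hdr.name_len > raw_len - sizeof(hdr) ||
            hdr.name_len >= out_len)
            return false;

        memcpy(out, (const char *)raw + sizeof(hdr), hdr.name_len);
        out[hdr.name_len] = '\0';
        return true;
    }

    int main(void)
    {
        /* A well-formed record followed by one claiming a 4GB name. */
        unsigned char good[8] = { 3, 0, 0, 0, 'e', 't', 'c', 0 };
        unsigned char bad[8]  = { 0xff, 0xff, 0xff, 0xff, 0, 0, 0, 0 };
        char name[MAX_NAME_LEN + 1];

        printf("good record accepted: %d\n",
               parse_dirent(good, sizeof(good), name, sizeof(name)));
        printf("bad record accepted:  %d\n",
               parse_dirent(bad, sizeof(bad), name, sizeof(name)));
        return 0;
    }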
Posted Aug 22, 2023 12:12 UTC (Tue)
by SLi (subscriber, #53131)
[Link]
To me, filesystems are in many ways an exceptionally nicely contained thing. They largely follow a well defined, narrow API with well defined semantics. Exceptions to it are probably fairly easy to express. Regardless of the filesystem, you can say things like "if I write a file, then read it back without other writes to the same file, I should get the same data (or an error) back".
That is, they seem exceptionally amenable to formal specification and analysis, and from that perspective, how they are designed today seems quite ad hoc. It shouldn't be as hard as with many other systems to actually formally define the operations (up to what gets written to the disk where) and verify that the required properties hold, as well as do a lot of analysis on performance etc., play with different design ideas without needing to compile and boot kernels, etc. You could treat tolerance to bogus data in the same way, allowing a conscious decision on exactly how you are allowed to fail in different situations.
Now I'm not saying that should necessarily be the same as the code that gets executed (or even generated from it), but parts of it could well be if desired. Verifying the design should give quite a bit of confidence, and effort could be directed at the performance critical parts.
Posted Aug 21, 2023 21:07 UTC (Mon)
by flussence (guest, #85566)
[Link] (7 responses)
Posted Aug 22, 2023 0:36 UTC (Tue)
by kmeyer (subscriber, #50720)
[Link] (6 responses)
Posted Aug 22, 2023 1:28 UTC (Tue)
by Paf (subscriber, #91811)
[Link]
Posted Aug 22, 2023 2:04 UTC (Tue)
by geofft (subscriber, #59789)
[Link] (4 responses)
https://github.com/lkl/linux is a fork of the Linux kernel that turns all the interesting routines into a library, with a couple of neat tech demos of what you can do with it - including a FUSE wrapper for the filesystems in the kernel. So any filesystem that's already been implemented once, in the kernel, now has a userspace version.
There's also other ways to do it, such as UML or hardware-assisted virtualization.
Yes, you will lose some performance. I think the triangle of security, performance, and nicheness is a "pick two" situation - if you want both security and performance, you will need to attract enough interest and enthusiasm to pick up the work, possibly defining newer and easier-to-handle on-disk formats as the work happens. Otherwise, you can use an old implementation that made sense in the '90s at full performance with the security of the '90s, or you can use it at the performance of the '90s (which should be enough, honestly!).
(I'd also be very curious to see what the actual performance loss is even for day-to-day filesystems, and whether there are things that can be done to address performance like reviving the zero-copy FUSE patchset. I think I actually do very few things that are ridiculously sensitive to filesystem performance per se: most of the time I'm either working with large single files like giant CSVs or git pack files or game textures, for which the filesystem is essentially a constant factor and it's the raw I/O performance that matters, or reading and writing lots of small files like source code, which can mostly stay in the VFS cache, in theory. Applications that care very much about disk performance, like databases, tend to make a large contiguously-allocated single file anyway - and they subdivide it in userspace.)
Posted Aug 22, 2023 9:48 UTC (Tue)
by khim (subscriber, #9252)
[Link]
In a sane world we would have both: a FUSE filesystem to deal with USB or other untrusted sources and an in-kernel implementation for the root fs.
Posted Aug 24, 2023 10:29 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
The only significant gotcha is that which implementation to use (userspace or kernelspace) is not about the filesystem in use, but rather about the degree to which the backing storage and the user are trustworthy.
In one system, I might want to use both the kernelspace implementation of xfs for my root FS, using something like fs-verity to protect against a malicious root user, and the userspace implementation for home directories. For added fun, I might want the userspace implementation to run multiple instances, so that an exploit is less likely to affect other instances (only affects other instances if it can be used to write to the backing store); this comes in handy with something along the lines of Android's user-per-application model, where I won't be able to mutate in-memory state that affects another application.
Posted Aug 27, 2023 23:32 UTC (Sun)
by kmeyer (subscriber, #50720)
[Link] (1 responses)
Posted Aug 28, 2023 0:31 UTC (Mon)
by geofft (subscriber, #59789)
[Link]
But also I don't think punting major filesystems to FUSE is really out of the question. It was the vision of the microkernels of the '90s, which failed not because there was anything fundamentally wrong with microkernels but because overhead was high. We've learned a lot about writing efficient software that spans multiple address spaces since then (it's in many senses similar to HPC work or GPU programming), and also the physical computers are way faster. As I mentioned, without an actual benchmark, I think saying that this just has to be done in kernelspace is premature optimization.
(We also know a lot more about software fault isolation now than we did in the '90s - we could use something like eBPF or wasm or Native Client to keep these filesystems in the kernel but limit the impact of bugs.)
We Linux folks rightly make fun of Windows for having done font rendering in the kernel for so long and having had a bunch of ring-0 privilege escalation bugs as a result. It made sense in the '90s when they cared a lot about font rendering performance and basically not at all about malicious fonts; it doesn't make sense today. I don't think filesystems are a fundamentally different story.
Posted Aug 21, 2023 21:44 UTC (Mon)
by jengelh (subscriber, #33263)
[Link] (1 responses)
Well, syzbot just needs to find a code path which makes the filesystem implementation itself issue the destructive write. Then everyone will jump to fix it. :-)
Posted Aug 25, 2023 21:24 UTC (Fri)
by calumapplepie (guest, #143655)
[Link]
A filesystem failing to handle concurrent modification is less of a bug.
Posted Aug 22, 2023 4:41 UTC (Tue)
by ebiggers (subscriber, #130760)
[Link]
It is helpful to not conflate these two cases. This makes it clear why it's useful to e.g. forbid writes to /dev/sda1 while still allowing writes to /dev/sda. Even just forbidding buffered writes would solve this problem; O_DIRECT writes could still be allowed.
Posted Aug 22, 2023 13:43 UTC (Tue)
by magfr (subscriber, #16052)
[Link] (1 responses)
I do not expect the kernel to handle that scenario.
Posted Aug 22, 2023 16:13 UTC (Tue)
by willy (subscriber, #9762)
[Link]
Posted Aug 23, 2023 5:03 UTC (Wed)
by mcassaniti (subscriber, #83878)
[Link] (2 responses)
Posted Aug 25, 2023 2:19 UTC (Fri)
by smammy (subscriber, #120874)
[Link] (1 responses)
Posted Aug 29, 2023 3:15 UTC (Tue)
by matthias (subscriber, #94967)
[Link]
In a way, this is the case. Think of the on-disk partition table as a configuration "file" that tells the kernel how to configure its internal partition table. The API allows re-reading this configuration after userspace has changed it on disk, and it will fail if the kernel thinks this is not safe to do. Back in the day, it was entirely impossible to re-read a partition table if any filesystem on the disk was mounted, always requiring a reboot if one modified the partition table on the primary disk. Nowadays it is a bit more permissive.

You just have to mentally distinguish between the kernel's partition table (which is always in RAM) and the partition table on disk. Changing the latter is no issue at all, as it will only be used by the kernel when explicitly told so or on the next boot. And this design makes a lot of sense. You can do modifications on disk that are only safe to apply after the next boot and then reboot. If the only way of changing the on-disk partition table were by means of an API that directly manipulates the internal partition table, such changes would always require booting another OS (rescue CD etc.).
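The "explicitly told so" step is the BLKRRPART ioctl. A minimal sketch, with a hypothetical device name, of asking the kernel to re-read the on-disk table; the call fails with EBUSY when the kernel decides the in-memory table cannot safely be replaced:

    /* Sketch: asking the kernel to re-read a disk's partition table.  The
     * device name is hypothetical; the ioctl fails with EBUSY when the
     * kernel decides the in-memory table cannot safely be replaced. */
    #include <errno.h>
    #include <fcntl.h>
    #include <linux/fs.h>       /* BLKRRPART */
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        const char *dev = "/dev/sda";   /* hypothetical whole-disk device */
        int fd = open(dev, O_RDONLY);

        if (fd < 0) {
            perror(dev);
            return 1;
        }
        if (ioctl(fd, BLKRRPART) == -1)
            printf("%s: re-read refused (%s)\n", dev, strerror(errno));
        else
            printf("%s: kernel partition table refreshed from disk\n", dev);
        close(fd);
        return 0;
    }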
Posted Aug 25, 2023 22:49 UTC (Fri)
by calumapplepie (guest, #143655)
[Link]
The threat model of "root is evil" is apparently a valid one supported by kernel_lockdown(7). However, it isn't valid unless the kernel is booted in lockdown mode: if it isn't, root can just use kmem and such. At a minimum, we could gate edits to devices containing mounted filesystems, and access to the SCSI-generic device, behind a requirement that the kernel isn't locked down. Kernel lockdown already breaks a number of things; what's a few more?
Alternatively, start with a CONFIG_LOCKDOWN_STRICT option, which when enabled tightens the restrictions of lockdown to prohibit such things as mounted block-device writes. For those users who require a root -> kernel barrier, they can enable that option, and with it some more restrictions that might break semi-niche application code. Yes, I'm considering online resizing to be 'semi-niche'. If you really require the security guarantees of a strict lockdown, you enable the config; otherwise, leave it disabled.
For those users who are just running a distro kernel, which enables CONFIG_LOCKDOWN_LSM but not STRICT because they want all the features available, this means that (for a period of time) they will be vulnerable to novel attacks using this threat model. However, the goal will be to move this patch into the basic CONFIG_LOCKDOWN eventually; thus fixing all such bugs. As we do so, we can add additional hardening behind CONFIG_LOCKDOWN_STRICT, for instance disabling a wider variety of sysfs files or locking down old drivers. You can also remove the ability to disable the lockdown LSM on the command line; a command line which can be edited for the next boot by root on most machines.
This two-phase mechanism ensures that those who want a strict lockdown will need to deal with the breakage that it causes in userspace. Those who don't need a strict lockdown, but enable lockdown anyways for hardening get to benefit over time from the work of those who need a stricter mode. It's similar to the realtime stuff; if you want a realtime kernel, you have to configure yourself a realtime kernel. If you want a kernel that actually blocks all ring0 compromise, then you have to build it yourself.
In other words: There are some folks who actually want this threat model secured, and many more who don't really care but appreciate the hardening it produces. Differentiate between the two with config options, document the difference in all the places that talk about lockdown, and let those who want it strictly secured deal with the breakage and performance regressions from it.
TLDR: Make the security model of the kernel a kconfig option, and limit features for those using the root-is-evil threat model until those features can be made secure.
Posted Mar 11, 2024 2:26 UTC (Mon)
by lathiat (subscriber, #18567)
[Link] (1 responses)
This would be a nice default anyway, even if it has some kind of override method for the weird cases.
Posted Mar 11, 2024 5:09 UTC (Mon)
by adobriyan (subscriber, #30858)
[Link]
or developers from running fio job with wrong filename=
:-(