The beginning of the 6.13 merge window
Some of the most significant changes pulled in the first part of the 6.13 merge window include:
Architecture-specific
- The arm64 architecture can now run Linux in virtual machines under the Arm Confidential Compute Architecture.
- Arm64 also now supports user-space shadow stacks with the Guarded Control Stack feature.
- The s390 architecture has gained support for the resizing of virtual machines with virtio-mem. There is an overview in this commit message.
- Split-lock detection is now supported on AMD CPUs.
- There is now support for MIPS multi-cluster interrupt controllers.
Core kernel
- The PIDFD_GET_INFO ioctl() operation, which will fetch information about a process represented by a pidfd, has been merged.
- The io_uring subsystem has a new command, IORING_REGISTER_RESIZE_RINGS, that allows on-the-fly resizing of the submission and completion rings. This allows applications to start with a pair of relatively small rings, and grow them later should the need arise.
- The lazy preemption patches have been merged. This work greatly simplifies the kernel's preemption logic (and configuration options) while maintaining performance for throughput-oriented configurations. It is a significant change that should, eventually, reduce the amount of scheduling-related logic scattered through the non-scheduler parts of the kernel.
- Some preliminary work needed to implement proxy execution, an improved approach to the avoidance of priority inversion, has been merged. The proxy execution feature itself, though, has not yet landed.
Filesystems and block I/O
- There have been a few tries to implement fine-grained timestamps for file metadata; the last one ran aground at the end of 2023. Another attempt is being made for 6.13; this merge message contains the details of how it works.
In short: as before, filesystems only need to track fine-grained change times for a given file if the time is being actively queried; most of the time, low-resolution timestamps are good enough. That is important, since lower-resolution timestamps do not need to be written back to persistent storage as frequently. The previous implementation ran into problems, though, where a low-resolution timestamp could appear to be earlier than a high-resolution timestamp, even though the actual changes happened in the opposite order.
In the new implementation, the kernel remembers the last fine-grained timestamp that was given out and ensures that any coarse-grained timestamps assigned for file modifications are later than that last fine-grained value. This technique avoids the above-mentioned problem, ensuring that timestamps always correctly reflect the order in which files were modified.
See this documentation commit for more information.
- There is a new sysctl knob, fs.dentry-negative, that controls whether the virtual filesystem (VFS) layer deletes a file's kernel-internal directory entry ("dentry") when the file itself is deleted. It seems that some benchmarks do better when dentries are removed, while others benefit from having a negative dentry left behind, so the kernel developers have put the decision into the system administrator's hands. The default value (zero) means that dentries are not automatically deleted, matching the behavior of previous kernels.
- The statmount() system call has gained options to return the filesystem subtype, superblock source, and security mount options. There is also a new flag, STATMOUNT_OPT_ARRAY, that returns filesystem options as a series of NUL-separated strings and without the usual "\000" escaping.
- There have been some deep reference-counting changes within the VFS layer that yield a 3-5% performance improvement on highly threaded workloads; see this merge message for some details.
- It is now possible to assemble an overlayfs stack using file descriptors rather than path names; see this merge message for details.
- The tmpfs filesystem can now be mounted in a case-folding mode where file names are no longer case-sensitive. See this documentation commit for the relevant mount options.
- Limited support for atomic write operations has been added to the ext4 and XFS filesystems.
- There is a new set of system calls for the management of extended attributes: setxattrat(), getxattrat(), listxattrat(), and removexattrat(). They are variants of setxattr(), getxattr(), listxattr(), and removexattr() that include a directory file descriptor as the starting point for the path-name search.
- The new BTRFS_IOC_SUBVOL_SYNC_WAIT ioctl() command for the Btrfs filesystem will wait for the cleaning of one or more subvolumes. It is an unprivileged operation, and is intended to allow the "btrfs subvolume sync" command to work without privilege.
- Btrfs now supports performing encoded reads (reading of compressed extents directly, without decompression) via io_uring.
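The timestamp-ordering rule described in the fine-grained timestamp item above can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's code: the class name MultigrainClock, its method names, the 4ms coarse tick, and the use of caller-supplied integer nanoseconds are all hypothetical. The sketch only guarantees that a coarse timestamp is never earlier than the last fine-grained one handed out; the real implementation may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class MultigrainClock:
    """Toy model of the ordering rule; times are integer nanoseconds."""

    tick_ns: int = 4_000_000  # hypothetical coarse-clock granularity (4 ms)
    floor_ns: int = 0         # last fine-grained timestamp handed out

    def coarse_raw(self, now_ns: int) -> int:
        """Plain coarse clock: the time of the most recent tick."""
        return now_ns - (now_ns % self.tick_ns)

    def fine_stamp(self, now_ns: int) -> int:
        """Hand out a fine-grained timestamp and remember it as the floor."""
        self.floor_ns = max(self.floor_ns, now_ns)
        return self.floor_ns

    def coarse_stamp(self, now_ns: int) -> int:
        """Hand out a coarse timestamp that is never earlier than the floor."""
        return max(self.coarse_raw(now_ns), self.floor_ns)


clock = MultigrainClock()

# File A changes at 5.5 ms while its timestamp is being actively queried,
# so it receives a fine-grained value.
stamp_a = clock.fine_stamp(5_500_000)

# File B changes later, at 6.0 ms, and only needs a coarse timestamp.
# The plain coarse clock would report the 4.0 ms tick, which would make B
# look older than A even though B changed second.
naive_b = clock.coarse_raw(6_000_000)
stamp_b = clock.coarse_stamp(6_000_000)

print("naive coarse stamp appears earlier than A:", naive_b < stamp_a)
print("floored coarse stamp is not earlier than A:", stamp_b >= stamp_a)
```

In this model, the floor only matters until the coarse clock ticks past it; after that, coarse timestamps fall back to the plain tick value, which is why most updates still avoid the cost of a fine-grained timestamp.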
Hardware support
- Hardware monitoring: Renesas ISL28022 power monitors and Nuvoton NCT7363Y fan controllers.
- Miscellaneous: Marvell PEM performance-monitoring units, Airoha true HW random number generators, Broadcom BCM74110 random number generators, Renesas RZ/V2H(P) interrupt control units, and THEAD C9XX ACLINT S-mode IPI interrupt controllers.
Miscellaneous
- There is a new user-space API allowing administrators to set thermal thresholds on specific devices; notifications will be generated when a threshold is crossed. This commit gives an overview of the functionality, but the actual (netlink-based) API is entirely undocumented.
Security-related
- The SELinux security module can now manage policies for individual netlink operations; see this commit message for a terse overview.
- The /sys/fs/selinux/user configuration knob has been deprecated and will be removed in a future release.
Internal kernel changes
- There are now Rust abstractions for a number of VFS data structures and interfaces — enough to support the Rust implementation of binder.
- The bulk of the file-descriptor memory-safety work has been merged.
- The kernel's cryptographic subsystem has gained a new internal API for signature generation. There is some kerneldoc documentation available.
- There is a new variant of the sleepable RCU ("SRCU") API that makes the read side cheaper at the cost of more expensive write operations. Documentation for the new functions — srcu_read_lock_lite() and srcu_read_unlock_lite() — can be found by reading through this commit.
- The debugobjects subsystem has been massively reworked for better performance and robustness; see this merge message for details.
- The venerable dontdiff file has been removed from the documentation tree.
The 6.13 merge window can be expected to remain open through
December 1. That closing date is immediately after a significant
holiday weekend in the US, but past experience suggests that the 6.13-rc1
release will come out on schedule regardless. LWN will have an update of
the remaining changes from this merge window once it closes.
unlinkat flag?
Posted Nov 21, 2024 17:52 UTC (Thu) by josh (subscriber, #17465)
I wonder if it would make sense to add a flag to unlinkat, which lets userspace indicate on a case-by-case basis whether the removed file is likely to be looked for again or not? Userspace is in a great position to know whether a filename is likely to be looked for again or not.
Internal kernel changes - crypto subsystem
Posted Nov 21, 2024 22:58 UTC (Thu) by l1k (subscriber, #112260)
"The kernel's cryptographic subsystem has gained a new internal API for signature generation. There is some kerneldoc documentation available."
Author here. More accurately the goal of my patches was to move sign/verify operations out of akcipher and into a new, separate crypto algorithm type. akcipher is thus now solely for asymmetric encrypt/decrypt. Of note here is that the new sign/verify API uses kernel buffers, whereas akcipher uses sglists.
Herbert Xu started the transition to the new crypto algorithm type for sign/verify a year ago by introducing a frontend:
https://lore.kernel.org/linux-crypto/ZIg4b8kAeW7x%2FoM1@gondor.apana.org.au/
I completed that effort by adding a backend and migrating all asymmetric sign/verify algorithms to it:
https://lore.kernel.org/all/cover.1725972333.git.lukas@wunner.de/
We currently have 3 algorithms in the tree: RSA (only PKCS1 encoding), ECDSA (X9.62 encoding and from v6.13 also P1363) and ECRDSA (aka GOST). Signing is currently only supported by the RSA algorithm implementation. Verifying by all three.
bcachefs?
Posted Nov 22, 2024 1:14 UTC (Fri) by jalla (guest, #101175)
bcachefs?
Posted Nov 22, 2024 5:00 UTC (Fri) by burki99 (subscriber, #17149)
Kent's own take (he has a hard time acknowledging behavioral issues, instead he prefers pointing out technical ones): https://www.patreon.com/posts/trouble-in-116412665
I am sure the editors could say much more. It seems to be a very unfortunate situation where technical innovation might fail to come to the kernel due to the inability of its author to respectfully listen and cooperate with his peers.
bcachefs?
Posted Nov 22, 2024 8:17 UTC (Fri) by koverstreet (subscriber, #4296)
I'm a little burned out on it anyways.