Toward safe transmutation in Rust
Currently in Rust, there is no efficient and safe way to turn an array of bytes into a structure that corresponds to the array. Changing that was the topic of Jack Wrenn's talk this year at RustConf: "Safety Goggles for Alchemists". The goal is to be able to "transmute" (Rust's name for this kind of conversion) values into arbitrary user-defined types in a safer way. Wrenn justified the approach that the Safe Transmute project has taken to accomplish this, and spoke about the future work required to stabilize it.
The basic plan is to take the existing unsafe std::mem::transmute() function, which instructs the compiler to reinterpret part of memory as a different type (but requires the programmer to ensure that this is reasonable), and make a safe version that can check the necessary invariants itself. The first part of Wrenn's talk focused on what those invariants are, and how to check them.
The first thing to worry about is bit validity: whether every pattern of bits that can be produced by the input type is also valid for the output type. For example, transmuting a bool to a u8 is valid, because every boolean value is stored as one byte and therefore is also a valid u8. On the other hand, transmuting a u8 to a bool is invalid, because some values of u8 (such as 17) don't correspond to a bool.
The next invariant to worry about is alignment. Some types must be aligned to a particular boundary in memory; for example, u16 values must be aligned to even addresses on most platforms. Reinterpreting a region of memory as a value of another type is therefore only valid if that memory is aligned to a large enough boundary for the target type.
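The two checks can be illustrated with a small hand-written sketch in today's stable Rust; the function names are hypothetical and do not come from any library. Converting a bool to a u8 needs no check, converting a u8 to a bool needs a runtime one, and viewing bytes in place as a u16 additionally requires checking the address's alignment:

```rust
use std::mem::{align_of, size_of};

// Hypothetical helper names; a sketch, not a library API.

/// bool -> u8 never needs a check: both bool bit patterns (0 and 1) are valid u8 values.
pub fn u8_from_bool(b: bool) -> u8 {
    b as u8
}

/// u8 -> bool must be checked, because only 0 and 1 are valid bools.
pub fn bool_from_u8(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // e.g. 17 has no bool counterpart
    }
}

/// View the start of `bytes` as a &u16 in place, if size and alignment allow it.
pub fn u16_ref_from_bytes(bytes: &[u8]) -> Option<&u16> {
    if bytes.len() < size_of::<u16>() {
        return None;
    }
    let ptr = bytes.as_ptr();
    if (ptr as usize) % align_of::<u16>() != 0 {
        return None; // misaligned: reading a u16 through this pointer would be undefined behavior
    }
    // SAFETY: the slice holds at least two bytes, the pointer is suitably
    // aligned, and every two-byte pattern is a valid u16.
    Some(unsafe { &*(ptr as *const u16) })
}
```

On a platform where u16 requires two-byte alignment, passing a slice that starts at an odd address to u16_ref_from_bytes() returns None, even though the bytes themselves would make a perfectly good u16.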
Code implementing transmutation in any language would need to worry about bit validity and alignment, but there are also two requirements for safe transmutation that are unique to Rust: lifetimes and safety invariants upheld by constructors. Both of these are related to the way that Rust can validate programmer-specified invariants using the type system. If a transmutation would break Rust's lifetime tracking, it is invalid. But it could also be invalid if it let someone construct a type that does not have a public constructor. For example, many Rust APIs hand out guard objects that do something when they are dropped. If a programmer could transmute a byte array into a MutexGuard for some mutex without locking it, that could cause significant problems. So transmutation should also not be used to create types that uphold safety requirements by having smart constructors.
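A minimal sketch of such a smart constructor, using a hypothetical Even type whose invariant exists only in its constructor, might look like the following. Nothing about its layout says "even", so a byte-level transmutation from the u8 value 3 would pass the bit-validity and alignment checks while still producing an Even that breaks the invariant:

```rust
// Hypothetical module and type; a sketch of the "smart constructor" pattern.
mod even {
    /// A `u8` that is always even. The field is private, so code outside this
    /// module can only obtain an `Even` through `Even::new`.
    #[repr(transparent)]
    pub struct Even {
        n: u8,
    }

    impl Even {
        /// The smart constructor: the only public way to create an `Even`,
        /// and the only place where the invariant is checked.
        pub fn new(n: u8) -> Option<Even> {
            if n % 2 == 0 {
                Some(Even { n })
            } else {
                None
            }
        }

        /// Read the wrapped value back out.
        pub fn get(&self) -> u8 {
            self.n
        }
    }
}

use even::Even;
```

MutexGuard relies on the same pattern: the only sanctioned way to obtain one is by locking the mutex.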
Still, if the programmer can ensure that these four criteria are met, transmutation can be quite useful. Wrenn gave the example of parsing a UDP packet. In a traditional parser, the programmer would have to copy all of the data in the UDP header at least once in order to move it from the incoming buffer into a structure. But UDP headers were designed so that they can be interpreted directly as a structure, as long as the structure's fields have the correct sizes. This could let the program parse a packet without any copying whatsoever.
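As a rough illustration, here is what zero-copy parsing of a UDP header can look like when done by hand on stable Rust without any crate; the names are hypothetical. The header (defined in RFC 768) consists of four big-endian 16-bit fields, stored here as byte arrays so that the structure has no alignment requirement and no padding. The single unsafe block is where the programmer personally vouches for the four criteria:

```rust
use std::mem::size_of;

// Hypothetical names; a hand-rolled sketch, not a library API.

/// The 8-byte UDP header. Each field is kept as raw big-endian bytes, so the
/// struct has alignment 1, no padding, and no invalid bit patterns.
#[repr(C)]
pub struct UdpHeader {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    length: [u8; 2],
    checksum: [u8; 2],
}

impl UdpHeader {
    pub fn src_port(&self) -> u16 {
        u16::from_be_bytes(self.src_port)
    }
    pub fn dst_port(&self) -> u16 {
        u16::from_be_bytes(self.dst_port)
    }
    pub fn length(&self) -> u16 {
        u16::from_be_bytes(self.length)
    }
    pub fn checksum(&self) -> u16 {
        u16::from_be_bytes(self.checksum)
    }
}

/// Interpret the front of `packet` as a UDP header without copying it.
pub fn udp_header(packet: &[u8]) -> Option<&UdpHeader> {
    if packet.len() < size_of::<UdpHeader>() {
        return None;
    }
    // SAFETY: UdpHeader is repr(C) and built only from u8 arrays, so it has
    // alignment 1, no padding, and every byte pattern is valid; the length
    // was checked above; and the returned reference borrows `packet`, so it
    // cannot outlive the buffer.
    Some(unsafe { &*(packet.as_ptr() as *const UdpHeader) })
}
```

The returned reference points into the original buffer, so reading the ports performs no copy of the header itself.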
So it would be really nice to have safe transmutation. This has prompted the Rust community to produce several crates that provide safe abstractions around transmutation. The two that Wrenn highlighted were bytemuck and zerocopy. He is the co-maintainer of zerocopy, so he chose that crate to "pick on".
Both of these crates work by adding a marker trait, he explained: a trait that has no methods, and only exists so that the programmer can write type bounds that specify that a type needs to implement that trait to be used in some function. The trait is unsafe to implement, so implementing it is essentially a promise to zerocopy that the programmer has read the relevant documentation and ensured that the type meets the library's requirements. Then the library itself can include implementations for primitive types, as well as a macro to implement the marker trait for structures where it is safe to do so. This approach works: Google uses it in the networking stack for the Fuchsia operating system, he said.
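A stripped-down sketch of the marker-trait approach follows. FromAnyBytes and read_prefix() are hypothetical names, not the actual zerocopy or bytemuck API, and the macro that such crates provide for structures is left out. For simplicity, this version copies the value out; a version returning a reference into the buffer would also have to check alignment:

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical marker trait in the style described above (not the real
/// zerocopy or bytemuck API). It has no methods; it only exists to be a bound.
///
/// # Safety
/// Implementors promise that the type has no padding and that every pattern
/// of `size_of::<Self>()` bytes is a valid value of the type.
pub unsafe trait FromAnyBytes: Sized {}

// The library vouches for the primitive types once...
unsafe impl FromAnyBytes for u8 {}
unsafe impl FromAnyBytes for u16 {}
unsafe impl FromAnyBytes for u32 {}
unsafe impl FromAnyBytes for u64 {}
// ...and for arrays of them, which also have no padding.
unsafe impl<T: FromAnyBytes, const N: usize> FromAnyBytes for [T; N] {}
// There is deliberately no impl for `bool`: most byte values are not valid bools.

/// Hypothetical safe API built on the bound: copy a `T` out of the start of `bytes`.
pub fn read_prefix<T: FromAnyBytes>(bytes: &[u8]) -> Option<T> {
    if bytes.len() < size_of::<T>() {
        return None;
    }
    // SAFETY: the length check keeps the read in bounds, `read_unaligned`
    // tolerates any alignment, and `T: FromAnyBytes` promises that any byte
    // pattern is a valid `T`.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const T) })
}
```

Because bool has no implementation of the trait, a call like read_prefix::<bool>() is rejected at compile time rather than producing an invalid value.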
But zerocopy has a "dirty secret": it depends on nearly 14,000 lines of subtle unsafe code, Wrenn warned. Worse, most of this code repeats analyses that the compiler already has to do for other reasons. It would be more useful if this kind of capability were built into the compiler.
"Project Safe Transmute"
All of this is what motivated the creation of "Project Safe Transmute", Wrenn said. That project is an attempt to bring native support for safe transmutation to the Rust compiler.
That effort is based around a particular "theory of type alchemy", Wrenn explained. The idea is to track whether all possible values of one type are also possible values of another. For example, a NonZeroU8 can be converted to a u8 without a check, but not vice versa. But determining this kind of relationship automatically is trickier than it might initially appear. Performing the analysis naively, by reasoning in terms of sets of possible values, quickly becomes inefficient.
Instead, the compiler models a type as a finite-state machine, Wrenn said. Each field or piece of padding in the type becomes a state, with edges representing valid values. Every value of the type then corresponds to a path through the machine. Such machines can be worked with using relatively straightforward algorithms, and the representation does not blow up in size as a type gets more complicated.
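The core idea can be sketched in a much-simplified form, under loud assumptions: this hypothetical model handles only types whose layout is a straight chain of bytes (no enums, so no branching), ignores alignment and references, and is not rustc's actual implementation. Each byte position is a state whose outgoing edges are labeled with the byte values allowed there, and padding accepts anything, including uninitialized memory. A transmutation is bit-valid if every path through the source's machine is also a path through the destination's:

```rust
// Hypothetical, simplified model of "types as finite-state machines".
// Only straight-line layouts are handled: each byte position is one state,
// and the byte values allowed there label the edges to the next state.

/// The bytes that a single position in a layout may hold.
#[derive(Clone, Copy)]
pub struct ByteSet {
    allowed: [bool; 256], // which initialized byte values are accepted
    uninit: bool,         // whether the byte may be uninitialized (padding)
}

impl ByteSet {
    /// Any initialized value in `lo..=hi`.
    pub fn range(lo: u8, hi: u8) -> Self {
        let mut allowed = [false; 256];
        for v in lo..=hi {
            allowed[v as usize] = true;
        }
        ByteSet { allowed, uninit: false }
    }

    /// Any initialized byte, as in a `u8`.
    pub fn any() -> Self {
        Self::range(0, u8::MAX)
    }

    /// A padding byte: anything at all, including uninitialized memory.
    pub fn padding() -> Self {
        ByteSet { allowed: [true; 256], uninit: true }
    }

    /// True if every byte this position may hold is also accepted by `other`.
    fn is_subset_of(&self, other: &ByteSet) -> bool {
        if self.uninit && !other.uninit {
            return false;
        }
        (0..256usize).all(|v| !self.allowed[v] || other.allowed[v])
    }
}

/// A layout is a chain of states, one per byte.
pub type Layout = Vec<ByteSet>;

/// `src` -> `dst` is bit-valid if every path through `src`'s chain is also a
/// path through `dst`'s, i.e. each position accepts a superset of the bytes.
pub fn is_bit_valid(src: &Layout, dst: &Layout) -> bool {
    src.len() == dst.len() && src.iter().zip(dst).all(|(s, d)| s.is_subset_of(d))
}
```

With these definitions, bool to u8 and NonZeroU8 to u8 are accepted, while u8 to bool, u8 to NonZeroU8, and a padded #[repr(C)] struct { a: u8, b: u16 } to [u8; 4] are rejected, the last because its padding byte might be uninitialized.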
With this theory in place, implementing the analysis in the compiler became practical. Wrenn and his collaborators did so, resulting in the following trait, which the compiler implements automatically, on the fly, for any two compatible types:
unsafe trait TransmuteFrom<Src: ?Sized> {
    fn transmute(src: Src) -> Self
    where
        Src: Sized,
        Self: Sized;
}
Since this work is integrated into the compiler, attempting to convert two types that are not compatible will give a custom error message explaining why. The compiler checks all four requirements Wrenn described previously — which is exactly the source of the next problem. How can the compiler know whether a user-defined type has safety requirements that are checked by a constructor? It can't, so it must conservatively assume that user-defined types cannot be the target of a transmutation (although they can still be the input to one).
This "isn't all that useful", though. Transmuting things into user-defined types was a requirement for the use cases Wrenn had discussed. It turns out that often what people want is not safe transmutation, but safer transmutation. So the people working on transmutation added an extra generic parameter to the TransmuteFrom trait that the programmer can use to promise the compiler that one or more of the safety requirements is met, even if the compiler cannot prove it. The possible values are Assume::VALIDITY for bit validity, Assume::ALIGNMENT for alignment, Assume::LIFETIMES for lifetimes, and Assume::SAFETY for user safety invariants. Now it is possible to target user types by giving an Assume::SAFETY parameter to the operation:
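#![feature(transmutability)] // nightly-only; the API may still change
use std::mem::{Assume, TransmuteFrom};
#[repr(transparent)]
pub struct Even {
    // The compiler doesn't know about the following,
    // but our code depends on this for some reason:
    // SAFETY: Always an even number!
    n: u8,
}
fn u8_to_even(src: u8) -> Even {
    assert!(src % 2 == 0);
    // A const argument that is a multi-segment path must be wrapped in braces.
    unsafe { TransmuteFrom::<_, { Assume::SAFETY }>::transmute(src) }
}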
It may seem as though requiring the use of unsafe to do transmutation represents a lack of progress. But this design has the advantage that the programmer only needs to assert the safety of the specific invariant that the compiler is unable to prove: the above code still uses the compile-time checks for bit validity, alignment, and lifetime problems.
So the work, which is available for testing on nightly, doesn't make transmutation completely safe, but it does provide "effective safety goggles" to make sure that as much as possible is checked by the compiler, so that the programmer only needs to check the things that are genuinely not possible for the compiler to ascertain.
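For comparison, here is a sketch of the same Even conversion on stable Rust, which has to fall back to std::mem::transmute(). That function checks only that the two types have the same size; everything else, including bit validity and alignment, becomes the programmer's responsibility. The get() accessor is a hypothetical addition for demonstration:

```rust
/// The article's `Even` type, written for stable Rust.
#[repr(transparent)]
pub struct Even {
    // SAFETY: Always an even number!
    n: u8,
}

impl Even {
    /// Hypothetical accessor, added only to inspect the result.
    pub fn get(&self) -> u8 {
        self.n
    }
}

pub fn u8_to_even(src: u8) -> Even {
    // The safety invariant is still checked by hand...
    assert!(src % 2 == 0, "{src} is odd");
    // ...but std::mem::transmute itself only verifies that u8 and Even have
    // the same size. Bit validity (Even wraps a single u8) and alignment
    // (this is a by-value move) are left entirely to the programmer.
    unsafe { std::mem::transmute::<u8, Even>(src) }
}
```

Inside the module that defines Even, one would normally just write Even { n: src }; the transmute is there only to show how little the existing function checks.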
Future outlook
Wrenn ended by summarizing the future work needed to polish the feature: supporting dynamically sized types, adding an API for fallible transmutation, optimizing the implementation of the bit-validity checks in the compiler, improving the portability of type layouts, and finally stabilizing the work. He hopes that TransmuteFrom might have an RFC for stabilization in 2025, but said that it needed testing and feedback before that, and called on the audience to provide that testing. Whether users will find this API to be an improvement over the existing crates remains to be seen, but it seems clear that transmutation is too useful not to support as part of Rust itself in some way.
Index entries for this article
Conference: RustConf/2024
Posted Oct 23, 2024 20:37 UTC (Wed)
by ju3Ceemi (subscriber, #102464)
As far as I know, you can cast your char* buffer into a struct whatever* and then access the struct's fields with no copy.
Posted Oct 23, 2024 20:41 UTC (Wed)
by daroc (editor, #160859)
The traditional parsers I was trying to distinguish are libraries that parse structures using actual parsing techniques — parser combinators, grammars, etc.
Posted Oct 23, 2024 21:01 UTC (Wed)
by intelfx (subscriber, #130118)
Yes, and that's exactly the kind of UB-laden thing that is much harder to do (properly) than to talk about it. It is absolutely invalid to "just" cast an arbitrary buffer into a "struct whatever", unless multiple very specific conditions are met. It has to be done with utter attention to detail (*if* it even can be done under specific circumstances), and the article is precisely about offloading some of that utter care and attention to the compiler.
Posted Oct 24, 2024 6:26 UTC (Thu)
by philipptoelke (subscriber, #101554)
Posted Oct 24, 2024 7:43 UTC (Thu)
by gspr (subscriber, #91542)
Posted Oct 24, 2024 7:53 UTC (Thu)
by epa (subscriber, #39769)
In that case the language could let you define a ‘check’ method as an extra step which runs after the constructor. When casting (‘transmuting’) an area of memory to an instance of the type, the check method is run. Then the requirement like ‘must be even’ can be expressed in code and the programmer doesn’t have to promise the compiler he or she has already checked it.
Posted Oct 24, 2024 19:01 UTC (Thu)
by ringerc (subscriber, #3071)
In this case the type would want to be able to mark itself as able to be transmuted to, but only when the type itself is doing the transmuting. Essentially a private access marker trait. I don't know enough Rust to know if this is possible.
Posted Oct 28, 2024 7:43 UTC (Mon)
by NYKevin (subscriber, #129325)
> In this case the type would want to be able to mark itself as able to be transmuted to, but only when the type itself is doing the transmuting.
You can sort of do that now:
// Simple one-field struct as an example, can be replaced with a more complicated struct.
#[repr(C)] // But it does need to be repr(C) or repr(transparent)
struct MyType(u64);
use std::convert::TryFrom;
impl<'a> TryFrom<&'a [u8]> for &'a MyType {
    type Error = &'static str; // XXX: In a real codebase, use a proper Error type and not a string.
    fn try_from(x: &'a [u8]) -> Result<Self, &'static str> {
        if x.len() != std::mem::size_of::<MyType>() {
            return Err("Wrong size!");
        }
        let ptr: *const MyType = unsafe { std::mem::transmute(x.as_ptr()) };
        if ptr.is_aligned() {
            Ok(unsafe { &*ptr })
        } else {
            Err("Not aligned!")
        }
    }
}
// Implementation for &'a mut [u8] omitted because it's nearly identical.
// Now safe code can call try_from() and try_into() for this conversion.
But this is just a thin wrapper around transmute. It's sound, in this case, but if MyType is more complicated, then you have all the problems that the article describes. The compiler is not going to do very much for you here (the only check that the compiler performs in the above example code is to make sure that a pointer-to-u8 is the same size as a pointer-to-MyType, which is rather obvious anyway).
(If this looks like a lot of boilerplate code, bear in mind that Rust has macros, so you don't have to write all of this out repeatedly if you're doing it a lot.)
Posted Oct 24, 2024 9:52 UTC (Thu)
by fishface60 (subscriber, #88700)
I hope this gets used more. You've still got to recheck the safety invariants when things change, but this should help when a type you're transmuting changes underneath you.
I think there would have to be an eventual goal of deprecating std::mem::transmute() though, since TransmuteFrom::<_, Assume::SAFETY>::transmute(src) is more work to use and people will tend towards using the simplest option, so the language should aim to make the best option the simplest to use.
Posted Oct 24, 2024 11:17 UTC (Thu)
by roc (subscriber, #30627)
Posted Oct 24, 2024 20:30 UTC (Thu)
by heftig (subscriber, #73632)
Posted Oct 25, 2024 0:49 UTC (Fri)
by roc (subscriber, #30627)
Posted Oct 26, 2024 15:27 UTC (Sat)
by asahilina (subscriber, #166071)
Posted Oct 25, 2024 15:54 UTC (Fri)
by jpab (subscriber, #105231)
That is, if I'm in a scope where I can write a literal `Foo { x, y, z }` then I should also be able to transmute from an appropriate source to a Foo, without using `unsafe`, but relying on the compiler to apply all the checks.
If I'm in a scope where I can't directly construct a Foo (ie, to get one then I need to call some function like Foo::new()) then I clearly need unsafe. Though I'm not sure when I would want to do that anyway.
Visibility of direct construction is already an essential element of enforcing invariants in values. And since safe transmute already requires compiler magic to enforce the various structural safety constraints it could presumably check visibility as well.
There is some care needed because you need to have visibility not only to directly construct a value of the target type but also to directly construct values of each field type (and of their fields recursively).
Posted Oct 25, 2024 17:25 UTC (Fri)
by jpab (subscriber, #105231)
Posted Oct 29, 2024 23:21 UTC (Tue)
by riking (subscriber, #95706)
Posted Oct 27, 2024 21:16 UTC (Sun)
by amarao (subscriber, #87073)