Description
oxc-project/oxc#8437 surfaced unsoundness in SemanticBuilder
- a use after free bug.
@bluurryy also hit this previously while working on oxc-project/oxc#6668 (discussed on Discord).
The root cause of that problem is the lifetime extension in Visit::alloc
(which all the walk_*
methods call to create AstKind
s).
The problem
Looking at how the lifetime extension causes oxc-project/oxc#8437:
SemanticBuilder::build
takes a &Program<'a>
(unspecified lifetime on the &
borrow of Program
) and returns SemanticBuilderReturn<'a>
:
Tracing the effect of this through the different types:
SemanticBuilderReturn<'a>
containsSemantic<'a>
.Semantic<'a>
containsAstNodes<'a>
.AstNodes<'a>
contains aVec
ofAstNode<'a>
.AstNode<'a>
containsAstKind<'a>
.AstKind<'a>
contains&'a
refs to AST nodes (including&'a Program<'a>
).
So the end result is that SemanticBuilder::build
borrows the Program
only for the duration of build
(i.e. the borrow ends when build
returns). But it's returning SemanticBuilderReturn<'a>
which ultimately contains a &'a Program<'a>
. i.e. the &Program
in input params only guarantees the Program
lives for 'very_short_time
but then it returns a reference &'a Program
which it claims will live as long as the allocator.
So nothing stops you dropping the Program
while retaining a reference to it = use after free.
Solution for SemanticBuilder
In the case of SemanticBuilder
, I think the fix is fairly simple. We can make SemanticBuilder::build
take a &'a Program<'a>
. This makes the lifetime extension in Visit::alloc
valid, and removes the unsoundness.
See oxc-project/oxc#8437 (comment).
Broader solution
The problem is wider than just SemanticBuilder
, though. Visit::alloc
is, in general, unsound. And that means Visit::enter_node
and Visit::leave_node
are unsound too.
The question is how to fix it in a way that's ergonomic.
The "correct" approach is to add a 2nd lifetime to AstKind
:
/// `'a` is lifetime of the AST nodes.
/// `'r` is lifetime of the *references* to the AST nodes.
enum AstKind<'a, 'r> {
BooleanLiteral(&'r BooleanLiteral),
NullLiteral(&'r NullLiteral),
NumericLiteral(&'r NumericLiteral<'a>),
// ...
}
But then this probably also requires a 2nd lifetime on Visit
:
pub trait Visit<'a, 'r>: Sized {
fn enter_node(&mut self, kind: AstKind<'a, 'r>) {}
fn leave_node(&mut self, kind: AstKind<'a, 'r>) {}
// ...
}
or maybe we can avoid that by putting the lifetime on Visit::enter_node
instead:
pub trait Visit<'a>: Sized {
fn enter_node<'r>(&'r mut self, kind: AstKind<'a, 'r>) {}
fn leave_node<'r>(&'r mut self, kind: AstKind<'a, 'r>) {}
// ...
}
Either way, this is not at all ideal. In the linter (main user of AstKind
), the current single-lifetime AstKind<'a>
is fine, as linter immutably borrows the whole AST for the duration of it's operation, and introducing a 2nd lifetime complicates all that code for no gain.
Adding a 2nd lifetime to Visit
would be really annoying. We use Visit
in a lot of places, and most of them don't use AstKind
or enter_node
/ leave_node
so, again, it's a complication for no gain.
I'm not entirely sure of solution, but maybe we could:
- Make
AstKind
a linter-only thing. - Remove
Visit::enter_node
+leave_node
. Or make them receive anAstType
instead ofAstKind
, likeVisitMut::enter_node
does -AstType
has no lifetime at all. SemanticBuilder
can createAstKind
s itself, and handle lifetimes correctly internally.
In any case, SemanticBuilder
should not be using enter_node
/ leave_node
because it hurts performance (#72). But that refactor will be a fairly large task.
Battle plan
I suggest:
- Solve ast: Out of bounds memory access crash in AST Vititor oxc#8437 first with simple fix of making
SemanticBuilder::build
take a&'a Program<'a>
. - Tackle the broader problem later.
- In meantime, try to remove use of
enter_node
from the codebase as much as possible (it's not used much, and most usages outside ofSemanticBuilder
can be removed with simple refactoring).