Make Input/Scalar AutoCloseable; fix UDF resource leaks #15765

mfussenegger · 2024-03-26T10:14:44Z

Within the PolyglotScalar we create a GraalVM polyglot Context that
can allocate native resources, these native resources are not
automatically released - leading to potential leaks.

We didn't notice this with JS as embedded language, but trying to add
Python surfaced this issue as the thread leak detection in the test
suite triggered.

Relevant part from the Context.close docs:

Closes this context and frees up potentially allocated native resources. A context may not
free all native resources allocated automatically. For this reason it is recommended to close
contexts after use. If the context is currently being executed on another thread, then an
{@link IllegalStateException} is thrown. To close concurrently executing contexts see
{@link #close(boolean)}.

This is a first, but incomplete step towards addressing the problem:

Makes Input/Scalar closeable
Updates the PolyglotScalar to properly close the context

What's incomplete is to go through all the Input usages and actually
close them after use. (But it would already be an improvement over the status quo, and we could already merge this and iterate on it)

Within the `PolyglotScalar` we create a GraalVM polyglot Context that can allocate native resources, these native resources are not automatically released - leading to potential leaks. We didn't notice this with JS as embedded language, but trying to add Python surfaced this issue as the thread leak detection in the test suite triggered. Relevant part from the `Context.close` docs: Closes this context and frees up potentially allocated native resources. A context may not free all native resources allocated automatically. For this reason it is recommended to close contexts after use. If the context is currently being executed on another thread, then an {@link IllegalStateException} is thrown. To close concurrently executing contexts see {@link #close(boolean)}. This is a first, but incomplete step towards addressing the problem: - Makes Input/Scalar closeable - Updates the PolyglotScalar to properly close the context What's incomplete is to go through all the `Input` usages and actually close them after use.

mfussenegger · 2024-03-27T11:25:09Z

@seut @matriv @romseygeek thoughts on this approach?

Alternatives:

Add some kind of per-query Context that needs to be provided to Scalar.compile, and that can be bound to the life-cycle of components like CollectTask. But I'm not sure if this is any simpler
Introduce some kind of ContextsPool that uses thread-locals. They'd exist over the lifetime of a node. But this wouldn't solve the leaks, only restrict how much we leak - and is dangerous if accessed by scaling thread pools. Also wouldn't solve the thread leak triggered from the test framework if we add python udf test cases.
Variant of ^ but with manual acquire & release instead of thread-locals. Would have the advantage that we can properly clean up the contexts on stop and it would fix the thread leak issue in the test layer. But depending on when we close we'd still have the Context resources allocated over the lifetime of a node. If we release more eagerly, we need to add close calls all over the place, same as with the approach this PR takes currently.

Currently I'm tending to either the approach in this PR, or 3)

romseygeek · 2024-03-27T17:04:16Z

An alternative to option 1) could be to make TransactionContext implement AutoCloseable, and add a trackResource(Closeable c) method that just adds c to an internally held list, which is closed when the context's close method is called. Then PolyglotLanguage.getFunctionValue can take the transaction context as an argument, and register its language Context on it.

We create transaction contexts in a lot of places so I don't know if it would be any less invasive than making Input closeable. But it feels more like a lifecycle object somehow?

matriv · 2024-04-03T10:36:46Z

I kind of prefer @romseygeek's suggestion, or something similar. I'm sceptical regarding the Input solution, when I see this: https://github.com/crate/crate/pull/15765/files#diff-8b855ba352a12dde36505be36a145486feefcc303ab2f45e5c76019b1dd944f6R213
which means apart from going through existing usages, we don't have a "hard" api that enforces to use Input correctly (try-with-resources).

mfussenegger · 2024-04-03T11:50:54Z

I'm sceptical regarding the Input solution, when I see this: https://github.com/crate/crate/pull/15765/files#diff-8b855ba352a12dde36505be36a145486feefcc303ab2f45e5c76019b1dd944f6R213
which means apart from going through existing usages, we don't have a "hard" api that enforces to use Input correctly (try-with-resources).

There's no way we can force that something gets closed. This problem will exist no matter what.

So the main criteria here are:

how difficult it is to reason about and verify if something gets closed. The bigger the gap between acquire and release, the harder.
does it fit into the component from a responsibility perspective
can we piggyback onto something that already gets closed (like Task, where we also have leak detections in place - but that's afaik not available everywhere we need to evaluate functions)

The life-cycle & responsibility of the transaction context would fit, but looking at how and where we create transaction contexts I don't get the impression that it's much easier to reason about if and where they would get closed.

matriv · 2024-04-03T13:00:58Z

Apologies, I didn't phrase it correctly, of course you have to somehow use try-with-resources or explicitly call close(), I'm skeptical about the fine grain level, of following this, if it's added on Input. I haven't looked into how easy/difficult is to do something with TxContext, I'm only thinking about it from a high-level view. If doing it at a higher-level is not easy then let's go with the Input approach.

mfussenegger changed the title ~~Make Input/Scalar AutoCloseable~~ Make Input/Scalar AutoCloseable; fix UDF resource leaks Mar 26, 2024

mfussenegger force-pushed the j/closeable-input branch from 548a98d to 24344c5 Compare March 27, 2024 10:02

mfussenegger force-pushed the j/closeable-input branch from 24344c5 to 8adcabd Compare March 27, 2024 11:16

mfussenegger requested review from seut and matriv April 3, 2024 08:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Input/Scalar AutoCloseable; fix UDF resource leaks #15765

Make Input/Scalar AutoCloseable; fix UDF resource leaks #15765

Make Input/Scalar AutoCloseable; fix UDF resource leaks #15765

Are you sure you want to change the base?

Make Input/Scalar AutoCloseable; fix UDF resource leaks #15765

Conversation