8000 Go: [BREAKING] explicit destruction for futures and transactions by gm42 · Pull Request #12052 · apple/foundationdb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Go: [BREAKING] explicit destruction for futures and transactions #12052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

gm42
Copy link
Collaborator
@gm42 gm42 commented Mar 26, 2025
  • Explicit destruction for futures and transactions.
  • Introduction of context for transactions and futures.
  • Simplification of Future interface and internal value caching

This is a breaking change

This change makes the Go binding stop relying on Go's GC finalizers and instead use explicit Close() calls to release resources, which is more Go-idiomatic; additionally, context.Context is added to function signatures and supported through transaction cancellation.

NOTE: once merged this will be a breaking change for users because memory leaks would start appear on their FoundationDB clients, unless Close() is called for futures, transactions and range iterators.

Code-Reviewer Section

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

I have not yet updated test code as I'd like to discuss this change first.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

return
}

kv = ri.kvs[ri.index]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: on next advance call, this slice is thrown away. I'd like to address this in this PR, or a separate one, by having Advance() return the whole slice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up switching from Advance()< 10000 /code>/GetSlice() to NextBatch()/Get()

Copy link
Contributor
@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Could we update the test cases to make use of the new close method (and directly document there how to use them)?

@@ -88,7 +88,9 @@ func (opt TransactionOptions) setOpt(code int, param []byte) error {
}, param)
}

func (t *transaction) destroy() {
// Close will destroy the underlying transaction.
// It must be called after all the transaction-associated futures have been closed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we document what happens if the transaction is closed before all the associated futures have been closed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact destroying the transaction causes the transaction (and all its futures) to be cancelled, so this needs some re-thinking/re-writing as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make some attempts and submit a change; I am thinking to return a "closer" function that caller can defer.

Copy link
Collaborator Author
@gm42 gm42 Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've submitted a version of this; will update all tests code.

This change also introduces an implementation of context with automatic cancellation, as I thought it would be a good time to break the function signature only once.

@gm42 gm42 force-pushed the main branch 3 times, most recently from 20a73a5 to 1e768c3 Compare March 31, 2025 07:58
@johscheuer johscheuer closed this Apr 1, 2025
@johscheuer johscheuer reopened this Apr 1, 2025
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: 1e768c3
  • Duration 0:22:16
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 1e768c3
  • Duration 0:31:43
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 1e768c3
  • Duration 0:35:50
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 1e768c3
  • Duration 0:39:30
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 1e768c3
  • Duration 0:40:00
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 1e768c3
  • Duration 0:45:09
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 1e768c3
  • Duration 1:01:18
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@gm42
Copy link
Collaborator Author
gm42 commented Apr 1, 2025

@johscheuer I have fixed the build errors in the directory layer; please trigger CI again

@gm42 gm42 requested a review from johscheuer April 1, 2025 16:35
@jzhou77 jzhou77 closed this Apr 2, 2025
@jzhou77 jzhou77 reopened this Apr 2, 2025
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: b68c700
  • Duration 0:22:28
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: b68c700
  • Duration 0:40:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: b68c700
  • Duration 0:48:30
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: b68c700
  • Duration 0:55:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor
6D40

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: b68c700
  • Duration 1:00:12
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: b68c700
  • Duration 1:01:10
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: b68c700
  • Duration 1:06:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@gm42
Copy link
Collaborator Author
gm42 commented Apr 4, 2025

Thanks for triggering the CI @jzhou77.

I guess next step is to discuss the changes

@johscheuer johscheuer requested a review from vishesh April 4, 2025 15:43
@gm42
Copy link
Collaborator Author
gm42 commented May 12, 2025

I also suggest that the Future APIs should be reworked so that MustGet or Get close the future.

Yes, I reached the same conclusion last week.

I think the following would be best:

thanks for the feedback @Semisol!

For parity, all futures' Get (except internal ones) are updated to cache return values.

This seems doable, I will try to add it to this PR.

Each future has a new sync.RWMutex guarding f.ptr. f.ptr is set to nil after closure. This is only relevant for internal use

Not sure about this one; will try to find out whether it's the only option or not, as I change more code.

The Destroy and Cancel methods are merged into Cancel.

I think you mean Close and Cancel; however, if you look at the C API there are specific behaviors associated to them. Will eventually remove Destroy if it's not necessary anymore for the user to call it.

  • Get will automatically close the future after values are ready.

Yes, this is doable thanks to the callback. I will instrument it further.

  • Developers are advised to do Cancel if they determine a future will not be used, or if they are not sure if it will (after any potential users go out of scope).

I think the current pattern with Close() matches what we do in Go in other cases (for example, with HTTP response bodies) and I would stick to it. Rather than asking developers to conditionally use Cancel(), it's safer to tell them to always use Close() and we have the method do nothing when there's nothing to do.

Will submit some changes later as I rewrite part of the current ones and add context.Context also to the future get functions

@gm42
Copy link
Collaborator Author
gm42 commented May 13, 2025

Will submit some changes later as I rewrite part of the current ones and add context.Context also to the future get functions

I just submitted the first part of this rewrite:

  • use a channel instead of sync.Mutex to wait for callback (will favour integration with context.Context)
  • add context.Context to future Getter functions

What's left to be done:

  • align futures to internally-cached approach
  • release future when the callback re-enters
  • always destroy instead of using fdb_future_release_memory

@gm42 gm42 force-pushed the main branch 2 times, most recently from 3f302af to 5d8b534 Compare May 13, 2025 14:10
@gm42
Copy link
Collaborator Author
gm42 commented May 13, 2025

I've completed my changes; a couple methods have been removed from the Future interface, I think this way it's easier to use it.

By the way, the C FDB API implementation does indeed treat destroy and cancel as almost the same:

extern "C" DLLEXPORT void fdb_future_cancel(FDBFuture* f) {
	CATCH_AND_DIE(TSAVB(f)->addref(); TSAVB(f)->cancel(););
}

extern "C" DLLEXPORT void fdb_future_destroy(FDBFuture* f) {
	CATC
F438
H_AND_DIE(TSAVB(f)->cancel(););
}

@gm42
Copy link
Collaborator Author
gm42 commented May 13, 2025

If we want to further simplify this:

  1. the Close() method can be removed
  2. when a future is created, the Get() is called in background (this means the constructor for the future will need a context)

Pros:

  • no need to call Close()

Cons:

  • context in future constructor (although should be already there for the transaction)
  • user loses control on when Go memory is allocated (large values will be fetched and stored even if ultimately unnecessary)

I do not like this direction because it would be wasteful by design; if we really want to pursue it, it would be best to do it only for the scalar types (not the KV or K arrays)

@Semisol
Copy link
Semisol commented May 13, 2025

I have done some of my own changes here, with some opiniation: https://github.com/semisol/foundationdb

I'll look into the changes here and get back with feedback soon.

(I have removed contexts from Transact and ReadTransact as well in my fork. I think the current design of that is quite iffy, abstracting a normal transaction, and a transaction provider that does retries as one.)

@gm42
Copy link
Collaborator Author
gm42 commented May 14, 2025

I think the current design of that is quite iffy, abstracting a normal transaction, and a transaction provider that does retries as one

I noticed that, but rather than an even bigger change I opted for adding context without pulling that apart. There is no performance/correctness concern other than it does not look elegant.

@gm42
Copy link
Collaborator Author
gm42 commented May 14, 2025

I have done some of my own changes here, with some opiniation: https://github.com/semisol/foundationdb

I'll look into the changes here and get back with feedback soon.

I checked your branch; aside from undoing cancellation and context I think it introduces transaction leaks (with no finalizer and no closer, the transaction will not be closed). I saw you wanted to counteract that by closing the transaction in the retryable, but not sure that you can retry anything if the transaction has already been closed.

Looking forward to your feedback!

@gm42
Copy link
Collaborator Author
gm42 commented May 14, 2025

Let me write what is (at least for me) an interesting problem regarding closing transactions: there are basically two use cases to support:

  1. opening a transaction, creating some futures, resolving them and getting their values before commit or close
  2. opening a transaction, creating some futures, committing the transaction, then resolving futures and getting their value, then closing the transaction.

In (2), closing the transaction causes all futures to be cancelled, thus it can only be done after user is done using the futures.
I think (2) is a valid use-case and we should not stop supporting it because it allows to commit a transaction and transfer the resolved futures' values from C memory to Go memory only as needed, after the transaction commit. This allows to complete a commit very quickly and process the gathered results afterwards.

Note: the lifecycle is similar for both transactions that will be committed and transactions that will not be committed: all transactions must be destroyed, even those which are not committed, the only variation is the commit happening before destruction. When doing read-only operations, transactions are always destroyed but never committed.

@Semisol
Copy link
Semisol commented May 15, 2025

I checked your branch; aside from undoing cancellation and context I think it introduces transaction leaks (with no finalizer and no closer, the transaction will not be closed). I saw you wanted to counteract that by closing the transaction in the retryable, but not sure that you can retry anything if the transaction has already been closed.

retryable returns only when it is not possible to retry anymore, so closure in that case is correct.
Instead, there should be tr.Reset(), to restart the transaction. I will push a commit with this soon.

(2) opening a transaction, creating some futures, committing the transaction, then resolving futures and getting their value, then closing the transaction.

This should be avoided in most cases, as you cannot retry once you are outside of the transaction, except with application logic. At that point, the user should use the CreateTransaction API and integrate it into their application logic.

I am not against supporting such an API though, but I believe it should be distinct.

@gm42
Copy link
Collaborator Author
gm42 commented May 15, 2025

retryable returns only when it is not possible to retry anymore, so closure in that case is correct.

It is not correct IMO because closing a transaction implicitly cancels all associated futures, which can be used after transaction commit.

This should be avoided in most cases, as you cannot retry once you are outside of the transaction, except with application logic. At that point, the user should use the CreateTransaction API and integrate it into their application logic.

I am not against supporting such an API though, but I believe it should be distinct.

Futures are perfectly valid to be used after transaction has been committed; this is currently supported when using the C binding and when using the Go binding. There is no need to retry because resolving a future doesn't involve any future-specific retry.

gm42 added 7 commits May 15, 2025 10:09
Stop relying on Go's GC finalizers and instead use explicit Close() calls
to release resources, which is more Go-idiomatic.
The Get() method is removed from RangeIterator.

NOTE: this would be a breaking change for users because without any code breakage
they would now have huge memory leaks on their FoundationDB clients; a release management
solution must be devised for this problem.
Other functions renamed:
* GetSliceWithError() -> Get()
* GetSliceOrPanic() -> MustGet()
Using a channel allows an easier integration with context.Context
Use a goroutine to automatically cancel transaction when context has an error.
Correctly close transactions by returning a function to caller.
Futures will be cancelled in case of context cancellation
Explicitly mention that BlockUntilReady is a reimplementation of fdb_future_block_until_ready
Only a few futures are currently being cached; this change
make all futures be cached and adds the missing check for fdb_future_get_error
in the previously cached futures.
Simplification for the usage of futures:
* if Get() is called on a future, all further calls will return a cached value and
  the future will be automatically closed
* if Get() is never called then future will be closed when user calls Close()

Cancel() is no more necessary: user can call Close()
Similarly, BlockUntilReady() is removed because a call to Get() (ignoring return value)
is equivalent.
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: e242faa
  • Duration 0:25:20
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: e242faa
  • Duration 0:33:02
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: e242faa
  • Duration 0:34:15
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: e242faa
  • Duration 0:41:17
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: e242faa
  • Duration 0:42:25
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: e242faa
  • Duration 0:44:36
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: e242faa
  • Duration 0:54:24
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0