[SURE-9419] Move agent registration into controller container by p-se · Pull Request #3416 · rancher/fleet · GitHub

[SURE-9419] Move agent registration into controller container #3416

Merged: 1 commit, Mar 7, 2025

Conversation

p-se
Contributor
@p-se p-se commented Mar 4, 2025

Since we changed the agent from a StatefulSet to a Deployment, we also needed leader election for every container of the agent pod, and have implemented it. This ensures that

  • only one cluster registration is created at a time from one downstream cluster,
  • one agent is running on a downstream cluster at a time and
  • one container is updating the status at a time,

even if the number of replicas is temporarily (as in a rollout) or permanently (as in the configured replica count) above one.

If replicas is set to anything above one, the agent pod of the second replica currently performs a cluster registration of its own, even though its agent cannot run once the registration has completed: the cluster registration completes, but the controller then waits for the lease. This state can cause confusion, because the upstream cluster ends up with cluster registrations that have never been used and may never be. Apparently this state is also considered inconsistent with the UI. Lastly, we use more leases than necessary.
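The scenario above can be illustrated with a toy, in-memory simulation. This is only a sketch under assumptions, not code from this PR: the `leases` map stands in for Kubernetes Lease objects, and the lease names `"registration"` and `"agent"` as well as `simulateSeparateContainers` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// leases is a toy stand-in for Kubernetes Lease objects (an assumption for
// illustration): it maps a lease name to the identity of its current holder.
type leases map[string]string

// tryAcquire grants the lease if it is free or already held by id.
func (l leases) tryAcquire(name, id string) bool {
	if holder, ok := l[name]; ok && holder != id {
		return false
	}
	l[name] = id
	return true
}

// release frees the lease if id currently holds it.
func (l leases) release(name, id string) {
	if l[name] == id {
		delete(l, name)
	}
}

// simulateSeparateContainers models the layout before this change: every pod
// has a registration container and an agent container, each with its own lease.
// It returns the pods that created a registration and the pods whose agent runs.
func simulateSeparateContainers(pods []string) (registrations []string, running []string) {
	l := leases{}
	for _, pod := range pods {
		// Registration container: takes its lease, registers, then lets go.
		if l.tryAcquire("registration", pod) {
			registrations = append(registrations, pod)
			l.release("registration", pod)
		}
		// Agent container: keeps its lease for as long as it runs.
		if l.tryAcquire("agent", pod) {
			running = append(running, pod)
		}
	}
	return registrations, running
}

func main() {
	regs, running := simulateSeparateContainers([]string{"pod-1", "pod-2"})
	fmt.Println("registrations:", regs)     // registrations: [pod-1 pod-2]
	fmt.Println("running agents:", running) // running agents: [pod-1]
}
```

In this model the second pod creates a registration that no running agent ever uses, which is the unused-registration state described above.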

While it is theoretically possible to use the same lease name for the cluster registration container and the agent controller, this may lead to extended waiting times, as the leases need to be acquired. Also, it is not guaranteed that the agent controller of the same pod will get the lease after the cluster registration container has released it, potentially leading to the same problem with unused cluster registrations (imagine scaling the replica count from 1 to 3).

For those reasons, this PR moves the agent registration into the controller container.
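A minimal sketch of the resulting flow, assuming a single lease per agent: a replica registers only after it has acquired the lease, so replicas that never become leader never register. The `lease`, `upstream`, `startAgent`, and `simulate` names are hypothetical, and the in-process mutex only stands in for a real Kubernetes lease; this is not the code from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// lease is a toy in-process lock standing in for a Kubernetes Lease.
type lease struct {
	mu     sync.Mutex
	holder string
}

// tryAcquire grants the lease if it is free or already held by id.
func (l *lease) tryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder != "" && l.holder != id {
		return false
	}
	l.holder = id
	return true
}

// upstream records the cluster registrations created by agents.
type upstream struct {
	mu            sync.Mutex
	registrations []string
}

func (u *upstream) register(id string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.registrations = append(u.registrations, id)
}

// startAgent registers only after the lease has been acquired; in the real
// agent, the reconcilers would start after the registration step.
func startAgent(id string, l *lease, up *upstream) bool {
	if !l.tryAcquire(id) {
		return false // not the leader: keep waiting, do not register
	}
	up.register(id)
	return true
}

// simulate starts the given number of replicas concurrently and returns how
// many registrations were created and how many replicas became leader.
func simulate(replicas int) (registrations, leaders int) {
	l, up := &lease{}, &upstream{}
	var wg sync.WaitGroup
	var mu sync.Mutex
	for i := 0; i < replicas; i++ {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			if startAgent(id, l, up) {
				mu.Lock()
				leaders++
				mu.Unlock()
			}
		}(fmt.Sprintf("pod-%d", i+1))
	}
	wg.Wait()
	return len(up.registrations), leaders
}

func main() {
	regs, leaders := simulate(3)
	fmt.Printf("registrations=%d leaders=%d\n", regs, leaders) // registrations=1 leaders=1
}
```

Whichever replica wins the race, the model ends with exactly one registration and one leader, because registration is gated on holding the lease.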

Part of #3377

@p-se p-se force-pushed the SURE-9419-agent-registration branch 5 times, most recently from ff27a4b to 9caf17e on March 4, 2025 16:18
@p-se p-se marked this pull request as ready for review March 5, 2025 09:30
@p-se p-se requested a review from a team as a code owner March 5, 2025 09:30
@p-se p-se changed the title Move agent registration into controller container [SURE-9419] Move agent registration into controller container Mar 5, 2025
Contributor
@weyfonk weyfonk left a comment

Nice one, this looks cleaner to me than the previous approach.

@@ -125,6 +124,15 @@ func (a *FleetAgent) Run(cmd *cobra.Command, args []string) error {
 		RenewDeadline: *leaderOpts.RenewDeadline,
 		Callbacks: leaderelection.LeaderCallbacks{
 			OnStartedLeading: func(ctx context.Context) {
Contributor

Right, so if I properly understand the new process, the (single) agent container will seek to acquire a leader election lease and, once it has it, will register the cluster before proceeding to running the agent reconcilers. 👍

In my understanding (just checking if it is aligned with yours), the next step would involve launching a goroutine, in parallel to running the reconcilers, for reporting cluster status. Happy to discuss if I've misunderstood any of this ;)

Contributor Author
@p-se p-se Mar 6, 2025

> Right, so if I properly understand the new process, the (single) agent container will seek to acquire a leader election lease and, once it has it, will register the cluster before proceeding to running the agent reconcilers. 👍

Correct.

> In my understanding (just checking if it is aligned with yours), the next step would involve launching a goroutine, in parallel to running the reconcilers, for reporting cluster status. Happy to discuss if I've misunderstood any of this ;)

As we've discussed, my initial idea was to make it an ordinary reconciler and use requeue to imitate a ticker, but now I'm going to put the functionality into a (controller-runtime) Runnable and have the manager execute it. Then I'll take care of migrating the other parts from wrangler/lasso to controller-runtime.
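The Runnable idea could look roughly like the sketch below. It is not the code planned for the follow-up: the local `Runnable` interface only mirrors the `Start(ctx) error` shape of controller-runtime's Runnable so the example stays dependency-free, and `statusReporter`, its `report` callback, the intervals, and `demo` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Runnable mirrors the Start(ctx) error shape of controller-runtime's
// Runnable; it is declared locally to keep this sketch free of dependencies.
type Runnable interface {
	Start(ctx context.Context) error
}

// statusReporter is a hypothetical runnable that calls report on every tick
// until the context is cancelled.
type statusReporter struct {
	interval time.Duration
	report   func(ctx context.Context) error
}

// Start blocks until ctx is done or report returns an error.
func (s *statusReporter) Start(ctx context.Context) error {
	ticker := time.NewTicker(s.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil // a cancelled context is a normal shutdown
		case <-ticker.C:
			if err := s.report(ctx); err != nil {
				return err
			}
		}
	}
}

// demo runs the reporter for a short time and returns how often it reported.
func demo() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()

	count := 0
	var r Runnable = &statusReporter{
		interval: 5 * time.Millisecond,
		report: func(context.Context) error {
			count++ // a real reporter would update the cluster status here
			return nil
		},
	}
	err := r.Start(ctx)
	return count, err
}

func main() {
	n, err := demo()
	fmt.Println("status reports sent:", n, "error:", err)
}
```

With controller-runtime, such a runnable would be handed to the manager (for example via its `Add` method) rather than started by hand. Compared with a reconciler that requeues itself, the ticker loop keeps the periodic reporting independent of any watched object.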

@p-se p-se force-pushed the SURE-9419-agent-registration branch 2 times, most recently from 76e12dd to 63e038a on March 6, 2025 16:14
@p-se p-se force-pushed the SURE-9419-agent-registration branch from 63e038a to 87714f9 on March 6, 2025 16:32
@p-se p-se merged commit b787666 into rancher:main Mar 7, 2025
12 checks passed
2 participants