-
Notifications
You must be signed in to change notification settings - Fork 61
e2e: Wait for resource deletion during undeploy and unprotect #2035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
@parikshithb we need to consume this in ramnectl. If will probably reveal hidden bugs in ramen when running |
Looking at the tests, underlay takes some time time as we expect, except for applet tests - underlay seems too quick to be true. Please check that we delete and wait for all resources.
|
b54130c
to
33c6bf7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Previously we used util.Timeout when waiting for resources. In flow when we have multiple waits, we could wait 10 minutes for each change before failing, which is way too much. This change adds a deadline to the underlying context.Context returned by TestContext.Context(). We set the deadline to 10 minutes after the current time when starting one of the e2e operations (deploy, undeploy, enable, disable, failover, relocate). Since we we wait using util.Sleep(), the call will fail with context.DaadlineExceeded on timeouts. Advantages of using context based deadline: - Limiting the timeout for flows including multiple waits, getting more predictable test time. - Simplify waiting for multiple objects with same deadline, added in RamenDR#2035 - Allows ramenctl to control the deadline when running multiple operations in parallel - Enforcing the deadline in external calls using a context - Simplify wait loops Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we used util.Timeout when waiting for resources. In flow when we have multiple waits, we could wait 10 minutes for each change before failing, which is way too much. This change adds a deadline to the underlying context.Context returned by TestContext.Context(). We set the deadline to 10 minutes after the current time when starting one of the e2e operations (deploy, undeploy, enable, disable, failover, relocate). Since we we wait using util.Sleep(), the call will fail with context.DaadlineExceeded on timeouts. Advantages of using context based deadline: - Limiting the timeout for flows including multiple waits, getting more predictable test time. - Simplify waiting for multiple objects with same deadline, added in RamenDR#2035 - Allows ramenctl to control the deadline when running multiple operations in parallel - Enforcing the deadline in external calls using a context - Simplify wait loops This change requires a change in ramenctl, setting a deadline on the context passed to e2e. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we used util.Timeout when waiting for resources. In flow when we have multiple waits, we could wait 10 minutes for each change before failing, which is way too much. This change adds a deadline to the underlying context.Context returned by TestContext.Context(). We set the deadline to 10 minutes after the current time when starting one of the e2e operations (deploy, undeploy, enable, disable, failover, relocate). Since we we wait using util.Sleep(), the call will fail with context.DaadlineExceeded on timeouts. Advantages of using context based deadline: - Limiting the timeout for flows including multiple waits, getting more predictable test time. - Simplify waiting for multiple objects with same deadline, added in RamenDR#2035 - Allows ramenctl to control the deadline when running multiple operations in parallel - Enforcing the deadline in external calls using a context - Simplify wait loops This change requires a change in ramenctl, setting a deadline on the context passed to e2e. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we used util.Timeout when waiting for resources. In flow when we have multiple waits, we could wait 10 minutes for each change before failing, which is way too much. This change adds a deadline to the underlying context.Context returned by TestContext.Context(). We set the deadline to 10 minutes after the current time when starting one of the e2e operations (deploy, undeploy, enable, disable, failover, relocate). Since we we wait using util.Sleep(), the call will fail with context.DaadlineExceeded on timeouts. Advantages of using context based deadline: - Limiting the timeout for flows including multiple waits, getting more predictable test time. - Simplify waiting for multiple objects with same deadline, added in RamenDR#2035 - Allows ramenctl to control the deadline when running multiple operations in parallel - Enforcing the deadline in external calls using a context - Simplify wait loops It would be nice to add WithDeadline() to types.Context interface, but it does not work since we cannot overload this interface to return a TestContext in the TestContext interface, and having an interface returning a Context from TestContext is unwanted. For now we implement WithDeadline() on the concrete type so we cannot use it in the actual tests. Test code that need shorter timeouts can use .Context() directly. This change requires a change in ramenctl, setting a deadline on the context passed to e2e. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we used util.Timeout when waiting for resources. In flow when we have multiple waits, we could wait 10 minutes for each change before failing, which is way too much. This change adds a deadline to the underlying context.Context returned by TestContext.Context(). We set the deadline to 10 minutes after the current time when starting one of the e2e operations (deploy, undeploy, enable, disable, failover, relocate). Since we we wait using util.Sleep(), the call will fail with context.DaadlineExceeded on timeouts. Advantages of using context based deadline: - Limiting the timeout for flows including multiple waits, getting more predictable test time. - Simplify waiting for multiple objects with same deadline, added in RamenDR#2035 - Allows ramenctl to control the deadline when running multiple operations in parallel - Enforcing the deadline in external calls using a context - Simplify wait loops It would be nice to add WithDeadline() to types.Context interface, but it does not work since we cannot overload this interface to return a TestContext in the TestContext interface, and having an interface returning a Context from TestContext is unwanted. For now we implement WithDeadline() on the concrete type so we cannot use it in the actual tests. Test code that need shorter timeouts can use .Context() directly. This change requires a change in ramenctl, setting a deadline on the context passed to e2e. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we used util.Timeout when waiting for resources. In flow when we have multiple waits, we could wait 10 minutes for each change before failing, which is way too much. This change adds a deadline to the underlying context.Context returned by TestContext.Context(). We set the deadline to 10 minutes after the current time when starting one of the e2e operations (deploy, undeploy, enable, disable, failover, relocate). Since we we wait using util.Sleep(), the call will fail with context.DaadlineExceeded on timeouts. Advantages of using context based deadline: - Limiting the timeout for flows including multiple waits, getting more predictable test time. - Simplify waiting for multiple objects with same deadline, added in RamenDR#2035 - Allows ramenctl to control the deadline when running multiple operations in parallel - Enforcing the deadline in external calls using a context - Simplify wait loops It would be nice to add WithDeadline() to types.Context interface, but it does not work since we cannot overload this interface to return a TestContext in the TestContext interface, and having an interface returning a Context from TestContext is unwanted. For now we implement WithDeadline() on the concrete type so we cannot use it in the actual tests. Test code that need shorter timeouts can use .Context() directly. This change requires a change in ramenctl, setting a deadline on the context passed to e2e. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we used util.Timeout when waiting for resources. In flow when we have multiple waits, we could wait 10 minutes for each change before failing, which is way too much. This change adds a deadline to the underlying context.Context returned by TestContext.Context(). We set the deadline to 10 minutes after the current time when starting one of the e2e operations (deploy, undeploy, enable, disable, failover, relocate). Since we we wait using util.Sleep(), the call will fail with context.DaadlineExceeded on timeouts. Advantages of using context based deadline: - Limiting the timeout for flows including multiple waits, getting more predictable test time. - Simplify waiting for multiple objects with same deadline, added in RamenDR#2035 - Allows ramenctl to control the deadline when running multiple operations in parallel - Enforcing the deadline in external calls using a context - Simplify wait loops It would be nice to add WithTimeout() to types.Context interface, but it does not work since we cannot overload this interface to return a TestContext in the TestContext interface, and having an interface returning a Context from TestContext is unwanted. For now we implement WithTimeout() on the concrete type so we cannot use it in the actual tests. Test code that need shorter timeouts can use .Context() directly. This change requires a change in ramenctl, setting a deadline on the context passed to e2e. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we used util.Timeout when waiting for resources. In flow when we have multiple waits, we could wait 10 minutes for each change before failing, which is way too much. This change adds a deadline to the underlying context.Context returned by TestContext.Context(). We set the deadline to 10 minutes after the current time when starting one of the e2e operations (deploy, undeploy, enable, disable, failover, relocate). Since we we wait using util.Sleep(), the call will fail with context.DaadlineExceeded on timeouts. Advantages of using context based deadline: - Limiting the timeout for flows including multiple waits, getting more predictable test time. - Simplify waiting for multiple objects with same deadline, added in #2035 - Allows ramenctl to control the deadline when running multiple operations in parallel - Enforcing the deadline in external calls using a context - Simplify wait loops It would be nice to add WithTimeout() to types.Context interface, but it does not work since we cannot overload this interface to return a TestContext in the TestContext interface, and having an interface returning a Context from TestContext is unwanted. For now we implement WithTimeout() on the concrete type so we cannot use it in the actual tests. Test code that need shorter timeouts can use .Context() directly. This change requires a change in ramenctl, setting a deadline on the context passed to e2e. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@parikshithb #2043 is merged, please rebase. |
Replace --timeout=5m with --wait=false in the kubectl delete command to ensure that DeleteDiscoveredApps returns immediately after initiating deletion instead of waiting for resources to be fully deleted. This change makes the function's behavior consistent with other delete operations that separately handle deletion initiation and waiting for completion, improving control flow. Signed-off-by: Parikshith <parikshithb@gmail.com>
This commit adds infra to wait for resources are fully deleted before tests continue, preventing conditions where subsequent tests might be affected by lingering resources. - Add WaitForResourceDelete generic function to wait for resources to be fully deleted - Implement wait funcs for different resource types: - ApplicationSet - ConfigMap - Placement - ManagedClusterSetBinding Signed-off-by: Parikshith <parikshithb@gmail.com>
This change updates the undeploy function for the ApplicationSet by adding wait for all resource deleted with a deadline part of the context. Signed-off-by: Parikshith <parikshithb@gmail.com>
This separation infra allows for more control over the namespace deletion process to explicitly wait for namespace deletion after initiating namespace deletion. Updated undeploy functions of Disapp, Subscription and the EnsureChannelDeleted function by adding the separate wait for namespace with a shared deadline from context. Signed-off-by: Parikshith <parikshithb@gmail.com>
Since we are doing common delete ops on both clusters current comment needs to updated to indicate same. Signed-off-by: Parikshith <parikshithb@gmail.com>
This change replace waitDRPCDeleted with WaitForDRPCDelete with the shared deadline to DRPC and other related resource deletion like placement and mcsb. Signed-off-by: Parikshith <parikshithb@gmail.com>
tested this update using ramenctl
Broke rbd-mirror daemon by scaling it down on cluster dr2 while running
Ran
appset cleanup deletes and waits until drpc, appset, placement and mcsb is deleted:
disapp cleanup deletes and waits for dprcs, placement, mcsb and app namespaces to be deleted:
subapp cleanup deletes and waits for dprcs and subscription namespace to be deleted:
Complete clean logs: |
Improves how resources are cleaned up and waited. It implements a more consistent separation between resource deletion initiation and wait/verification of complete deletion of the resource with deadline.
Changes:
Deadline-based resource deletion
Implement a generic waitForResourceDelete function that accepts a deadline parameter
Replace individual timeouts with a shared deadline approach for all deletion operations
Add resource-specific wait functions for ApplicationSets, ConfigMaps, Placements, DRPCs, namespace and mcsb.
Separation of deletion and waiting
Refactor namespace deletion to separate the deletion request from the wait operation
Change kubectl commands to use --wait=false to return immediately and handle waiting ourselves
Update deployers and actions to use new wait functions with deadline
Example wait logs of different resources added:
Fixes #2019