8000 e2e: Use context deadline per action by nirs · Pull Request #2043 · RamenDR/ramen · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

e2e: Use context deadline per action #2043

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8000
Merged
merged 3 commits into from
May 20, 2025
Merged

Conversation

nirs
Copy link
Member
@nirs nirs commented May 18, 2025

Previously we used util.Timeout when waiting for resources. In flow when
we have multiple waits, we could wait 10 minutes for each change before
failing, which is way too much.

This change adds a deadline to the underlying context.Context returned
by TestContext.Context() and Context.Context(). We set the deadline before
calling one of the e2e operations (deploy, undeploy, enable, disable,
failover, relocate, or before cleaning up (EnsureChannelDeleted). Since we
wait using util.Sleep(), the call will fail with context.DaadlineExceeded on
timeouts.

Advantages of using context based deadline:

  • Limiting the timeout for flows including multiple waits, getting more
    predictable test time.
  • Simplify waiting for multiple objects with same deadline, added in
    e2e: Wait for resource deletion during undeploy and unprotect #2035
  • Allows ramenctl to control the deadline when running multiple
    operations in parallel
  • Enforcing the deadline in external calls using a context
  • Simplify wait loops

This change requires a change in ramenctl:
RamenDR/ramenctl#185

We use the parent Context() by default sharing the same context in all
tests. This allows using a child context with different deadline for
each test operation in a future change.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Copy link
Member
@parikshithb parikshithb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nirs nirs force-pushed the context-deadline branch from 7548728 to 7db23dc Compare May 19, 2025 21:45
@nirs nirs marked this pull request as ready for review May 19, 2025 21:46
@nirs nirs force-pushed the context-deadline branch 2 times, most recently from f09f975 to cc66279 Compare May 19, 2025 22:59
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
WithDeadline returns a derived command with a deadline. The command
context will fail with context.DeadlineExceeded on timeout.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Add a 30 seconds deadline to validate and setup. These operations take
less than a second and we can safely fail quickly if they do not
complete in time.

Add a 1 minute deadline for cleanup. This takes usually 5 seconds. This
is the same timeout used in current ramen/2e. Next ramen/e2e will use
the deadline passed via the context.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Previously we returned parent.Context(), but we need to keep our own
context to be able to create a test.Context with a deadline.

Context.WithDeadline returns a derive context with a deadline. The
context will fail with context.DeadlineExceeded on timeout.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
We need to pass a types.TestContext to the e2e testing interface, but
using the concrete type will allow adding deadlines for the test
operations.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Add the standard deadline for test operations (deploy, undeploy,
protect, unprotect, failover, relocate) and for our special cleanup
operation.

We will use more fine grain timeout later. This work will be done in
ramen/e2e and we will consume the new finer grain timeouts.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Add a 30 seconds deadline to validate and setup. These operations take
less than a second and we can safely fail quickly if they do not
complete in time.

Add a 1 minute deadline for cleanup. This takes usually 5 seconds. This
is the same timeout used in current ramen/2e. Next ramen/e2e will use
the deadline passed via the context.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Previously we returned parent.Context(), but we need to keep our own
context to be able to create a test.Context with a deadline.

Context.WithDeadline returns a derive context with a deadline. The
context will fail with context.DeadlineExceeded on timeout.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
We need to pass a types.TestContext to the e2e testing interface, but
using the concrete type will allow adding deadlines for the test
operations.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Add the standard deadline for test operations (deploy, undeploy,
protect, unprotect, failover, relocate) and for our special cleanup
operation.

We will use more fine grain timeout later. This work will be done in
ramen/e2e and we will consume the new finer grain timeouts.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@@ -84,6 +86,15 @@ func (c *Context) Context() context.Context {
return c.context
}

// WithDeadline returns a derived context with a deadline. Call cancel to release resources associated with the context
// as soon as the operation running in the context complete.
func (c Context) WithDeadline(d time.Time) (*Context, context.CancelFunc) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify the code by using WithTimeout so callers do not need to compute the deadline. context.WithTimeout compute the deadline internally.

@nirs nirs marked this pull request as draft May 20, 2025 00:40
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
WithTimeout() returns a derived command with a deadline. The command
context will fail with context.DeadlineExceeded on timeout.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Add a 30 seconds deadline to validate and setup. These operations take
less than a second and we can safely fail quickly if they do not
complete in time.

Add a 1 minute deadline for cleanup. This takes usually 5 seconds. This
is the same timeout used in current ramen/2e. Next ramen/e2e will use
the deadline passed via the context.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Previously we returned parent.Context(), but we need to keep our own
context to be able to create a test.Context with a deadline.

Context.WithTimeout() returns a derive context with a deadline. The
context will fail with context.DeadlineExceeded on timeout.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
We need to pass a types.TestContext to the e2e testing interface, but
using the concrete type will allow adding deadlines for the test
operations.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Add the standard deadline for test operations (deploy, undeploy,
protect, unprotect, failover, relocate) and for our special cleanup
operation.

We will use more fine grain timeout later. This work will be done in
ramen/e2e and we will consume the new finer grain timeouts.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added 2 commits May 20, 2025 04:11
Previously we used util.Timeout when waiting for resources. In flow when
we have multiple waits, we could wait 10 minutes for each change before
failing, which is way too much.

This change adds a deadline to the underlying context.Context returned
by TestContext.Context(). We set the deadline to 10 minutes after the
current time when starting one of the e2e operations (deploy, undeploy,
enable, disable, failover, relocate). Since we we wait using
util.Sleep(), the call will fail with context.DaadlineExceeded on
timeouts.

Advantages of using context based deadline:

- Limiting the timeout for flows including multiple waits, getting more
  predictable test time.
- Simplify waiting for multiple objects with same deadline, added in
  RamenDR#2035
- Allows ramenctl to control the deadline when running multiple
  operations in parallel
- Enforcing the deadline in external calls using a context
- Simplify wait loops

It would be nice to add WithTimeout() to types.Context interface, but
it does not work since we cannot overload this interface to return a
TestContext in the TestContext interface, and having an interface
returning a Context from TestContext is unwanted. For now we implement
WithTimeout() on the concrete type so we cannot use it in the actual
tests. Test code that need shorter timeouts can use .Context() directly.

This change requires a change in ramenctl, setting a deadline on the
context passed to e2e.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Like TestContext, this allows specifying a deadline for operations using
the global context. For now this is only util.EnsureChannelDeleted().

This changes DeleteNamespace() to consider the context deadline instead
of using its own 60 seconds timeout. The previous timeout was too short
when used Undeploy() and DisableProtection() with test context, and did
not consider the test deadline. Now we will wait until the context
deadline is exceeded.

Like TestContext we implement WithTimeout() on the concrete type and not
on types.Context interface, so deadline can be configured only at the
test level. Code that want to use smaller deadlines need to use the
.Context() directly.

This change requires a change in ramenctl to add a deadline for the
command context.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs nirs marked this pull request as ready for review May 20, 2025 01:21
@nirs nirs requested a review from parikshithb May 20, 2025 01:36
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
WithTimeout() returns a derived command with a deadline. The command
context will fail with context.DeadlineExceeded on timeout.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Add a 30 seconds deadline to validate and setup. These operations take
less than a second and we can safely fail quickly if they do not
complete in time.

Add a 1 minute deadline for cleanup. This takes usually 5 seconds. This
is the same timeout used in current ramen/2e. Next ramen/e2e will use
the deadline passed via the context.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Previously we returned parent.Context(), but we need to keep our own
context to be able to create a test.Context with a deadline.

Context.WithTimeout() returns a derive context with a deadline. The
context will fail with context.DeadlineExceeded on timeout.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
We need to pass a types.TestContext to the e2e testing interface, but
using the concrete type will allow adding deadlines for the test
operations.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to nirs/ramenctl that referenced this pull request May 20, 2025
Add the standard deadline for test operations (deploy, undeploy,
protect, unprotect, failover, relocate) and for our special cleanup
operation.

We will use more fine grain timeout later. This work will be done in
ramen/e2e and we will consume the new finer grain timeouts.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs
Copy link
Member Author
nirs commented May 20, 2025

Tested with RamenDR/ramenctl#185 using:

diff --git a/go.mod b/go.mod
index 4203c2d..d78f329 100644
--- a/go.mod
+++ b/go.mod
@@ -87,3 +87,5 @@ require (
        sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
        sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
 )
+
+replace github.com/ramendr/ramen/e2e v0.0.0-20250424122329-3f0bedcd598d => ../ramen/e2e

I broke rbd-mirror daemon by scaling it down on cluster dr2.

% ramenctl test run -o test
⭐ Using report "test"
⭐ Using config "config.yaml"

🔎 Validate config ...
   ✅ Config validated

🔎 Setup environment ...
   ✅ Environment setup

🔎 Run tests ...
   ✅ Application "appset-deploy-rbd" deployed
   ✅ Application "appset-deploy-cephfs" deployed
   ✅ Application "disapp-deploy-rbd" deployed
   ✅ Application "disapp-deploy-cephfs" deployed
   ✅ Application "subscr-deploy-rbd" deployed
   ✅ Application "subscr-deploy-cephfs" deployed
   ✅ Application "appset-deploy-cephfs" protected
   ✅ Application "subscr-deploy-cephfs" protected
   ✅ Application "disapp-deploy-cephfs" protected
   ✅ Application "appset-deploy-cephfs" failed over
   ✅ Application "subscr-deploy-cephfs" failed over
   ✅ Application "disapp-deploy-cephfs" failed over
   ✅ Application "disapp-deploy-cephfs" relocated
   ✅ Application "subscr-deploy-cephfs" relocated
   ✅ Application "disapp-deploy-cephfs" unprotected
   ✅ Application "subscr-deploy-cephfs" unprotected
   ✅ Application "subscr-deploy-cephfs" undeployed
   ✅ Application "appset-deploy-cephfs" relocated
   ✅ Application "disapp-deploy-cephfs" undeployed
   ❌ Failed to protect application "appset-deploy-rbd"
   ❌ Failed to protect application "disapp-deploy-rbd"
   ❌ Failed to protect application "subscr-deploy-rbd"
   ✅ Application "appset-deploy-cephfs" unprotected
   ✅ Application "appset-deploy-cephfs" undeployed

🔎 Gather data ...
   ✅ Gathered data from cluster "hub"
   ✅ Gathered data from cluster "dr2"
   ✅ Gathered data from cluster "dr1"

❌ failed (3 passed, 3 failed, 0 skipped, 0 canceled)

In the report we see that all rbd tests failed in exactly 600 seconds:

build:
  commit: bc100b4dfa79e81995171bbc3edf3ddc969f8a87
  version: v0.5.1-30-gbc100b4
config:
  channel:
    name: https-github-com-ramendr-ocm-ramen-samples-git
    namespace: test-gitops
  clusterSet: default
  clusters:
    c1:
      kubeconfig: /Users/nsoffer/.config/drenv/rdr/kubeconfigs/dr1
    c2:
      kubeconfig: /Users/nsoffer/.config/drenv/rdr/kubeconfigs/dr2
    hub:
      kubeconfig: /Users/nsoffer/.config/drenv/rdr/kubeconfigs/hub
  distro: k8s
  drPolicy: dr-policy
  namespaces:
    argocdNamespace: argocd
    ramenDRClusterNamespace: ramen-system
    ramenHubNamespace: ramen-system
    ramenOpsNamespace: ramen-ops
  pvcSpecs:
  - accessModes: ReadWriteOnce
    name: rbd
    storageClassName: rook-ceph-block
  - accessModes: ReadWriteMany
    name: cephfs
    storageClassName: rook-cephfs-fs1
  repo:
    branch: main
    url: https://github.com/RamenDR/ocm-ramen-samples.git
  tests:
  - deployer: appset
    pvcSpec: rbd
    workload: deploy
  - deployer: appset
    pvcSpec: cephfs
    workload: deploy
  - deployer: subscr
    pvcSpec: rbd
    workload: deploy
  - deployer: subscr
    pvcSpec: cephfs
    workload: deploy
  - deployer: disapp
    pvcSpec: rbd
    workload: deploy
  - deployer: disapp
    pvcSpec: cephfs
    workload: deploy
created: "2025-05-20T18:20:00.61724+03:00"
duration: 645.790974124
host:
  arch: arm64
  cpus: 10
  os: darwin
name: test-run
status: failed
steps:
- duration: 0.0240695
  name: validate
  status: passed
- duration: 0.013582208
  name: setup
  status: passed
- duration: 645.753322416
  items:
  - config:
      deployer: appset
      pvcSpec: rbd
      workload: deploy
    duration: 600.033530166
    items:
    - duration: 0.031533666
      name: deploy
      status: passed
    - duration: 600.0019965
      name: protect
      status: failed
    name: appset-deploy-rbd
    status: failed
  - config:
      deployer: appset
      pvcSpec: cephfs
      workload: deploy
    duration: 645.753191959
    items:
    - duration: 0.032834625
      name: deploy
      status: passed
    - duration: 75.174977417
      name: protect
      status: passed
    - duration: 150.157500042
      name: failover
      status: passed
    - duration: 360.306835417
      name: relocate
      status: passed
    - duration: 60.067133625
      name: unprotect
      status: passed
    - duration: 0.013910833
      name: undeploy
      status: passed
    name: appset-deploy-cephfs
    status: passed
  - config:
      deployer: subscr
      pvcSpec: rbd
      workload: deploy
    duration: 605.134674125
    items:
    - duration: 5.134197958
      name: deploy
      status: passed
    - duration: 600.000476167
      name: protect
      status: failed
    name: subscr-deploy-rbd
    status: failed
  - config:
      deployer: subscr
      pvcSpec: cephfs
      workload: deploy
    duration: 582.111058459
    items:
    - duration: 5.1412195
      name: deploy
      status: passed
    - duration: 90.406721833
      name: protect
      status: passed
    - duration: 150.202424584
      name: failover
      status: passed
    - duration: 270.259307417
      name: relocate
      status: passed
    - duration: 60.053813875
      name: unprotect
      status: passed
    - duration: 6.04757125
      name: undeploy
      status: passed
    name: subscr-deploy-cephfs
    status: passed
  - config:
      deployer: disapp
      pvcSpec: rbd
      workload: deploy
    duration: 601.536050167
    items:
    - duration: 1.535898542
      name: deploy
      status: passed
    - duration: 600.000151625
      name: protect
      status: failed
    name: disapp-deploy-rbd
    status: failed
  - config:
      deployer: disapp
      pvcSpec: cephfs
      workload: deploy
    duration: 587.649268041
    items:
    - duration: 1.536133666
      name: deploy
      status: passed
    - duration: 120.187045833
      name: protect
      status: passed
    - duration: 151.872959375
      name: failover
      status: passed
    - duration: 238.258093625
      name: relocate
      status: passed
    - duration: 60.09480525
      name: unprotect
      status: passed
    - duration: 15.700230292
      name: undeploy
      status: passed
    name: disapp-deploy-cephfs
    status: passed
  name: tests
  status: failed
summary:
  canceled: 0
  failed: 3
  passed: 3
  skipped: 0

We see the expected errors in the log:

2025-05-20T18:30:00.702+0300    ERROR   appset-deploy-rbd       test/test.go:169        Step "protect" failed: drpc not ready in cluster "hub" (Available: true, PeerReady: true, ProgressionCompleted: true, lastGroupSyncTime: <nil>): context deadline exceeded
github.com/ramendr/ramenctl/pkg/test.(*Test).failStep
        /Users/nsoffer/src/ramenctl/pkg/test/test.go:169
github.com/ramendr/ramenctl/pkg/test.(*Test).Protect
        /Users/nsoffer/src/ramenctl/pkg/test/test.go:85
github.com/ramendr/ramenctl/pkg/test.(*Command).runFlow
        /Users/nsoffer/src/ramenctl/pkg/test/command.go:244
github.com/ramendr/ramenctl/pkg/test.(*Command).runFlowFunc.func1
        /Users/nsoffer/src/ramenctl/pkg/test/command.go:223

nirs added a commit to RamenDR/ramenctl that referenced this pull request May 20, 2025
WithTimeout() returns a derived command with a deadline. The command
context will fail with context.DeadlineExceeded on timeout.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to RamenDR/ramenctl that referenced this pull request May 20, 2025
Add a 30 seconds deadline to validate and setup. These operations take
less than a second and we can safely fail quickly if they do not
complete in time.

Add a 1 minute deadline for cleanup. This takes usually 5 seconds. This
is the same timeout used in current ramen/2e. Next ramen/e2e will use
the deadline passed via the context.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to RamenDR/ramenctl that referenced this pull request May 20, 2025
Previously we returned parent.Context(), but we need to keep our own
context to be able to create a test.Context with a deadline.

Context.WithTimeout() returns a derive context with a deadline. The
context will fail with context.DeadlineExceeded on timeout.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to RamenDR/ramenctl that referenced this pull request May 20, 2025
We need to pass a types.TestContext to the e2e testing interface, but
using the concrete type will allow adding deadlines for the test
operations.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs added a commit to RamenDR/ramenctl that referenced this pull request May 20, 2025
Add the standard deadline for test operations (deploy, undeploy,
protect, unprotect, failover, relocate) and for our special cleanup
operation.

We will use more fine grain timeout later. This work will be done in
ramen/e2e and we will consume the new finer grain timeouts.

This change is required to consume ramen/e2e context deadline added in
RamenDR/ramen#2043

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs nirs merged commit 1bb9ff6 into RamenDR:main May 20, 2025
30 of 31 checks passed
@nirs nirs deleted the context-deadline branch May 20, 2025 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0