max_parallel ignored when draining multiple nodes #25979

Description

@madsholden

Nomad version

Nomad v1.10.0
BuildDate 2025-04-09T16:40:54Z
Revision e26a2bd

Operating system and Environment details

Ubuntu 24.04.2 LTS
aarch64

Issue

I have a service job with count 2 and migrate.max_parallel 1. When we want to redeploy client nodes, we start new nodes and then drain the old ones. In this case there are two new nodes and two old nodes. If the service is running with one allocation on each of the two old nodes and I drain both nodes at the same time, both allocations are stopped right away.

If both allocations are on the same node, draining seems to work most of the time, but even then it is not reliable.

The node drain tutorial at https://developer.hashicorp.com/nomad/tutorials/manage-clusters/node-drain says (referring to a job with count 9 and max_parallel 2):

Even if multiple nodes running allocations for this job were draining at the same time, only 2 allocations would be migrated at a time.

It seems like this is not the case.

Reproduction steps

Start a cluster with 4 client nodes and run the job, making sure its two allocations land on two different nodes. Then drain those two nodes at the same time (see the command sketch below).
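
Roughly, the commands involved look like this (the job file name and node IDs are placeholders; -detach returns immediately so both drains can be started back-to-back):

$ nomad job run test-job.nomad.hcl
$ nomad job allocs test-job          # find the two nodes hosting the allocations
$ nomad node drain -enable -detach -yes <node-id-1>
$ nomad node drain -enable -detach -yes <node-id-2>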

Expected Result

Only one allocation should be stopped and migrated at a time.

Actual Result

Both allocations are stopped right away, causing downtime for the whole service.

Job file (if appropriate)

job "test-job" {
  type = "service"
  region = "eu-west-1"

  migrate {
    max_parallel = 1
    health_check = "checks"
    min_healthy_time = "15s"
    healthy_deadline = "5m"
  }

  group "test-job" {
    count = 2

    network {
      port "http" { }
    }

    task "test-job" {
      driver = "docker"
      config {
        image = "hashicorp/http-echo:1.0.0"
        args  = ["-text", "ok", "-listen", ":${NOMAD_PORT_http}"]
        ports = ["http"]
      }

      service {
        name = "test-job"
        port = "http"
        check {
          name = "http-ok"
          type = "http"
          path = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Nomad Client logs (if appropriate)

$ nomad node drain -enable -yes f36dd73a
2025-06-04T12:32:30+02:00: Ctrl-C to stop monitoring: will not cancel the node drain
2025-06-04T12:32:30+02:00: Node "f36dd73a-f5c9-8d8d-ce92-0c250f11568f" drain strategy set
2025-06-04T12:32:32+02:00: Alloc "63b9d8f4-68e0-9ce6-f2ff-e5c518e59581" marked for migration

$ nomad node drain -enable -yes 74995fa4
2025-06-04T12:32:30+02:00: Ctrl-C to stop monitoring: will not cancel the node drain
2025-06-04T12:32:30+02:00: Node "74995fa4-8cc4-6efc-8a81-a7154e9cdec5" drain strategy set
2025-06-04T12:32:31+02:00: Alloc "3d97c300-e298-ca06-cdfe-4d2c88b355ca" marked for migration
