If the zone responsible for a domain changes, cert-manager will not pick it up · Issue #7760 · cert-manager/cert-manager · GitHub
If the zone responsible for a domain changes, cert-manager will not pick it up #7760
Open
@c00

Description

Describe the bug:

cert-manager caches the result of looking up which zone is responsible for which domain. If this information changes, cert-manager will not learn about it until it restarts.

When a challenge is presented, cert-manager attempts to find the associated zone (util.FindZoneByFqdn) and caches the result. However, if a delegated zone changes (for whatever reason; in my case, an initial misconfiguration), the wrong zone can end up cached. This cache is not cleared until cert-manager restarts.

So if the zone associated with a domain changes (e.g. because a subdomain is delegated to other name servers), this change will not be detected until the service restarts.
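The failure mode described above can be illustrated with a minimal sketch. This is not cert-manager's code: `zoneCache`, `newZoneCache`, `findZone`, and the injected `lookupFunc` are hypothetical names, and the lookup is a stand-in for a real DNS query. It only shows how a memoized result with no expiry keeps returning the old zone after the delegation changes.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFunc is a hypothetical stand-in for the real DNS lookup
// that determines which zone is authoritative for an FQDN.
type lookupFunc func(fqdn string) (string, error)

// zoneCache memoizes lookups forever (no TTL, no invalidation),
// which is the behaviour this issue describes.
type zoneCache struct {
	mu     sync.Mutex
	zones  map[string]string
	lookup lookupFunc
}

func newZoneCache(lookup lookupFunc) *zoneCache {
	return &zoneCache{zones: make(map[string]string), lookup: lookup}
}

// findZone returns the cached zone if one exists; otherwise it
// performs the lookup and stores the result permanently.
func (c *zoneCache) findZone(fqdn string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if zone, ok := c.zones[fqdn]; ok {
		return zone, nil
	}
	zone, err := c.lookup(fqdn)
	if err != nil {
		return "", err
	}
	c.zones[fqdn] = zone
	return zone, nil
}

func main() {
	// "current" simulates what DNS would answer right now.
	current := "example.com."
	cache := newZoneCache(func(string) (string, error) { return current, nil })

	first, _ := cache.findZone("my-website.sub.example.com.")
	current = "sub.example.com." // the delegation is fixed in DNS
	second, _ := cache.findZone("my-website.sub.example.com.")

	// Both values are "example.com." because the first answer was cached.
	fmt.Println(first, second)
}
```

In this sketch the only way to observe the new zone is to build a fresh cache, which corresponds to restarting the process.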

Expected behaviour:

If the zone responsible for a domain changes, cert-manager should (eventually) pick this up without a restart.

Possible ways to achieve this:

  • cache should be cleared after some time;
  • unsuccessful challenges have their related cache items invalidated; or
  • on deletion of ingress / cert, associated cache items are also invalidated

Steps to reproduce the bug:

(I assume that cert-manager is running in k8s.)

  • Have 2 different DNS providers. One that manages example.com (Provider 1) and one that manages sub.example.com (Provider 2).
  • Set up an issuer with a DNS01 solver for Provider 2.
  • Set up the DNS records like this:
# Records on Provider 1
NS   example.com   ns.provider-1.example

# Records on Provider 2 
NS   sub.example.com   ns.provider-2.example
A    my-website.sub.example.com   ns.provider-2.example

Note: The above records are deliberately misconfigured. The first provider is missing an NS sub.example.com ns.provider-2.example record. This will be added later.

  • Issue a challenge for my-website.sub.example.com.
  • Note that the resolved zone will be example.com and the challenge will fail because the configured issuer does not have that zone. This is expected, and correct behavior.
  • Now correct the misconfiguration in Provider 1 by adding the NS record delegating the subdomain:
# Record to add on Provider 1
NS   sub.example.com   ns.provider-2.example
  • Now issue the same challenge again.
  • Note that the zone still resolves to example.com, even though it should now resolve to sub.example.com. This is wrong. The challenge fails again.
  • Restart the cert-manager deployment.
  • Note that the challenge now resolves the zone correctly and the certificate is issued as expected.

Anything else we need to know?:

In the file pkg/issuer/acme/dns/util/wait.go, there is a function FindZoneByFqdn that sets the cache. This is the cache that is never cleaned.

Environment details:

  • Kubernetes version: 1.31.1
  • Cloud-provider/provisioner: syseleven
  • cert-manager version: v1.12.7
  • Install method: helm

/kind bug
