Description
What happened:
I was investigating kubernetes-retired/kubefed#1024 and I stumbled across an issue which I believe might be a bug in Kubernetes.
I have successfully recreated this issue using some test configuration, so you don't need to deploy kubefed to reproduce it.
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: "codeclou/docker-nginx-self-signed-ssl:latest"
        imagePullPolicy: Always
        ports:
        - containerPort: 4443
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 443
    targetPort: 4443
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: null
  name: testconfigs.example.io
spec:
  group: example.io
  version: v1
  versions:
  - name: v1
    storage: true
    served: true
  names:
    kind: TestConfig
    plural: testconfigs
  scope: Namespaced
  validation:
    openAPIV3Schema:
      type: object
      properties:
        spec:
          type: object
          properties:
            TestString:
              description: This is a test string
              type: string
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: test-webhook
webhooks:
- name: testconfigs.example.io
  clientConfig:
    service:
      namespace: default
      name: nginx
  rules:
  - operations:
    - CREATE
    - UPDATE
    apiGroups:
    - example.io
    apiVersions:
    - v1
    resources:
    - testconfigs
  failurePolicy: Fail
```
After applying the YAML, you can see that a pod gets created:
```
❯ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE   READINESS GATES
nginx-deployment-5cf877cd99-n4n9l   1/1     Running   0          9m52s   10.156.17.45   gke-simons-cluster-preemptible-899b51b7-m0zk   <none>           <none>
```
As well as a service:
```
❯ kubectl get services -o wide
NAME    TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE   SELECTOR
nginx   ClusterIP   10.156.32.6   <none>        443/TCP   10m   app=nginx
```
The other resources to be deployed are:
- a simple generic CRD of `kind: TestConfig`, and
- an admission validation webhook, `test-webhook`.
The webhook is invoked when a new resource of `kind: TestConfig` is created or updated. I would expect it to work as follows: an HTTPS request is made to the service `nginx`, which validates the request, and the object creation succeeds.
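For context, the nginx container here is only a TLS stand-in; a real validating webhook would have to answer the API server's `AdmissionReview` POST. A minimal sketch of what such a handler would return (field names follow the `admission.k8s.io/v1beta1` schema; the `TestString` check is just an illustration, not something the test container does):

```python
# Sketch of an admission.k8s.io/v1beta1 AdmissionReview response, as a real
# validating webhook behind the service would produce it. Illustrative only.

def build_admission_response(request_uid, allowed, message=""):
    """Build the AdmissionReview response body the API server expects."""
    response = {"uid": request_uid, "allowed": allowed}
    if not allowed:
        # 'status.message' is surfaced back to the kubectl user on denial.
        response["status"] = {"message": message}
    return {
        "apiVersion": "admission.k8s.io/v1beta1",
        "kind": "AdmissionReview",
        "response": response,
    }

def validate_testconfig(review):
    """Example policy: deny a TestConfig whose spec.TestString is empty."""
    uid = review["request"]["uid"]
    obj = review["request"]["object"]
    if not obj.get("spec", {}).get("TestString"):
        return build_admission_response(uid, False, "spec.TestString must be set")
    return build_admission_response(uid, True)
```

With `failurePolicy: Fail`, any failure to reach this handler (as seen below) blocks the object creation outright.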
Let's attempt to create a TestConfig object:
```yaml
---
apiVersion: example.io/v1
kind: TestConfig
metadata:
  name: test-object
  namespace: default
spec:
  TestString: "This is my test string"
```
I observe that the validation webhook request times out, causing the resource creation to fail.
```
❯ kubectl apply -f test-resource.yaml
Error from server (Timeout): error when creating "test-resource.yaml": Timeout: request did not complete within requested timeout 30s
```
After some investigation, I realised that the admission control webhook attempts to hit the pod's IP address (10.156.17.45), rather than the service's (10.156.32.6), on the port specified by the service's targetPort
(4443). This packet is intercepted by my GCE VPC firewall and gets denied.
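This is consistent with the API server resolving the webhook service to its endpoints rather than to the ClusterIP: it picks a backing pod IP and dials it directly on the service's `targetPort`. A rough illustration of that resolution using the values above (my own sketch, not actual kube-apiserver code):

```python
# Rough illustration of the webhook service resolution observed here: the
# API server bypasses the ClusterIP and dials an endpoint (pod) address on
# the service's targetPort. NOT actual kube-apiserver code.

def resolve_webhook_target(service, endpoint_ips, webhook_port=443):
    """Map a (Service spec, endpoint IPs) pair to the address dialed."""
    # Find the service port the webhook nominally uses (443 by default)...
    for p in service["ports"]:
        if p["port"] == webhook_port:
            target_port = p["targetPort"]
            break
    else:
        raise ValueError("service does not expose the webhook port")
    # ...then connect to a backing pod directly on the targetPort.
    return endpoint_ips[0], target_port

service = {"ports": [{"port": 443, "targetPort": 4443}]}
endpoints = ["10.156.17.45"]  # pod IP from `kubectl get pods -o wide` above
print(resolve_webhook_target(service, endpoints))  # → ('10.156.17.45', 4443)
```

Since the GKE master sits outside the node network, that pod-IP connection is exactly what the VPC firewall sees.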
```
{
  insertId: "a05tpmfdwoniw"
  jsonPayload: {
    connection: {
      dest_ip: "10.156.17.45"
      dest_port: 4443
      protocol: 6
      src_ip: "10.172.0.3"
      src_port: 57446
    }
    disposition: "DENIED"
    ...
```
GKE operates the control plane on a separate VPC network in a separate, Google-managed project. A firewall rule is deployed automatically during cluster creation to allow traffic between the master and the node pools on ports 443 and 10250 only.
As soon as I add port 4443 to this firewall rule, the request to the pod succeeds: the admission webhook makes a correct request to the service name on the correct port and the validation webhook is reached (in my specific case it fails with a certificate mismatch, since no webhook CA was configured and I used a test container).
```
❯ kubectl apply -f test-resource.yaml
Error from server (InternalError): error when creating "test-resource.yaml": Internal error occurred: failed calling webhook "testconfigs.example.io": Post https://nginx.default.svc:443/?timeout=30s: x509: certificate is valid for local.codeclou.io, not nginx.default.svc
```
When I remove port 4443 from the above-mentioned GCP firewall rule, the timeout issue presents itself again.
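For reference, the workaround amounts to adding the webhook's `targetPort` to the GKE-generated master-to-nodes firewall rule, along these lines (the rule name below is a placeholder from my cluster; yours will differ):

```shell
# Locate the GKE-generated master->nodes rule (named like gke-<cluster>-<hash>-master).
gcloud compute firewall-rules list --filter="name~^gke-.*-master$"

# Allow the webhook's targetPort in addition to the default 443 and 10250.
# NOTE: the rule name is a placeholder; substitute your own cluster's rule.
gcloud compute firewall-rules update gke-simons-cluster-899b51b7-master \
  --allow tcp:443,tcp:10250,tcp:4443
```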
What you expected to happen:
I didn't expect any communication to happen between the admission controller webhook (on the master network) and a specific pod's IP address.
The only communication I expect to happen should be between the webhook and the service.
How to reproduce it (as minimally and precisely as possible):
Please see above to reproduce the issue, or attempt to deploy the latest kubefed on a Kubernetes cluster where traffic between the master and node networks is restricted to ports 443 and 10250 only.
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-20T04:49:16Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.7-gke.8", Git
```
- Cloud provider or hardware configuration: GKE v1.13.7-gke.8 (latest at the time of writing)
- OS (e.g: `cat /etc/os-release`): Container-Optimized OS
- Kernel (e.g. `uname -a`): Google managed
- Install tools: Google managed
- Network plugin and version (if this is a network-related bug): Google managed
- Others:
/sig api-machinery