v1.10 backports 2022-03-03 by pchaigno · Pull Request #19023 · cilium/cilium · GitHub

v1.10 backports 2022-03-03 #19023


Merged: 12 commits into cilium:v1.10 on Mar 7, 2022

Conversation

pchaigno
Member
@pchaigno pchaigno commented Mar 3, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18506 18827 18893 18886 18891 18877 18911 18952; do contrib/backporting/set-labels.py $pr done 1.10; done

tklauser and others added 12 commits March 3, 2022 16:38
…tlink

[ upstream commit 3222f50 ]

As of Linux kernel commit torvalds/linux@68ac0f3810e7 ("xfrm: state and
policy should fail if XFRMA_IF_ID 0"), specifying xfrm if_id = 0 leads
to EINVAL being returned. So far, the netlink library always specified
the XFRMA_IF_ID netlink attribute, regardless of its value. Upstream PR
vishvananda/netlink#727 changed this behavior to only set XFRMA_IF_ID
when XfrmState.Ifid or XfrmPolicy.Ifid is non-zero, fixing the issue.
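As a rough illustration (not the actual vishvananda/netlink code), the fix
amounts to making the attribute conditional on a non-zero value; the type
and function names below are invented for the sketch:

```go
package main

import "fmt"

// xfrmAttrs loosely models the netlink attributes serialized for an
// XFRM state or policy; it is not the real vishvananda/netlink API.
type xfrmAttrs map[string]uint32

// addIfID appends the XFRMA_IF_ID attribute only when ifid is non-zero.
// Kernels with commit 68ac0f3810e7 reject an explicit if_id of 0 with
// EINVAL, so serializing the attribute unconditionally breaks
// state/policy creation.
func addIfID(attrs xfrmAttrs, ifid uint32) {
	if ifid != 0 {
		attrs["XFRMA_IF_ID"] = ifid
	}
}

func main() {
	attrs := xfrmAttrs{}
	addIfID(attrs, 0) // skipped: would otherwise trigger EINVAL
	addIfID(attrs, 7)
	fmt.Println(attrs)
}
```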

Updated using:

    go get github.com/vishvananda/netlink@main
    go mod tidy
    go mod vendor

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit a6e8477 ]

Document the intent of NodeValidateImplementation().

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 7e5022e ]

These if-statements are unnecessary because, upon code analysis, we can
tell that it's not possible for the input to be nil. Remove these
statements to simplify the flow of the function. In other words, we now
know for a fact that calling these functions will result in a route
insert.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit e41aea0 ]

This helps in scenarios where the user reports this log msg, but we are
missing the actual CIDR from the route that failed to be deleted or
created.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 0bd4e04 ]

With ENI IPAM mode and IPsec enabled, users were reporting cases where
connectivity to particular pods breaks, and correlated with those drops,
the following error msg:

```
Unable to delete the IPsec route OUT from the host routing table
```

In addition, it was also reported that the connectivity outage would
only last for a few minutes before resolving itself.

The issue turned out to be that upon node deletion, the logic to handle
the IPsec cleanup is asymmetric with the IPsec logic to handle a node
create / update.

Here's how:

  * With ENI mode and IPsec, subnet encryption mode is enabled
    implicitly.
  * Background: Users can explicitly enable subnet encryption mode by
    configuring `--ipv4-pod-subnets=[cidr1,cidr2,...]`.
    * Background: ENIs are part of subnet(s).
    * Cilium with ENI mode automatically appends the node's ENIs'
      subnets' CIDRs to this slice.
    * For example, node A has ENI E which is a part of subnet S with
      CIDR C. Therefore, `--ipv4-pod-subnets=[C]`.
  * This means that each node should have an IPsec OUT route for each
    pod subnet, i.e. each ENI's subnet, as shown by
    (*linuxNodeHandler).nodeUpdate() which contains the IPsec logic on a
    node create / update.
  * Upon a node delete [(*linuxNodeHandler).nodeDelete()], we clean up
    the "old" node. When it gets to the IPsec logic, it removes the
    routes for the pod subnets as well, i.e. removes the route to the
    ENI's subnet from the local node. From the example above, it'd
    remove the route for CIDR C.
  * This is problematic because in ENI mode, different nodes can share
    the same ENI's subnet, meaning subnets are NOT exclusive to a node.
    For example, a node B can also have ENI E with a subnet C attached
    to it.
  * As for how the nodes were fixing themselves, it turns out that
    (*Manager).backgroundSync() runs on an interval which calls
    NodeValidateImplementation() which calls down to
    (*linuxNodeHandler).nodeUpdate() thereby running the IPsec logic of
    a node create / update which reinstates the missing routes.

Therefore, we shouldn't be deleting these routes because pods might
still be relying on them. By comparing the IPsec delete logic with [1],
we see that they're asymmetric. This commit fixes this asymmetry.

[1]: Given subnetEncryption=true, notice how we only call
     enableSubnetIPsec() if the node is local. That is not the case on
     node delete.

     ```
     func (n *linuxNodeHandler) nodeUpdate(oldNode, newNode *nodeTypes.Node, firstAddition bool) error {
     	...

     	if n.nodeConfig.EnableIPSec && !n.subnetEncryption() && !n.nodeConfig.EncryptNode {
     		n.enableIPsec(newNode)
     		newKey = newNode.EncryptionKey
     	}

     	...

     	if n.nodeConfig.EnableIPSec && !n.subnetEncryption() {
     		n.encryptNode(newNode)
     	}

     	if newNode.IsLocal() {
     		isLocalNode = true
     		...
     		if n.subnetEncryption() {
     			n.enableSubnetIPsec(n.nodeConfig.IPv4PodSubnets, n.nodeConfig.IPv6PodSubnets)
     		}
     		...
     		return nil
     	}
     ```
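The asymmetry described above can be sketched as follows; this is a
simplified, hypothetical model (types and names invented for
illustration), not Cilium's actual node handler:

```go
package main

import "fmt"

// nodeHandler is a toy stand-in for linuxNodeHandler; the
// subnetEncryption field models n.subnetEncryption().
type nodeHandler struct {
	subnetEncryption bool
}

// routesToDelete returns the IPsec OUT routes that are safe to remove
// when a remote node is deleted. With subnet encryption (e.g. ENI mode),
// per-subnet routes may be shared with other nodes on the same subnet,
// so deleting them would break connectivity for pods behind those nodes.
func (n *nodeHandler) routesToDelete(podSubnets []string) []string {
	if n.subnetEncryption {
		return nil // shared subnet routes: other nodes may still need them
	}
	return podSubnets
}

func main() {
	n := &nodeHandler{subnetEncryption: true}
	fmt.Println(n.routesToDelete([]string{"10.240.0.0/16"}))
}
```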

Fixes: 645de9d ("cilium: remove encryption route and rules if crypto
is disabled")

Co-authored-by: John Fastabend <john@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 6c647ba ]

Commit 706c900 ("docs: re-write docs to create clusters with
tainted nodes") removed the command for this instruction step, so
there's no need to have the instruction any more. Remove it.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 4c3bd27 ]

We are hitting this timeout sometimes, and it seems it was previously
updated on the regular pipelines (see
31a622e) but not on the runtime
pipeline.

We remove the inner timeout as the outer one is practically redundant
here, since the steps outside of the inner loop are almost instantaneous.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit d9d23ba ]

When installing Cilium in an AKS cluster, the Cilium Operator requires
an Azure Service Principal with sufficient privileges to the Azure API
for the IPAM allocator to be able to work.

Previously, the `az ad sp create-for-rbac` command assigned the
`Contributor` role by default to new Service Principals when none was
provided via the optional `--role` flag, whereas it now does not assign
any role at all. This of course breaks IPAM allocation due to
insufficient permissions, resulting in operator failures of this kind:

```
level=warning msg="Unable to synchronize Azure virtualnetworks list" error="network.VirtualNetworksClient#ListAll: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code=\"AuthorizationFailed\" Message=\"The client 'd09fb531-793a-40fc-b934-7af73ca60e32' with object id 'd09fb531-793a-40fc-b934-7af73ca60e32' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/read' over scope '/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167' or the scope is invalid. If access was recently granted, please refresh your credentials.\"" subsys=azure
level=fatal msg="Unable to start azure allocator" error="Initial synchronization with instances API failed" subsys=cilium-operator-azure
```

We update the documentation guidelines for new installations to assign
the `Contributor` role to new Service Principals used for Cilium.

We also take the opportunity to:

- Update Azure IPAM required privileges documentation.
- Make it so users can now set up all AKS-specific required variables
  for a Helm install in a single command block, rather than have it
  spread over several command blocks with intermediate steps and
  temporary files.
- Have the documentation recommend creating Service Principals with
  privileges over a restricted scope (AKS node resource group) for
  increased security.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 7e8b651 ]

Cilium agent crashes when an L7/HTTP flow is passed to the TCP flag
flow filter (`filterByTCPFlags`). This is because HTTP flows will have
some L4/TCP info, such as src/dst port, in the flow struct, but will
not contain TCP flags.

Added `nil` check for TCP flag pointer to avoid the segfault.
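The shape of the fix can be sketched with simplified stand-in types
(field and function names here are illustrative, not Hubble's actual
API):

```go
package main

import "fmt"

// TCPFlags and TCP are toy stand-ins for the flow protobuf types.
// An L7/HTTP flow carries L4 port info but a nil Flags pointer.
type TCPFlags struct{ SYN, ACK bool }

type TCP struct {
	SourcePort uint16
	Flags      *TCPFlags // nil for L7/HTTP flows
}

// filterByTCPFlags sketches the fixed filter: bail out on a nil Flags
// pointer instead of dereferencing it and crashing.
func filterByTCPFlags(tcp *TCP, wantSYN bool) bool {
	if tcp == nil || tcp.Flags == nil {
		return false // L7 flow without TCP flags: no match, no segfault
	}
	return tcp.Flags.SYN == wantSYN
}

func main() {
	httpFlow := &TCP{SourcePort: 8080} // Flags is nil
	fmt.Println(filterByTCPFlags(httpFlow, true))
}
```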

Fixes: cilium#18830

Signed-off-by: Wazir Ahmed <wazir@accuknox.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 7bc5761 ]

When using Azure's IPAM mode, we don't have non-overlapping pod CIDRs
for each node, so we can't rely on the default IPsec mode where we use
the destination CIDRs to match the xfrm policies.

Instead, we need to enable subnet IPsec as in EKS. In that case, the
dir=out xfrm policy and state look like:

    src 0.0.0.0/0 dst 10.240.0.0/16
        dir out priority 0
        mark 0x3e00/0xff00
        tmpl src 0.0.0.0 dst 10.240.0.0
             proto esp spi 0x00000003 reqid 1 mode tunnel

    src 0.0.0.0 dst 10.240.0.0
        proto esp spi 0x00000003 reqid 1 mode tunnel
        replay-window 0
        mark 0x3e00/0xff00 output-mark 0xe00/0xf00
        aead rfc4106(gcm(aes)) 0x567a47ff70a43a3914719a593d5b12edce25a971 128
        anti-replay context: seq 0x0, oseq 0x105, bitmap 0x00000000
        sel src 0.0.0.0/0 dst 0.0.0.0/0

As can be seen, the xfrm policy matches on a broad /16 encompassing all
endpoints in the cluster. The xfrm state then matches the policy's
template. Finally, to write the proper outer destination IP, we need to
define the IP_POOLS macro in our datapath. That way, our BPF programs
will determine the outer IP from the ipcache lookup.
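To illustrate the ipcache idea, here is a toy Go model: a plain map
stands in for the BPF ipcache, and all names and addresses are invented
for the sketch. With overlapping pod CIDRs, the outer (tunnel)
destination cannot be derived from the pod CIDR itself, so it is
resolved per pod IP.

```go
package main

import "fmt"

// ipcache is a toy stand-in for the BPF ipcache map: it maps a pod IP
// to the IP of the node hosting that pod. Entries are illustrative.
var ipcache = map[string]string{
	"10.240.0.12": "192.168.10.5",
	"10.240.0.34": "192.168.10.6",
}

// outerDst returns the tunnel outer destination IP for a pod IP, or
// false if the pod is unknown to the cache.
func outerDst(podIP string) (string, bool) {
	node, ok := ipcache[podIP]
	return node, ok
}

func main() {
	dst, _ := outerDst("10.240.0.12")
	fmt.Println(dst)
}
```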

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit d7f6407 ]

This function is not specific to ENI IPAM mode anymore since Alibaba
and Azure's IPAM modes are also using it.

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit a0f5c0d ]

Commit c9810bf introduced garbage collection to clean orphan entries in
the nat maps, but concurrent accesses to the maps weren't serialized.

The nat maps, accessed via the ct map construct, are susceptible to
data races between asynchronously running goroutines upon agent
restart, when endpoints are restored:
```
2022-02-21T02:42:13.757888057Z WARNING: DATA RACE
2022-02-21T02:42:13.757895621Z Write at 0x00c00081a830 by goroutine 360:
2022-02-21T02:42:13.757912783Z   github.com/cilium/cilium/pkg/bpf.(*Map).Close()
2022-02-21T02:42:13.757920669Z       /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:581 +0x1c4
2022-02-21T02:42:13.757927597Z   github.com/cilium/cilium/pkg/maps/ctmap.doGC4·dwrap·4()
2022-02-21T02:42:13.757934561Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:422 +0x39
2022-02-21T02:42:13.757941184Z   github.com/cilium/cilium/pkg/maps/ctmap.doGC4()
2022-02-21T02:42:13.757947352Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:482 +0x4d5
2022-02-21T02:42:13.757953881Z   github.com/cilium/cilium/pkg/maps/ctmap.doGC()
2022-02-21T02:42:13.757960362Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:517 +0x17e
2022-02-21T02:42:13.757966185Z   github.com/cilium/cilium/pkg/maps/ctmap.GC()
2022-02-21T02:42:13.757972307Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:537 +0xc6
2022-02-21T02:42:13.757978599Z   github.com/cilium/cilium/pkg/endpoint.(*Endpoint).garbageCollectConntrack()
2022-02-21T02:42:13.757986321Z       /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1034 +0x804
2022-02-21T02:42:13.757992160Z   github.com/cilium/cilium/pkg/endpoint.(*Endpoint).scrubIPsInConntrackTableLocked()
2022-02-21T02:42:13.757998853Z       /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1039 +0x173
2022-02-21T02:42:13.758004601Z   github.com/cilium/cilium/pkg/endpoint.(*Endpoint).scrubIPsInConntrackTable()
2022-02-21T02:42:13.758010701Z       /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1049 +0x44
2022-02-21T02:42:13.758016604Z   github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps.func1()
2022-02-21T02:42:13.758022804Z       /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:761 +0x132

2022-02-21T02:42:13.758034551Z Previous read at 0x00c00081a830 by goroutine 100:
2022-02-21T02:42:13.758040659Z   github.com/cilium/cilium/pkg/bpf.(*Map).DumpReliablyWithCallback()
2022-02-21T02:42:13.758046461Z       /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:756 +0x804
2022-02-21T02:42:13.758053696Z   github.com/cilium/cilium/pkg/maps/nat.(*Map).DumpReliablyWithCallback()
2022-02-21T02:42:13.758059818Z       /go/src/github.com/cilium/cilium/pkg/maps/nat/nat.go:121 +0x7b7
2022-02-21T02:42:13.758065580Z   github.com/cilium/cilium/pkg/maps/ctmap.PurgeOrphanNATEntries()
2022-02-21T02:42:13.758072272Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:612 +0x790
2022-02-21T02:42:13.758078005Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC()
2022-02-21T02:42:13.758084196Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:214 +0xb0c
2022-02-21T02:42:13.758090362Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC()
2022-02-21T02:42:13.758096712Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5
2022-02-21T02:42:13.758103134Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC()
2022-02-21T02:42:13.758109338Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5
2022-02-21T02:42:13.758127517Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable.func1()
2022-02-21T02:42:13.758135513Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:90 +0x40a
2022-02-21T02:42:13.758141621Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC()
2022-02-21T02:42:13.758147715Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5
2022-02-21T02:42:13.758153712Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable.func1()
2022-02-21T02:42:13.758160127Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:90 +0x40a

```
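A minimal sketch of serializing such accesses with a mutex (simplified
types, not Cilium's actual map implementation): both the GC's dump and
the restore path's close take the same lock, so a close can no longer
race with an in-flight dump.

```go
package main

import (
	"fmt"
	"sync"
)

// natMap is a toy stand-in for a shared BPF map handle.
type natMap struct {
	mu     sync.Mutex
	closed bool
	data   map[string]int
}

// dumpWithCallback iterates entries under the lock, skipping maps that
// were already closed by the endpoint-restore path.
func (m *natMap) dumpWithCallback(cb func(k string, v int)) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.closed {
		return
	}
	for k, v := range m.data {
		cb(k, v)
	}
}

// close marks the map closed under the same lock, so it cannot run
// concurrently with dumpWithCallback.
func (m *natMap) close() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.closed = true
	m.data = nil
}

func main() {
	m := &natMap{data: map[string]int{"entry": 1}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); m.dumpWithCallback(func(string, int) {}) }()
	go func() { defer wg.Done(); m.close() }()
	wg.Wait()
	fmt.Println("done")
}
```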

Fixes: c9810bf ("ctmap: GC orphan SNAT entries")

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno requested review from a team as code owners March 3, 2022 15:51
@pchaigno pchaigno requested a review from nathanjsweet March 3, 2022 15:51
@pchaigno pchaigno added backport/1.10 kind/backports This PR provides functionality previously merged into master. labels Mar 3, 2022
@pchaigno
Member Author
pchaigno commented Mar 3, 2022

/test-backport-1.10

Member
@tklauser tklauser left a comment


That one LGTM, thanks Paul!

@nathanjsweet nathanjsweet requested review from tklauser and removed request for nathanjsweet March 3, 2022 16:08
Member
@nbusseneau nbusseneau left a comment


My commits LGTM.

@pchaigno
Member Author
pchaigno commented Mar 4, 2022

We're only missing the review from Aditi but there weren't any conflicts on her changes, so probably fine to skip. Marking ready to merge.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 4, 2022
Member
@aditighag aditighag left a comment


Changes for my PR look good. Thanks!

@gandro gandro merged commit e062349 into cilium:v1.10 Mar 7, 2022
@pchaigno pchaigno deleted the pr/v1.10-backport-2022-03-03 branch March 7, 2022 09:10
@joestringer
Member

@seanmwinn It doesn't make sense to add "needs-backport/1.9" to a backport PR like this. If you intended to raise a specific PR for backport, please go to the original PR and add the label there. I'll remove the label since it's unclear which PR from this set of already-backported PRs you intended to be backported.
