v1.10 backports 2022-03-03 #19023
Conversation
…tlink [ upstream commit 3222f50 ] As of Linux kernel commit torvalds/linux@68ac0f3810e7 ("xfrm: state and policy should fail if XFRMA_IF_ID 0"), specifying xfrm if_id = 0 leads to EINVAL being returned. So far, the netlink library always specified the XFRMA_IF_ID netlink attribute, regardless of its value. Upstream PR vishvananda/netlink#727 changed this behavior to only set XFRMA_IF_ID in case XfrmState.Ifid or XfrmPolicy.Ifid are != 0 to fix the issue. Updated using:
```
go get github.com/vishvananda/netlink@main
go mod tidy
go mod vendor
```
Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
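For illustration, here is a minimal runnable sketch of the guard that upstream change introduces. The real library encodes the attribute into a netlink message; the string slice and function names below are stand-ins, not the netlink API:
```
package main

import "fmt"

// XfrmState mirrors the relevant field of netlink.XfrmState; the rest
// of the request-building machinery is elided.
type XfrmState struct {
	Ifid int
}

// appendIfID sketches the fixed behavior from vishvananda/netlink#727:
// the XFRMA_IF_ID attribute is only added when Ifid != 0, because
// kernels with torvalds/linux@68ac0f3810e7 reject if_id == 0 with EINVAL.
func appendIfID(attrs []string, s XfrmState) []string {
	if s.Ifid != 0 { // previously the attribute was written unconditionally
		attrs = append(attrs, fmt.Sprintf("XFRMA_IF_ID=%d", s.Ifid))
	}
	return attrs
}

func main() {
	fmt.Println(appendIfID(nil, XfrmState{Ifid: 0})) // [] — attribute omitted
	fmt.Println(appendIfID(nil, XfrmState{Ifid: 5})) // [XFRMA_IF_ID=5]
}
```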
[ upstream commit a6e8477 ] Document the intent of NodeValidateImplementation(). Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 7e5022e ] These if-statements are unnecessary because, upon code analysis, we can tell that it's not possible for the input to be nil. Remove these statements to simplify the flow of the function. In other words, we now know for a fact that calling these functions will result in a route insert. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>
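A tiny sketch of the resulting shape, with hypothetical names (not the actual Cilium helpers):
```
package main

import "fmt"

// route and insertRoute are hypothetical stand-ins for the real
// routing helpers; the names are illustrative only.
type route struct{ prefix string }

// insertRoute shows the simplified flow: callers are known to pass a
// non-nil route, so a former `if rt == nil { return }` guard was dead
// code. Every call now visibly results in a route insert.
func insertRoute(rt *route) {
	fmt.Println("upserting route", rt.prefix)
}

func main() {
	insertRoute(&route{prefix: "10.0.0.0/24"})
}
```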
[ upstream commit e41aea0 ] This helps in scenarios where the user reports this log msg, but we are missing the actual CIDR from the route that failed to be deleted or created. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>
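A hedged sketch of the kind of logging change described, using logrus (Cilium's logging library); the field name and the route value are illustrative, not the actual ones added by the commit:
```
package main

import (
	"net"

	"github.com/sirupsen/logrus"
)

func main() {
	log := logrus.New()
	_, cidr, _ := net.ParseCIDR("10.0.1.0/24")

	// Before: the message alone gave no hint which route failed.
	// After: attaching the CIDR as a structured field makes user
	// reports actionable. The "prefix" field name is illustrative.
	log.WithField("prefix", cidr).Warn(
		"Unable to delete the IPsec route OUT from the host routing table")
}
```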
[ upstream commit 0bd4e04 ] With ENI IPAM mode and IPsec enabled, users were reporting cases where connectivity to particular pods breaks, correlated with the following error msg:
```
Unable to delete the IPsec route OUT from the host routing table
```
In addition, it was also reported that the connectivity outage would only last a few minutes before resolving itself. The issue turned out to be that the logic handling the IPsec cleanup upon node deletion is asymmetric with the IPsec logic handling a node create / update. Here's how:
* With ENI mode and IPsec, subnet encryption mode is enabled implicitly.
* Background: users can explicitly enable subnet encryption mode by configuring `--ipv4-pod-subnets=[cidr1,cidr2,...]`.
* Background: ENIs are part of subnet(s).
* Cilium with ENI mode automatically appends the node's ENIs' subnets' CIDRs to this slice. For example, node A has ENI E, which is part of subnet S with CIDR C; therefore, `--ipv4-pod-subnets=[C]`.
* This means that each node should have an IPsec OUT route for each pod subnet, i.e. each ENI's subnet, as shown by (*linuxNodeHandler).nodeUpdate(), which contains the IPsec logic on a node create / update.
* Upon a node delete [(*linuxNodeHandler).nodeDelete()], we clean up the "old" node. When it gets to the IPsec logic, it removes the routes for the pod subnets as well, i.e. it removes the route to the ENI's subnet from the local node. From the example above, it'd remove the route for CIDR C.
* This is problematic because in ENI mode, different nodes can share the same ENI subnet, meaning subnets are NOT exclusive to a node. For example, a node B can also have ENI E with subnet C attached to it.
* As for how the nodes were fixing themselves: (*Manager).backgroundSync() runs on an interval and calls NodeValidateImplementation(), which calls down to (*linuxNodeHandler).nodeUpdate(), thereby running the IPsec logic of a node create / update and reinstating the missing routes.
Therefore, we shouldn't be deleting these routes because pods might still be relying on them. By comparing the IPsec delete logic with [1], we see that they're asymmetric. This commit fixes that asymmetry.
[1]: Given subnetEncryption=true, notice how we only call enableSubnetIPsec() if the node is local. That is not the case on node delete.
```
func (n *linuxNodeHandler) nodeUpdate(oldNode, newNode *nodeTypes.Node, firstAddition bool) error {
	...
	if n.nodeConfig.EnableIPSec && !n.subnetEncryption() && !n.nodeConfig.EncryptNode {
		n.enableIPsec(newNode)
		newKey = newNode.EncryptionKey
	}
	...
	if n.nodeConfig.EnableIPSec && !n.subnetEncryption() {
		n.encryptNode(newNode)
	}
	if newNode.IsLocal() {
		isLocalNode = true
		...
		if n.subnetEncryption() {
			n.enableSubnetIPsec(n.nodeConfig.IPv4PodSubnets, n.nodeConfig.IPv6PodSubnets)
		}
		...
		return nil
	}
```
Fixes: 645de9d ("cilium: remove encryption route and rules if crypto is disabled")
Co-authored-by: John Fastabend <john@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
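To make the restored symmetry concrete, a self-contained sketch follows; every type, field, and helper is an illustrative stand-in for the real linuxNodeHandler, not the actual fix:
```
package main

import "fmt"

// nodeHandler is an illustrative stand-in for Cilium's
// linuxNodeHandler; the fields and method names are hypothetical.
type nodeHandler struct {
	enableIPSec      bool
	subnetEncryption bool
}

// nodeDelete sketches the restored symmetry: nodeUpdate() only
// installs per-node IPsec routes when subnet encryption is off, so
// nodeDelete() must only remove them under the same condition. Shared
// subnet routes (e.g. an ENI subnet CIDR used by several nodes) are
// left in place.
func (n *nodeHandler) nodeDelete(node string) {
	if n.enableIPSec && !n.subnetEncryption {
		fmt.Println("removing per-node IPsec routes for", node)
	}
	// With subnet encryption enabled, there is nothing node-specific
	// to delete: the subnet routes may still carry other nodes' pods.
}

func main() {
	h := &nodeHandler{enableIPSec: true, subnetEncryption: true}
	h.nodeDelete("node-a") // no shared subnet route is removed
}
```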
[ upstream commit 6c647ba ] Commit 706c900 ("docs: re-write docs to create clusters with tainted nodes") removed the command for this instruction step, so there's no need to have the instruction any more. Remove it. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 4c3bd27 ] We are hitting this timeout sometimes, and it seems it was previously updated on the regular pipelines (see 31a622e) but not on the runtime pipeline. We remove the inner timeout as the outer one is practically redundant here, as the steps outside of the inner loop are almost instantaneous. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit d9d23ba ] When installing Cilium in an AKS cluster, the Cilium Operator requires an Azure Service Principal with sufficient privileges to the Azure API for the IPAM allocator to be able to work. Previously, `az ad sp create-for-rbac` assigned the `Contributor` role by default to new Service Principals when none was provided via the optional `--role` flag, whereas it now does not assign any role at all. This of course breaks IPAM allocation due to insufficient permissions, resulting in operator failures of this kind:
```
level=warning msg="Unable to synchronize Azure virtualnetworks list" error="network.VirtualNetworksClient#ListAll: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code=\"AuthorizationFailed\" Message=\"The client 'd09fb531-793a-40fc-b934-7af73ca60e32' with object id 'd09fb531-793a-40fc-b934-7af73ca60e32' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/read' over scope '/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167' or the scope is invalid. If access was recently granted, please refresh your credentials.\"" subsys=azure
level=fatal msg="Unable to start azure allocator" error="Initial synchronization with instances API failed" subsys=cilium-operator-azure
```
We update the documentation guidelines for new installations to assign the `Contributor` role to new Service Principals used for Cilium. We also take the opportunity to:
- Update the Azure IPAM required privileges documentation.
- Make it so users can now set up all AKS-specific required variables for a Helm install in a single command block, rather than have them spread over several command blocks with intermediate steps and temporary files.
- Have the documentation recommend creating Service Principals with privileges over a restricted scope (the AKS node resource group) for increased security.
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 7e8b651 ] The Cilium agent crashes when an L7/HTTP flow is passed to the TCP flag flow filter (`filterByTCPFlags`). This is because HTTP flows will have some L4/TCP info, such as src/dst ports, in the flow struct, but will not contain TCP flags. Added a `nil` check for the TCP flags pointer to avoid the segfault. Fixes: cilium#18830 Signed-off-by: Wazir Ahmed <wazir@accuknox.com> Signed-off-by: Paul Chaignon <paul@cilium.io>
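A minimal sketch of the described nil guard, with illustrative types shaped loosely like Hubble's flow structs rather than the actual flow API:
```
package main

import "fmt"

// Illustrative stand-ins for Hubble's flow types: an L7/HTTP flow may
// carry src/dst ports in its TCP struct while Flags stays nil.
type TCPFlags struct{ SYN, ACK bool }
type TCP struct {
	SourcePort, DestinationPort uint32
	Flags                       *TCPFlags
}

// filterByTCPFlags sketches the fix: bail out on a nil Flags pointer
// instead of dereferencing it and crashing the agent.
func filterByTCPFlags(tcp *TCP, want TCPFlags) bool {
	if tcp == nil || tcp.Flags == nil {
		return false
	}
	return tcp.Flags.SYN == want.SYN && tcp.Flags.ACK == want.ACK
}

func main() {
	httpFlow := &TCP{SourcePort: 43000, DestinationPort: 80} // Flags == nil
	fmt.Println(filterByTCPFlags(httpFlow, TCPFlags{SYN: true})) // false, no panic
}
```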
[ upstream commit 7bc5761 ] When using Azure's IPAM mode, we don't have non-overlapping pod CIDRs for each node, so we can't rely on the default IPsec mode where we use the destination CIDRs to match the xfrm policies. Instead, we need to enable subnet IPsec as in EKS. In that case, the dir=out xfrm policy and state look like:
```
src 0.0.0.0/0 dst 10.240.0.0/16
	dir out priority 0
	mark 0x3e00/0xff00
	tmpl src 0.0.0.0 dst 10.240.0.0
		proto esp spi 0x00000003 reqid 1 mode tunnel

src 0.0.0.0 dst 10.240.0.0
	proto esp spi 0x00000003 reqid 1 mode tunnel
	replay-window 0
	mark 0x3e00/0xff00 output-mark 0xe00/0xf00
	aead rfc4106(gcm(aes)) 0x567a47ff70a43a3914719a593d5b12edce25a971 128
	anti-replay context: seq 0x0, oseq 0x105, bitmap 0x00000000
	sel src 0.0.0.0/0 dst 0.0.0.0/0
```
As can be seen, the xfrm policy matches on a broad /16 encompassing all endpoints in the cluster. The xfrm state then matches the policy's template. Finally, to write the proper outer destination IP, we need to define the IP_POOLS macro in our datapath. That way, our BPF programs will determine the outer IP from the ipcache lookup.
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit d7f6407 ] This function is not specific to ENI IPAM mode anymore since Alibaba and Azure's IPAM modes are also using it. Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit a0f5c0d ] Commit c9810bf introduced garbage collection for cleaning orphan entries in the nat maps, whereby concurrent accesses to the maps weren't serialized. The nat maps accessed via the ct map construct are susceptible to data races in asynchronously running goroutines upon agent restart, when endpoints are restored:
```
2022-02-21T02:42:13.757888057Z WARNING: DATA RACE
2022-02-21T02:42:13.757895621Z Write at 0x00c00081a830 by goroutine 360:
2022-02-21T02:42:13.757912783Z   github.com/cilium/cilium/pkg/bpf.(*Map).Close()
2022-02-21T02:42:13.757920669Z       /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:581 +0x1c4
2022-02-21T02:42:13.757927597Z   github.com/cilium/cilium/pkg/maps/ctmap.doGC4·dwrap·4()
2022-02-21T02:42:13.757934561Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:422 +0x39
2022-02-21T02:42:13.757941184Z   github.com/cilium/cilium/pkg/maps/ctmap.doGC4()
2022-02-21T02:42:13.757947352Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:482 +0x4d5
2022-02-21T02:42:13.757953881Z   github.com/cilium/cilium/pkg/maps/ctmap.doGC()
2022-02-21T02:42:13.757960362Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:517 +0x17e
2022-02-21T02:42:13.757966185Z   github.com/cilium/cilium/pkg/maps/ctmap.GC()
2022-02-21T02:42:13.757972307Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:537 +0xc6
2022-02-21T02:42:13.757978599Z   github.com/cilium/cilium/pkg/endpoint.(*Endpoint).garbageCollectConntrack()
2022-02-21T02:42:13.757986321Z       /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1034 +0x804
2022-02-21T02:42:13.757992160Z   github.com/cilium/cilium/pkg/endpoint.(*Endpoint).scrubIPsInConntrackTableLocked()
2022-02-21T02:42:13.757998853Z       /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1039 +0x173
2022-02-21T02:42:13.758004601Z   github.com/cilium/cilium/pkg/endpoint.(*Endpoint).scrubIPsInConntrackTable()
2022-02-21T02:42:13.758010701Z       /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1049 +0x44
2022-02-21T02:42:13.758016604Z   github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps.func1()
2022-02-21T02:42:13.758022804Z       /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:761 +0x132
2022-02-21T02:42:13.758034551Z Previous read at 0x00c00081a830 by goroutine 100:
2022-02-21T02:42:13.758040659Z   github.com/cilium/cilium/pkg/bpf.(*Map).DumpReliablyWithCallback()
2022-02-21T02:42:13.758046461Z       /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:756 +0x804
2022-02-21T02:42:13.758053696Z   github.com/cilium/cilium/pkg/maps/nat.(*Map).DumpReliablyWithCallback()
2022-02-21T02:42:13.758059818Z       /go/src/github.com/cilium/cilium/pkg/maps/nat/nat.go:121 +0x7b7
2022-02-21T02:42:13.758065580Z   github.com/cilium/cilium/pkg/maps/ctmap.PurgeOrphanNATEntries()
2022-02-21T02:42:13.758072272Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:612 +0x790
2022-02-21T02:42:13.758078005Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC()
2022-02-21T02:42:13.758084196Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:214 +0xb0c
2022-02-21T02:42:13.758090362Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC()
2022-02-21T02:42:13.758096712Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5
2022-02-21T02:42:13.758103134Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC()
2022-02-21T02:42:13.758109338Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5
2022-02-21T02:42:13.758127517Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable.func1()
2022-02-21T02:42:13.758135513Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:90 +0x40a
2022-02-21T02:42:13.758141621Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC()
2022-02-21T02:42:13.758147715Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5
2022-02-21T02:42:13.758153712Z   github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable.func1()
2022-02-21T02:42:13.758160127Z       /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:90 +0x40a
```
Fixes: c9810bf ("ctmap: GC orphan SNAT entries")
Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
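A hedged sketch of the general pattern for fixing such a race — serializing Close() against concurrent dump callbacks on the same map with a mutex; the actual change in pkg/bpf may be structured differently:
```
package main

import (
	"fmt"
	"sync"
)

// bpfMap is an illustrative stand-in for pkg/bpf.(*Map); the real fix
// may organize its locking differently.
type bpfMap struct {
	mu     sync.Mutex
	fd     int
	closed bool
}

// DumpWithCallback holds the lock while reading, so a concurrent
// Close() from the endpoint-restore path cannot race with it.
func (m *bpfMap) DumpWithCallback(cb func(fd int)) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.closed {
		return
	}
	cb(m.fd)
}

// Close takes the same lock, serializing against in-flight dumps.
func (m *bpfMap) Close() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.closed = true
	m.fd = -1
}

func main() {
	m := &bpfMap{fd: 42}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); m.DumpWithCallback(func(fd int) { fmt.Println("dump fd", fd) }) }()
	go func() { defer wg.Done(); m.Close() }()
	wg.Wait() // no data race: both paths hold m.mu
}
```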
/test-backport-1.10
- Update netlink library to not set XFRMA_IF_ID = 0 by default #18506 (@tklauser)
  - Skipped the first commit as the unit test doesn't exist in v1.10.
  - Re-updated the github.com/vishvananda/netlink dep from scratch using the same commit as on master.
That one LGTM, thanks Paul!
My commits LGTM.
We're only missing the review from Aditi, but there weren't any conflicts on her changes, so it's probably fine to skip. Marking ready to merge.
Changes for my PR look good. Thanks!
@seanmwinn It doesn't make sense to add "needs-backport/1.9" to a backport PR like this. If you intended to raise a specific PR for backport, please go to the original PR and add the label there. I'll remove the label since it's unclear which PR from this set of already-backported PRs you intended to be backported.
Once this PR is merged, you can update the PR labels via: