-
Notifications
You must be signed in to change notification settings - Fork 3.2k
v1.16 Backports 2025-04-15 #38949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
v1.16 Backports 2025-04-15 #38949
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ upstream commit 2e3c0bb ] The startWhenNeeded() runs in a forked goroutine and thus may run concurrently to a Stop() which calls r.wg.Wait(). Thus we cannot manipulate the WaitGroup anymore. Refactor startWhenNeeded() to call wg.Done() either when short-circuiting or when the informer is stopped so that we don't need to call Add(1). This should fix the following panic: PANIC package: github.com/cilium/cilium/operator/pkg/bgpv2 • TestClusterConfigConditions panic: sync: WaitGroup is reused before previous Wait has returned [recovered] panic: sync: WaitGroup is reused before previous Wait has returned ... sync.(*WaitGroup).Wait(0x416abf?) /root/sdk/go1.24.1/src/sync/waitgroup.go:120 +0x74 github.com/cilium/cilium/pkg/k8s/resource.(*resource[...]).Stop(0x47e40c, {0xc0009d3bc8?, 0x4f0053?}) /host/pkg/k8s/resource/resource.go:356 +0x2a github.com/cilium/hive/cell.(*DefaultLifecycle).Stop(0xc000c44ea0, 0xc000c7f600, {0x3b64078?, 0xc00037dab0?}) /host/vendor/github.com/cilium/hive/cell/lifecycle.go:167 +0x2e6 github.com/cilium/hive.(*Hive).Stop(0xc000c39a40, 0xc000c7f600, {0x3b64078, 0xc00037dab0}) /host/vendor/github.com/cilium/hive/hive.go:374 +0xd9 github.com/cilium/cilium/operator/pkg/bgpv2.TestClusterConfigConditions.func1(0xc000bfc700) /host/operator/pkg/bgpv2/cluster_test.go:872 +0x358 I validated that this fixes the issue by doing an early return in the test case to force Resource[T].Stop() to race with Resource[T].startWhenNeeded: > --- a/operator/pkg/bgpv2/cluster_test.go > +++ b/operator/pkg/bgpv2/cluster_test.go > @@ -842,6 +842,8 @@ func TestClusterConfigConditions(t *testing.T) { > watchersReady() > + return I ran this short-circuited test 1000 times with "stress". Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 1dd7cea ] When trying to disable netfilter in the kernel and starting up Cilium with the below options, it fails as follows: kernel# cat .config | grep NETFILTER # CONFIG_NETFILTER is not set # ./daemon/cilium-agent --enable-ipv4=true --enable-ipv6=false \ --bpf-lb-algorithm=maglev --bpf-lb-maglev-table-size=2039 \ --bpf-lb-mode=dsr --bpf-lb-acceleration=native --devices=enp5s0 \ --bpf-lb-dsr-dispatch=ipip --disable-envoy-version-check=true \ --k8s-kubeconfig-path=$HOME/.kube/config \ --kube-proxy-replacement=true --routing-mode=native \ --enable-ipv4-masquerade=false --ipam=cluster-pool \ --enable-ipip-termination --install-iptables-rules=false \ --enable-l7-proxy=false [...] time=2025-04-07T10:40:05Z level=info msg="All Cilium CRDs have been found and are available" module=agent.infra.k8s-synced-crdsync time="2025-04-07T10:40:05.954428144Z" level=info msg="Local boot ID is \"5efaef07-4a21-4db1-82e7-05c73687cff8\"" subsys=node time=2025-04-07T10:40:05Z level=error msg="Start hook failed" function="*linux.devicesController.Start (agent.datapath.devices-controller)" error="creating netlink handle: protocol not supported" time=2025-04-07T10:40:05Z level=error msg="Failed to start hive" error="creating netlink handle: protocol not supported" duration=57.406398ms time=2025-04-07T10:40:05Z level=info msg="Stopping hive" time=2025-04-07T10:40:05Z level=info msg="agent.datapath.sysctl.job-reconcile (rev=8)" module=health time=2025-04-07T10:40:05Z level=info msg="agent.datapath.sysctl.job-refresh (rev=9)" module=health time=2025-04-07T10:40:05Z level=info msg="agent.infra.k8s-synced-crdsync.job-sync-crds (rev=7)" module=health time="2025-04-07T10:40:05.955728969Z" level=info msg="Stopped gops server" address="127.0.0.1:9890" subsys=gops time="2025-04-07T10:40:05.955744039Z" level=fatal msg="failed to start: creating netlink handle: protocol not supported" subsys=daemon The device controller itself doesn't need anything netfilter related, thus fix up the handle. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit bcbc27e ] Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit d82ee40 ] In case of problem with k8s connectivity, Reflector performs relist operation and DeltaFIFO generates Sync events for already existing elements and Delete events for removed elements. However, we were not handling events DeletedFinalStateUnknown which resulted in stale cache for services. Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 181d574 ] Add a section to help users to know how to update the existing ip pool in multi-pool ipam mode since it's not editable. Signed-off-by: Liyi Huang <liyi.huang@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit a92cbc7 ] Currently, only the validation result of the last policy rule is propagated to the CiliumNetworkPolicy/CiliumClusterwideNetworkPolicy status field. Fix this by joining all sanitization errors instead of just the last one in a list of rules. Fixes #38795 Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 5b22071 ] The new test exercises the reconciler when a batched operation fails and a retry is attempted. When retrying, the cilium/statedb reconciler calls the sequential operations (e.g: `Update` instead of `UpdateBatch`) for each item in the failed batch. Since the sequential operations are not expected to be called in the current ipset implementation, the reconciler panics: === RUN TestOpsRetry panic: Unexpectedly Update() called for reconciliation goroutine 53 [running]: github.com/cilium/cilium/pkg/datapath/iptables/ipset.(*ops).Update(0xc000929000?, {0x7f2a50ec86d8?, 0x7f2a97b32a78?}, {0x70?, 0xc000780008?}, 0xc0006508c0?) /home/pippolo/cilium/cilium/pkg/datapath/iptables/ipset/ops.go:77 +0x25 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processSingle(0x37926c0, 0xc000929000, 0x2, 0x0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:214 +0x170 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processRetries(0x37926c0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:190 +0x58 github.com/cilium/statedb/reconciler.(*reconciler[...]).incremental(0x3778660, {0x37469f0, 0xc000952ab0}, {0x371ddb0, 0xc0006443c0}, 0xc0006303a0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:73 +0x2de github.com/cilium/statedb/reconciler.(*reconciler[...]).reconcileLoop(0x3778660, {0x37469f0, 0xc000952ab0}, {0x374b780, 0xc000988000}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/reconciler.go:92 +0x568 github.com/cilium/hive/job.(*jobOneShot).start(0xc00093cde0, {0x37469f0, 0xc000952ab0}, 0x0?, {0x374b780, 0xc00093cae0}, {{{0x0, 0x0, 0x0}}, 0xc0003493e0, ...}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd created by github.com/cilium/hive/job.(*group).Start.func1 in goroutine 50 /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:167 +0x172 FAIL github.com/cilium/cilium/pkg/datapath/iptables/ipset 0.233s FAIL This will be fixed in a subsequent commit. Related: 7b80f94 ("iptables/ipset: Use batch operations") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
/test |
@tklauser caould you please apply the following diff to #38890 backport? diff --git a/pkg/datapath/iptables/ipset/ipset_test.go b/pkg/datapath/iptables/ipset/ipset_test.go
index 9df1cc3728..f8fa7bec8f 100644
--- a/pkg/datapath/iptables/ipset/ipset_test.go
+++ b/pkg/datapath/iptables/ipset/ipset_test.go
@@ -11,6 +11,7 @@ import (
"fmt"
"html/template"
"io"
+ "log/slog"
"net/netip"
"strings"
"sync/atomic"
@@ -461,7 +462,7 @@ func TestOpsRetry(t *testing.T) {
newOps,
newReconciler,
),
- cell.Provide(func(logger *slog.Logger) *ipset {
+ cell.Provide(func(logger logrus.FieldLogger) *ipset {
return &ipset{
executable: funcExecutable(func(ctx context.Context, command string, stdin string, arg ...string) ([]byte, error) {
// fail the operation at the first attempt
@@ -486,7 +487,7 @@ func TestOpsRetry(t *testing.T) {
log := hivetest.Logger(t, hivetest.LogLevel(slog.LevelError))
- require.NoError(t, hive.Start(log, t.Context()))
+ require.NoError(t, hive.Start(log, context.Background()))
obj := &tables.IPSetEntry{
Name: CiliumNodeIPSetV4,
@@ -514,7 +515,7 @@ func TestOpsRetry(t *testing.T) {
<-watch
}
- require.NoError(t, hive.Stop(log, t.Context()))
+ require.NoError(t, hive.Stop(log, context.Background()))
}
func TestIPSetList(t *testing.T) { This should make the test compatible with the stable branch. |
[ upstream commit 0491518 ] When a cilium/statedb reconciler relies on batched operations and a failure happens, the subsequent retry calls the non-batched operations (e.g: `Update` instead of `UpdateBatch`). Therefore, to avoid a panic in case of a retry, we need to define the non-batched operations too for the ipset reconciler. Fixes: 7b80f94 ("iptables/ipset: Use batch operations") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>
0e49224
to
1d5fdfc
Compare
@pippolo84 done, thanks for the patch! |
/test |
pippolo84
approved these changes
Apr 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
marseel
approved these changes
Apr 15, 2025
liyihuang
approved these changes
Apr 15, 2025
joamaki
approved these changes
Apr 17, 2025
10 tasks
jschwinger233
pushed a commit
that referenced
this pull request
Apr 23, 2025
[ upstream commit 5b22071 ] [ backporter's note: applied #38949 (comment) ] The new test exercises the reconciler when a batched operation fails and a retry is attempted. When retrying, the cilium/statedb reconciler calls the sequential operations (e.g: `Update` instead of `UpdateBatch`) for each item in the failed batch. Since the sequential operations are not expected to be called in the current ipset implementation, the reconciler panics: === RUN TestOpsRetry panic: Unexpectedly Update() called for reconciliation goroutine 53 [running]: github.com/cilium/cilium/pkg/datapath/iptables/ipset.(*ops).Update(0xc000929000?, {0x7f2a50ec86d8?, 0x7f2a97b32a78?}, {0x70?, 0xc000780008?}, 0xc0006508c0?) /home/pippolo/cilium/cilium/pkg/datapath/iptables/ipset/ops.go:77 +0x25 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processSingle(0x37926c0, 0xc000929000, 0x2, 0x0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:214 +0x170 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processRetries(0x37926c0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:190 +0x58 github.com/cilium/statedb/reconciler.(*reconciler[...]).incremental(0x3778660, {0x37469f0, 0xc000952ab0}, {0x371ddb0, 0xc0006443c0}, 0xc0006303a0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:73 +0x2de github.com/cilium/statedb/reconciler.(*reconciler[...]).reconcileLoop(0x3778660, {0x37469f0, 0xc000952ab0}, {0x374b780, 0xc000988000}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/reconciler.go:92 +0x568 github.com/cilium/hive/job.(*jobOneShot).start(0xc00093cde0, {0x37469f0, 0xc000952ab0}, 0x0?, {0x374b780, 0xc00093cae0}, {{{0x0, 0x0, 0x0}}, 0xc0003493e0, ...}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd created by github.com/cilium/hive/job.(*group).Start.func1 in goroutine 50 /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:167 +0x172 FAIL github.com/cilium/cilium/pkg/datapath/iptables/ipset 0.233s FAIL This will be fixed in a subsequent commit. Related: 7b80f94 ("iptables/ipset: Use batch operations") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: gray <greyschwinger@gmail.com> s
jschwinger233
pushed a commit
that referenced
this pull request
Apr 23, 2025
[ upstream commit 5b22071 ] [ backporter's note: applied #38949 (comment) ] The new test exercises the reconciler when a batched operation fails and a retry is attempted. When retrying, the cilium/statedb reconciler calls the sequential operations (e.g: `Update` instead of `UpdateBatch`) for each item in the failed batch. Since the sequential operations are not expected to be called in the current ipset implementation, the reconciler panics: === RUN TestOpsRetry panic: Unexpectedly Update() called for reconciliation goroutine 53 [running]: github.com/cilium/cilium/pkg/datapath/iptables/ipset.(*ops).Update(0xc000929000?, {0x7f2a50ec86d8?, 0x7f2a97b32a78?}, {0x70?, 0xc000780008?}, 0xc0006508c0?) /home/pippolo/cilium/cilium/pkg/datapath/iptables/ipset/ops.go:77 +0x25 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processSingle(0x37926c0, 0xc000929000, 0x2, 0x0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:214 +0x170 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processRetries(0x37926c0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:190 +0x58 github.com/cilium/statedb/reconciler.(*reconciler[...]).incremental(0x3778660, {0x37469f0, 0xc000952ab0}, {0x371ddb0, 0xc0006443c0}, 0xc0006303a0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:73 +0x2de github.com/cilium/statedb/reconciler.(*reconciler[...]).reconcileLoop(0x3778660, {0x37469f0, 0xc000952ab0}, {0x374b780, 0xc000988000}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/reconciler.go:92 +0x568 github.com/cilium/hive/job.(*jobOneShot).start(0xc00093cde0, {0x37469f0, 0xc000952ab0}, 0x0?, {0x374b780, 0xc00093cae0}, {{{0x0, 0x0, 0x0}}, 0xc0003493e0, ...}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd created by github.com/cilium/hive/job.(*group).Start.func1 in goroutine 50 /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:167 +0x172 FAIL github.com/cilium/cilium/pkg/datapath/iptables/ipset 0.233s FAIL This will be fixed in a subsequent commit. Related: 7b80f94 ("iptables/ipset: Use batch operations") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: gray <greyschwinger@gmail.com> s
jschwinger233
pushed a commit
that referenced
this pull request
Apr 23, 2025
[ upstream commit 5b22071 ] [ backporter's note: applied #38949 (comment) ] The new test exercises the reconciler when a batched operation fails and a retry is attempted. When retrying, the cilium/statedb reconciler calls the sequential operations (e.g: `Update` instead of `UpdateBatch`) for each item in the failed batch. Since the sequential operations are not expected to be called in the current ipset implementation, the reconciler panics: === RUN TestOpsRetry panic: Unexpectedly Update() called for reconciliation goroutine 53 [running]: github.com/cilium/cilium/pkg/datapath/iptables/ipset.(*ops).Update(0xc000929000?, {0x7f2a50ec86d8?, 0x7f2a97b32a78?}, {0x70?, 0xc000780008?}, 0xc0006508c0?) /home/pippolo/cilium/cilium/pkg/datapath/iptables/ipset/ops.go:77 +0x25 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processSingle(0x37926c0, 0xc000929000, 0x2, 0x0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:214 +0x170 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processRetries(0x37926c0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:190 +0x58 github.com/cilium/statedb/reconciler.(*reconciler[...]).incremental(0x3778660, {0x37469f0, 0xc000952ab0}, {0x371ddb0, 0xc0006443c0}, 0xc0006303a0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:73 +0x2de github.com/cilium/statedb/reconciler.(*reconciler[...]).reconcileLoop(0x3778660, {0x37469f0, 0xc000952ab0}, {0x374b780, 0xc000988000}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/reconciler.go:92 +0x568 github.com/cilium/hive/job.(*jobOneShot).start(0xc00093cde0, {0x37469f0, 0xc000952ab0}, 0x0?, {0x374b780, 0xc00093cae0}, {{{0x0, 0x0, 0x0}}, 0xc0003493e0, ...}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd created by github.com/cilium/hive/job.(*group).Start.func1 in goroutine 50 /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:167 +0x172 FAIL github.com/cilium/cilium/pkg/datapath/iptables/ipset 0.233s FAIL This will be fixed in a subsequent commit. Related: 7b80f94 ("iptables/ipset: Use batch operations") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: gray <greyschwinger@gmail.com> s
github-merge-queue bot
pushed a commit
that referenced
this pull request
Apr 24, 2025
[ upstream commit 5b22071 ] [ backporter's note: applied #38949 (comment) ] The new test exercises the reconciler when a batched operation fails and a retry is attempted. When retrying, the cilium/statedb reconciler calls the sequential operations (e.g: `Update` instead of `UpdateBatch`) for each item in the failed batch. Since the sequential operations are not expected to be called in the current ipset implementation, the reconciler panics: === RUN TestOpsRetry panic: Unexpectedly Update() called for reconciliation goroutine 53 [running]: github.com/cilium/cilium/pkg/datapath/iptables/ipset.(*ops).Update(0xc000929000?, {0x7f2a50ec86d8?, 0x7f2a97b32a78?}, {0x70?, 0xc000780008?}, 0xc0006508c0?) /home/pippolo/cilium/cilium/pkg/datapath/iptables/ipset/ops.go:77 +0x25 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processSingle(0x37926c0, 0xc000929000, 0x2, 0x0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:214 +0x170 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processRetries(0x37926c0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:190 +0x58 github.com/cilium/statedb/reconciler.(*reconciler[...]).incremental(0x3778660, {0x37469f0, 0xc000952ab0}, {0x371ddb0, 0xc0006443c0}, 0xc0006303a0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:73 +0x2de github.com/cilium/statedb/reconciler.(*reconciler[...]).reconcileLoop(0x3778660, {0x37469f0, 0xc000952ab0}, {0x374b780, 0xc000988000}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/reconciler.go:92 +0x568 github.com/cilium/hive/job.(*jobOneShot).start(0xc00093cde0, {0x37469f0, 0xc000952ab0}, 0x0?, {0x374b780, 0xc00093cae0}, {{{0x0, 0x0, 0x0}}, 0xc0003493e0, ...}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd created by github.com/cilium/hive/job.(*group).Start.func1 in goroutine 50 /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:167 +0x172 FAIL github.com/cilium/cilium/pkg/datapath/iptables/ipset 0.233s FAIL This will be fixed in a subsequent commit. Related: 7b80f94 ("iptables/ipset: Use batch operations") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: gray <greyschwinger@gmail.com> s
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.16
This PR represents a backport for Cilium 1.16.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Once this PR is merged, a GitHub action will update the labels of these PRs: