-
Notifications
You must be signed in to change notification settings - Fork 444
tracee-ebpf: use cgroup id for container id resolution #1130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracee-ebpf: use cgroup id for container id resolution #1130
Conversation
Good thing about having BTFHub embedded files now is that after rebasing with my dev branch Im able to quickly test it in multiple kernels... with that said, it looks like this code has issues loading in 5.4 kernel:
It works fine in kernel 5.8. Checking the code now... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall change is VERY nice. I liked the approach of having cgroup_id <-> container_id relationship maintained by the cgroup events. After you deal with the 5.4 relocation error, I'm ready to +1.
2450c30
to
a8c57dd
Compare
@yanivagman, Seems that I was able to add the type relocation feature in btfgen: kernfs_node->id is an union in kernels <= 5.4:
kernfs_node->id is an integer in kernels > 5.4:
So let's consider that part "okay" (unless there is a bug in my code). Now, I would like to mention something else observed: |
could you try to use the FULL BTF file (from BTFHub) in a 5.4.0 kernel and see if you're getting the "container_id" value ?
doing a bpf_printk() i was able to get the following:
but the container_id is zeroed. |
I'm able to get the container id if using the full BTF file for 5.13 kernel, for example:
|
Thanks @rafaeldtinoco for pointing me to this problem! The problem is that bpf_get_current_cgroup_id() helper returns cgroup v2 id, which doesn't match any of the cgroup ids found while walking cgroupfs (v1). The solution I have in mind for this:
WDYT? |
One more point that we should add to the docs: |
a8c57dd
to
80b8c63
Compare
80b8c63
to
d850e35
Compare
Note about my recent optimization in the last commit (cgroup v1 XOR v2, not both), taken from https://medium.com/nttlabs/cgroup-v2-596d035be4d7 : "cgroup v1 and v2 are incompatible and can’t be enabled simultaneously. Although there is “hybrid” configuration that allows mounting both v1 hierarchy and v2 hierarchy, the “hybrid” mode is underutilized for containers because you can’t enable v2 controllers that are already enabled for v1." |
Thanks @yanivagman, I'm reviewing this today and will let you know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it seems to be working now. I agree to the rational discussed about cgroups v1 and this commits looks good to me. A SPECIAL ATTENTION to this commit is that we're starting to use type-based checks (which will require btfgen to support type-based relocations).
I'll provide the changes at:
Yes, I think that is the secret. To rely only in a single cgroup subset and, yep, seems like cpuset is the one that should be taken. |
Sorry I accidentally closed the PR. Re-opened it. |
This PR changes the way we extract container id for events.
The current solution has several problems:
To solve these problems, let's move to use cgroup id to get a container id:
close #473
fix #958