default caching #1527
Conversation
Force-pushed 24ee1ee to 11eb9e8
    return amountInGB * 1024
}
AmountOfEvents := func(amountInMB int) int {
    return MBtoKB(KBtoB(amountInMB)) / eventSize
@rafaeldtinoco FYI, shouldn't this be KBtoB(MBtoKB(amountInMB))?
btw, asking you since this was part of your commits.
It could be, if eventSize was 1 and not 1024 (if I got what you mean correctly).
If I did that, then eventSize would always have to be a multiple of 1024 (1k, 2k, 4k). The way it is, the number can be 768 or 1280 (which makes a big difference in the end).
The most correct thing to do here would be to calculate eventSize based on an existing type (trace.Event), BUT I'm afraid that it is more complex than just that (considering garbage collection and lots of other runtime overhead).
I measured different cache sizes and found that 1024 was close to a constant for calculating cache sizes vs. the amount of events (this won't be true if we start having other event types in the future).
What I am saying is that we are using the function KBtoB(amountInMB) and feeding it something which is in units of MB. I don't see any combination of inputs where this could be correct. I think it should simply be the opposite:
KBtoB(MBtoKB(amountInMB))
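For what it's worth, since both helpers just multiply by 1024, the two compositions are numerically identical and only the naming is misleading. A minimal sketch (the helper names come from the diff above; treating eventSize as the empirical 1024-byte constant from this thread):

```go
package main

import "fmt"

// eventSize is the empirical bytes-per-event constant discussed in this PR.
const eventSize = 1024

func MBtoKB(amountInMB int) int { return amountInMB * 1024 }
func KBtoB(amountInKB int) int  { return amountInKB * 1024 }

// AmountOfEvents as written in the PR: the composition reads backwards,
// but because both helpers multiply by the same factor, the result equals
// the suggested KBtoB(MBtoKB(amountInMB)).
func AmountOfEvents(amountInMB int) int {
	return MBtoKB(KBtoB(amountInMB)) / eventSize
}

// AmountOfEventsSuggested uses the ordering proposed in the review.
func AmountOfEventsSuggested(amountInMB int) int {
	return KBtoB(MBtoKB(amountInMB)) / eventSize
}

func main() {
	// 256 MB of cache => 256 * 1024 * 1024 / 1024 = 262144 events either way.
	fmt.Println(AmountOfEvents(256), AmountOfEventsSuggested(256))
}
```

So the bug here is one of readability rather than arithmetic: swapping the call order changes nothing numerically, but the corrected ordering matches the units the names promise.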
    return AmountOfEvents(q.eventsCacheMemSizeMB)
}

switch {
We get here only if q.eventsCacheMemSizeMB <= 0, so the only relevant switch case will always be the first one (case q.eventsCacheMemSizeMB <= GBtoMB(1)). What am I missing? Again, asking here as this was part of your commits @rafaeldtinoco and maybe I am missing something important in the logic.
If you provide a cache size ("mem-cache-size" cmdline option), then you know what you are doing... q.eventsCacheMemSizeMB won't be 0 (default) and I'll return the amount of events for the given cache size (no automatic cache size calculation is needed).
Now... if you ONLY provided "cache-type", but no "mem-cache-size", then q.eventsCacheMemSizeMB will be 0 AND tracee calculates the amount of events for a specific cache size. Cache size depends on the amount of existing host/node memory.
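The fallback described above can be sketched as follows. This is a hand-written illustration, not tracee's actual code: the function name, the memory thresholds, and the returned sizes are all assumptions made up for the example; only the overall shape (explicit cmdline size wins, otherwise derive the cache size from host memory) comes from the discussion.

```go
package main

import "fmt"

// cacheSizeMB is an illustrative sketch: if the user passed an explicit
// "mem-cache-size", honor it; otherwise pick a cache size from host memory.
// Thresholds and return values here are invented for the example.
func cacheSizeMB(userMB, hostMemMB int) int {
	if userMB > 0 {
		return userMB // user knows what they are doing
	}
	switch {
	case hostMemMB <= 4*1024: // small node
		return 256
	case hostMemMB <= 16*1024: // mid-size node
		return 1024
	default: // large node
		return 2048
	}
}

func main() {
	fmt.Println(cacheSizeMB(512, 8192)) // explicit size wins
	fmt.Println(cacheSizeMB(0, 8192))   // derived from host memory
}
```

Note that the switch in this sketch branches on host memory, not on the user-supplied size, which is exactly the distinction being debated in this thread.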
Not sure we understood each other. The logic before checks:

if q.eventsCacheMemSizeMB > 0 {
    return AmountOfEvents(q.eventsCacheMemSizeMB)
}

If it doesn't enter the if statement, it will ALWAYS enter this case: case q.eventsCacheMemSizeMB <= GBtoMB(1): (as q.eventsCacheMemSizeMB will ALWAYS be <= 0).
So what's the point of the other switch conditions?
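A reduced example of the pattern being questioned. The names and values below are invented to isolate the point: once the early return filters out all positive values, a switch over the same variable can only ever match its first `<=` case.

```go
package main

import "fmt"

// amount mimics the control flow under discussion: an early return on
// positive sizes, followed by a switch on the same (now non-positive) value.
func amount(sizeMB int) string {
	if sizeMB > 0 {
		return "explicit"
	}
	// sizeMB is always <= 0 here, so only the first case can ever match.
	switch {
	case sizeMB <= 1024: // stands in for GBtoMB(1)
		return "first case (always)"
	case sizeMB <= 4096:
		return "unreachable"
	default:
		return "unreachable"
	}
}

func main() {
	fmt.Println(amount(0))   // falls through to the switch
	fmt.Println(amount(512)) // caught by the early return
}
```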
Force-pushed 1b7226c to 7b82e01
Under some circumstances, tracee-rules might be slower to consume events than tracee-ebpf is capable of generating them. This requires tracee-ebpf to deal with this possible lag but, at the same time, perf-buffer consumption can't be left behind (or important events coming from the kernel might be lost, causing detection misses). There are 3 variables connected to this issue:

1) The perf buffer could be increased to hold a very big amount of memory pages: the problem with this approach is that the space requested from the perf-buffer, through libbpf, has to be contiguous, and it is almost impossible to get very big contiguous allocations through mmap after a node has been running for some time.

2) Raising the events channel buffer to hold a very big amount of events: the problem with this approach is that the overhead of dealing with that many buffers, in a golang channel, causes event losses as well. It means this is not enough to relieve the pressure from kernel events on the perf-buffer.

3) Create an internal, to tracee-ebpf, buffer based on the node size.

This commit introduces (3): a generic interface for caching events in the pipeline. For now, there is a single interface implementation: an in-memory caching mechanism with cmdline settings for cache type and possible cache options. Needed tests were added to guarantee cmdline options are always right. In the near future, we may add new caching backends, such as offloading events to the filesystem, to a broker, etc.
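The "generic interface for caching events" mentioned in the commit message might look roughly like the sketch below. All names here (EventsCache, memCache, the Event stand-in) are illustrative assumptions, not tracee's actual API; the point is only the shape: a pluggable FIFO that the pipeline fills as fast as the kernel produces events while a slower consumer drains it.

```go
package main

import "fmt"

// Event stands in for tracee's trace.Event.
type Event struct{ ID int }

// EventsCache sketches a pluggable caching backend. Alternative
// implementations (filesystem offload, a broker, etc.) would satisfy
// the same interface.
type EventsCache interface {
	Enqueue(*Event)
	Dequeue() *Event // nil when the cache is empty
	Len() int
}

// memCache is an in-memory FIFO implementation, analogous in spirit to
// the one this commit describes (sized by the node's memory in practice).
type memCache struct{ q []*Event }

func (c *memCache) Enqueue(e *Event) { c.q = append(c.q, e) }

func (c *memCache) Dequeue() *Event {
	if len(c.q) == 0 {
		return nil
	}
	e := c.q[0]
	c.q = c.q[1:]
	return e
}

func (c *memCache) Len() int { return len(c.q) }

func main() {
	var cache EventsCache = &memCache{}
	cache.Enqueue(&Event{ID: 1})
	cache.Enqueue(&Event{ID: 2})
	fmt.Println(cache.Len())        // two events buffered
	fmt.Println(cache.Dequeue().ID) // drained in FIFO order
}
```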
Force-pushed 7b82e01 to f7ec5c5
Your changes look good to me. If you're satisfied with my comments then feel free to merge. If not, then we can discuss things further.
BTW, @mtcherni95, whenever you do git logs/commits, always use imperative wording.
Cheers!
@mtcherni95 Is this ready to merge? It seems like there's still discussion.
Hi, I'd prefer to wait for Rafael's answer first.
No description provided.