8000 default caching by mtcherni95 · Pull Request #1527 · aquasecurity/tracee · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

default caching #1527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 21, 2022
Merged

Conversation

mtcherni95
Copy link
Contributor

No description provided.

@mtcherni95 mtcherni95 changed the title Michael default caching default caching Mar 3, 2022
@mtcherni95 mtcherni95 force-pushed the michael-default-caching branch 5 times, most recently from 24ee1ee to 11eb9e8 Compare March 3, 2022 09:51
return amountInGB * 1024
}
AmountOfEvents := func(amountInMB int) int {
return MBtoKB(KBtoB(amountInMB)) / eventSize
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaeldtinoco FYI shouldn't this beKbtoB(MBtoKB(amountInMB))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, asking you as was part of your commits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be if eventSize was 1 and not 1024 (if I got what you mean correctly).

If I did that, then eventSize would always have to be 1024 multiple (1k, 2k, 4k). The way it is, number can be 768 or 1280 (which makes a big difference at the end).

The most correct thing to do here would be to calculate the eventSize based on an existing number (trace.Event) BUT I'm afraid that it is more complex than just that (considering garbage collection and many other overhead from runtime).

I measured different cache sizes and found out 1024 was something close to a constant for calculating cache sizes VS amount of events (this won't be true if we start having other event types in the future).

Copy link
Contributor Author
@mtcherni95 mtcherni95 Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I am saying is that we are using function KBtoB(amountInMB) and accepting something which is in units of MB. I don't see any constellation of input where this could be correct. I think should simply be the opposite:
KbtoB(MBtoKB(amountInMB))

return AmountOfEvents(q.eventsCacheMemSizeMB)
}

switch {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we get here only if q.eventsCacheMemSizeMB < = 0 and the only relevant switch case will always be the first one (case q.eventsCacheMemSizeMB <= GBtoMB(1)) . what am I missing? again, asking it here as this was part of your commits @rafaeldtinoco and maybe I am loosing something important in the logics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you provide a cache size ("mem-cache-size" cmdline option), then you know what you are doing... q.eventsCacheMemSizeMB won't be 0 (default) and I'll return the amount of events for the given cache size (no automatic cache size calculation is needed).

Now... if you ONLY provided "cache-type", but no "mem-cache-size", then q.eventsCacheMemSizeMB will be 0 AND tracee calculates the amount of events for a specific cache size. Cache size depends on the amount of existing host/node memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we understood each other. the logics before checks

if q.eventsCacheMemSizeMB > 0 {
		return AmountOfEvents(q.eventsCacheMemSizeMB)
	}

if it doesn't enter in the if statement will ALWAYS enter in this case case q.eventsCacheMemSizeMB <= GBtoMB(1): (as q.eventsCacheMemSizeMB will ALWAYS be <= 0).

so what's the point of the other switch conditions?

@mtcherni95 mtcherni95 force-pushed the michael-default-caching branch from 1b7226c to 7b82e01 Compare March 6, 2022 08:32
rafaeldtinoco and others added 6 commits March 7, 2022 13:23
Under some circumstances, tracee-rules might be slower to consume
events than tracee-ebpf is capable of generating them. This requires
tracee-ebpf to deal with this possible lag, but, at the same time,
perf-buffer consumption can't be left behind (or important events coming
from the kernel might be loss, causing detection misses).

There are 3 variables connected to this issue:

1) perf buffer could be increased to hold very big amount of memory
   pages: The problem with this approach is that the requested space,
   to perf-buffer, through libbpf, has to be contiguous and it is almost
   impossible to get very big contiguous allocations through mmap after
   a node is running for some time.

2) raising the events channel buffer to hold a very big amount of
   events: The problem with this approach is that the overhead of
   dealing with that amount of buffers, in a golang channel, causes
   event losses as well. It means this is not enough to relief the
   pressure from kernel events into perf-buffer.

3) create an internal, to tracee-ebpf, buffer based on the node size.

This commit introduces (3): A generic interface for caching events in
the pipeline.

For now, there is a single interface implementation: an in-memory
caching mechanism with cmdline settings for cache type and possible
cache options. Needed tests were added to guarantee cmdline options are
always right.

In a near future, we may add new caching backends, such as offloading
events to filesystem, to a broken, etc.
@mtcherni95 mtcherni95 force-pushed the michael-default-caching branch from 7b82e01 to f7ec5c5 Compare March 7, 2022 13:23
Copy link
Contributor
@rafaeldtinoco rafaeldtinoco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your changes look good to me. If you're satisfied with my comments then feel free to merge. If not, then we can discuss things further.

return AmountOfEvents(q.eventsCacheMemSizeMB)
}

switch {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you provide a cache size ("mem-cache-size" cmdline option), then you know what you are doing... q.eventsCacheMemSizeMB won't be 0 (default) and I'll return the amount of events for the given cache size (no automatic cache size calculation is needed).

Now... if you ONLY provided "cache-type", but no "mem-cache-size", then q.eventsCacheMemSizeMB will be 0 AND tracee calculates the amount of events for a specific cache size. Cache size depends on the amount of existing host/node memory.

return amountInGB * 1024
}
AmountOfEvents := func(amountInMB int) int {
return MBtoKB(KBtoB(amountInMB)) / eventSize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be if eventSize was 1 and not 1024 (if I got what you mean correctly).

If I did that, then eventSize would always have to be 1024 multiple (1k, 2k, 4k). The way it is, number can be 768 or 1280 (which makes a big difference at the end).

The most correct thing to do here would be to calculate the eventSize based on an existing number (trace.Event) BUT I'm afraid that it is more complex than just that (considering garbage collection and many other overhead from runtime).

I measured different cache sizes and found out 1024 was something close to a constant for calculating cache sizes VS amount of events (this won't be true if we start having other event types in the future).

@rafaeldtinoco
Copy link
Contributor

BTW, @mtcherni95, whenever you do git logs/commits, always use imperative wording:

  1. Limit the subject line to 50 characters.
  2. Capitalize only the first letter in the subject line.
  3. Don't put a period at the end of the subject line.
  4. Put a blank line between the subject line and the body.
  5. Wrap the body at 72 characters.
  6. Use the imperative mood.
  7. Describe what was done and why, but not how.

Cheers!

@rafaeldtinoco rafaeldtinoco mentioned this pull request Mar 15, 2022
@grantseltzer
Copy link
Contributor

@mtcherni95 Is this ready to merge? It seems like there's still discussion.

@mtcherni95
Copy link
Contributor Author

@mtcherni95 Is this ready to merge? It seems like there's still discussion.

Hi, I'd prefer to wait for Rafael's answer first.

@rafaeldtinoco rafaeldtinoco merged commit af9e2dc into aquasecurity:main Mar 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0