OOM handling
Many people think that on GNU/Linux malloc and friends can never return NULL. This is not 100% accurate. The description below applies to tcmalloc, but much of it carries over to other malloc implementations and flavors of Linux.
Indeed, in most configurations Linux overcommits virtual memory (see https://www.kernel.org/doc/Documentation/vm/overcommit-accounting). An app may therefore successfully sbrk or mmap some memory and only later trigger the OOM killer when it first touches that memory. This is because sbrk and mmap typically don't allocate physical pages; actual page allocation happens on a minor page fault, when a virtual page is first touched. And since there is nowhere to return ENOMEM at that point, the kernel simply kills some process to reclaim memory (google "linux oom killer" for more details).
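A minimal sketch of this behaviour (the 16 GiB size is an arbitrary illustration; pick something comfortably larger than the machine's available RAM to actually reproduce it):

```cpp
#include <sys/mman.h>
#include <cstring>
#include <cstdio>

int main() {
  // With overcommit enabled, this mmap usually succeeds even if the
  // machine has nowhere near 16 GiB of free RAM: no physical pages
  // are allocated yet.
  size_t size = 16ull << 30;  // 16 GiB; adjust to exceed available RAM
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    perror("mmap");  // this is where ENOMEM would show up, if at all
    return 1;
  }
  // Physical pages are allocated only here, on first touch. If the
  // machine runs out of memory at this point, the kernel has no error
  // code to return; the OOM killer terminates some process instead.
  memset(p, 1, size);
  return 0;
}
```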
In tcmalloc, freshly allocated virtual pages are usually touched inside tcmalloc itself, when it splits a large chunk of RAM into small objects of a specific size class. So in this case the app will not see NULL from malloc; it will simply die.
Note that machines equipped with swap will likely get into a severe swap storm well before hitting the OOM condition, which tends to leave the machine in a nearly unusable, locked-up state.
The modern world of containers changes this somewhat: a container exceeding its memory limit is usually simply killed.
tcmalloc propagates ENOMEM from kernel syscalls down to the app, and there are numerous reasons for the kernel to return ENOMEM.
When such a condition is reached, an OOM condition occurs and malloc will return NULL. Similarly, the nothrow version of operator new will also return NULL. And regular operator new will invoke the new_handler (if set) and, failing that, throw std::bad_alloc.
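In code, the three failure paths look roughly like this (a sketch; whether these allocations actually fail depends on the machine and its overcommit settings, and the sizes are arbitrary):

```cpp
#include <cstdlib>
#include <cstdio>
#include <new>

int main() {
  size_t huge = 1ull << 40;  // 1 TiB, chosen to be large enough to fail

  // 1. malloc reports OOM by returning NULL.
  void* p = malloc(huge);
  if (p == nullptr) printf("malloc returned NULL\n");

  // 2. The nothrow form of operator new also returns NULL.
  char* q = new (std::nothrow) char[huge];
  if (q == nullptr) printf("new (std::nothrow) returned NULL\n");

  // 3. Plain operator new calls the new_handler (if one is installed)
  //    and otherwise throws std::bad_alloc.
  try {
    char* r = new char[huge];
    delete[] r;
  } catch (const std::bad_alloc&) {
    printf("operator new threw std::bad_alloc\n");
  }

  delete[] q;
  free(p);
  return 0;
}
```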
When mlockall is in effect, all memory allocations populate virtual pages with actual physical pages. The kernel may run out of memory trying to do that and will return ENOMEM.
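For example (a sketch; this needs sufficient RLIMIT_MEMLOCK or privileges, and the 1 GiB size is arbitrary):

```cpp
#include <sys/mman.h>
#include <cstdio>
#include <cstdlib>

int main() {
  // MCL_FUTURE asks the kernel to back every future mapping with
  // physical pages immediately, so running out of RAM is reported as
  // ENOMEM at allocation time instead of OOM-killing the process later.
  if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
    perror("mlockall");
    return 1;
  }
  void* p = malloc(1ull << 30);  // 1 GiB, arbitrary illustration
  if (p == nullptr) {
    fprintf(stderr, "malloc failed: kernel could not populate pages\n");
    return 1;
  }
  free(p);
  return 0;
}
```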
tcmalloc's configurable heap size limit is a logical limit enforced by the gperftools version of tcmalloc. Hitting this self-imposed limit will trigger the OOM handling described above. The "Abseil" tcmalloc (at github.com/google/tcmalloc) has a similar feature (but they have made a choice to always crash on OOM).
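A rough way to observe this limit (the TCMALLOC_HEAP_LIMIT_MB environment variable is how gperftools exposes it in recent releases; verify the name against the documentation of your version, and don't run this without a limit set or it will just eat memory):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Run as e.g.: TCMALLOC_HEAP_LIMIT_MB=256 ./a.out
int main() {
  std::vector<void*> blocks;
  blocks.reserve(1 << 16);  // avoid vector growth allocations near the limit
  size_t total = 0;
  for (;;) {
    void* p = malloc(1 << 20);  // 1 MiB per iteration
    if (p == nullptr) break;    // the logical heap limit has been hit
    memset(p, 0, 1 << 20);      // touch the block so it is really backed
    blocks.push_back(p);
    total += 1 << 20;
  }
  printf("malloc returned NULL after about %zu MiB\n", total >> 20);
  for (void* p : blocks) free(p);
  return 0;
}
```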
Note that most mallocs work around RLIMIT_DATA (which makes sbrk fail) by using mmap, which historically wasn't counted against this limit (newer kernels do count private anonymous mappings against RLIMIT_DATA as well). But in tcmalloc it is possible to explicitly disable mmap allocation and hit the OOM condition on RLIMIT_DATA. RLIMIT_AS should make all mallocs fail on reaching the address space limit.
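For example, capping the address space with setrlimit makes an oversized malloc fail (a sketch; the 512 MiB / 1 GiB values are arbitrary):

```cpp
#include <sys/resource.h>
#include <cstdlib>
#include <cstdio>

int main() {
  // Cap the whole address space at 512 MiB. Unlike RLIMIT_DATA,
  // RLIMIT_AS also covers mmap-ed memory, so no malloc implementation
  // can work around it.
  struct rlimit lim;
  lim.rlim_cur = 512ull << 20;
  lim.rlim_max = 512ull << 20;
  if (setrlimit(RLIMIT_AS, &lim) != 0) {
    perror("setrlimit");
    return 1;
  }
  void* p = malloc(1ull << 30);  // 1 GiB cannot fit; expect NULL
  printf("malloc(1 GiB) -> %p\n", p);
  free(p);
  return 0;
}
```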
With today's systems routinely equipped with far more than 4 GiB of RAM, it is not hard for programs to allocate 4 GiB and more without exhausting physical RAM. But since 32-bit programs have a limited address space, they'll hit ENOMEM when they reach the 4 GiB limit.
Certain kinds of address space abuse in 64-bit programs could in theory trigger ENOMEM too, because most 64-bit CPUs don't really offer a fully usable 64 bits of address space; usually it is something like 48 bits.
tcmalloc allows programs to override the system allocator, i.e. replace the sbrk or mmap calls with whatever the app wants. Such a system allocator may return NULL or crash, at its choice, using its own definition of the out-of-memory condition.
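A sketch of what such an override might look like; the SysAllocator interface and MallocExtension::SetSystemAllocator used here come from gperftools' malloc_extension.h, but double-check the exact signatures against the header shipped with your version:

```cpp
#include <gperftools/malloc_extension.h>
#include <sys/mman.h>
#include <cstdio>

// Hypothetical replacement system allocator. A real implementation must
// honor the requested alignment; this sketch simply assumes mmap's page
// alignment is good enough.
class MmapOnlySysAllocator : public SysAllocator {
 public:
  void* Alloc(size_t size, size_t* actual_size, size_t alignment) override {
    (void)alignment;  // simplification, see note above
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
      // Returning NULL is how a system allocator reports "out of memory";
      // tcmalloc then runs the OOM handling described above.
      return nullptr;
    }
    if (actual_size != nullptr) *actual_size = size;
    return p;
  }
};

int main() {
  static MmapOnlySysAllocator alloc;
  MallocExtension::instance()->SetSystemAllocator(&alloc);
  printf("custom system allocator installed\n");
  return 0;
}
```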
Linux overcommit has its limits, and it is also possible to disable it completely. In the latter case every allocated page, regardless of whether it has a backing physical page, is assumed to consume system memory. An attempt to allocate more pages may thus hit the system's limit and trigger ENOMEM.
But even when overcommit is enabled, it is not unlimited. Programs may eat into the overcommit "reserve" by mmap-ing large regions of memory and not touching them, or by having a large .bss section and not touching it. When the system's overcommit limit is reached, the kernel returns ENOMEM and triggers the usual OOM handling in tcmalloc as described above.
Larger malloc requests tend to hit this case.
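A sketch of this failing path (run it with /proc/sys/vm/overcommit_memory set to 2, or on a machine whose commit limit is already nearly exhausted; the 64 GiB size is arbitrary):

```cpp
#include <sys/mman.h>
#include <cstdio>

// With overcommit disabled, or once the overcommit reserve is used up,
// this mmap fails with ENOMEM even though the region is never touched.
// tcmalloc sees the same ENOMEM from its own mmap calls and reports OOM
// to the application.
int main() {
  size_t size = 64ull << 30;  // 64 GiB, arbitrary "too large" value
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    perror("mmap");  // typically "Cannot allocate memory" (ENOMEM)
    return 1;
  }
  printf("mmap succeeded without touching a single page\n");
  munmap(p, size);
  return 0;
}
```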
The author of this page thinks that in practice it doesn't matter much whether malloc returns NULL or the program simply dies. But regardless of that, gperftools will keep maintaining "correct" OOM handling for the tiny fraction of programs that may need it.
In theory malloc returning NULL instead of crashing may help a program survive. E.g. in the case of an HTTP or RPC server, some memory-expensive requests will fail, but less expensive requests will still be handled. This of course only works if the program deals with memory failures 100% correctly, which is hard to implement (especially if your C++ program is built with -fno-exceptions) and even harder to test.
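A sketch of what that might look like inside a request handler; the Request/Response types and handle_request function are purely illustrative, not part of any real API:

```cpp
#include <cstddef>
#include <cstdio>
#include <new>
#include <string>

struct Request  { size_t payload_size; };
struct Response { int status; std::string body; };

Response handle_request(const Request& req) {
  // nothrow new lets us turn an allocation failure into an error reply
  // instead of taking the whole server down.
  char* buf = new (std::nothrow) char[req.payload_size];
  if (buf == nullptr) {
    return {503, "out of memory, try a smaller request"};
  }
  // ... fill buf, build the real reply ...
  delete[] buf;
  return {200, "ok"};
}

int main() {
  Response r = handle_request(Request{1ull << 40});  // absurdly large request
  printf("status=%d body=%s\n", r.status, r.body.c_str());
  return 0;
}
```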
Also, failing a memory allocation or calling a special out-of-memory handler (e.g. via std::set_new_handler) may allow the program to free some memory and retry the allocation. Doing this correctly is nearly impossible, though. The new handler will be invoked under all kinds of locks, so what it can do is very limited (most likely at most releasing a predefined set of objects, a la weak references). And most (if not all) malloc implementations cannot guarantee that such freeing will actually make enough usable space for the original allocation to succeed. The original memcached tried this approach (an incoming "set" operation would malloc space for the incoming value and, in case of failure, free up to 50 of the most inactive values and then retry) and found it to be suboptimal.
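A sketch of the retry-via-new-handler pattern; the emergency cache here is hypothetical, and as noted above, releasing it may well not be enough for the retry to succeed:

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Pre-reserved block that the handler is allowed to drop. Real code must
// only touch state that is safe to access from under allocator locks.
static void* emergency_cache = nullptr;

static void release_cache_handler() {
  if (emergency_cache != nullptr) {
    // Drop the reserve; operator new then retries the failed allocation.
    free(emergency_cache);
    emergency_cache = nullptr;
    return;
  }
  // Nothing left to free: unset the handler so operator new throws.
  std::set_new_handler(nullptr);
}

int main() {
  emergency_cache = malloc(64 << 20);  // 64 MiB reserve, arbitrary size
  std::set_new_handler(release_cache_handler);
  try {
    char* p = new char[1ull << 40];  // likely fails, gets retried once
    delete[] p;
  } catch (const std::bad_alloc&) {
    printf("allocation still failed after releasing the reserve\n");
  }
  return 0;
}
```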
In practice, operating near the limit hurts malloc fragmentation more and more, reducing the effective memory capacity of the process over time. So this isn't necessarily a good choice either.