
Add vm.max_map_count to startup checks #17078

Open
henrikingo opened this issue Nov 27, 2024 · 7 comments

Comments

@henrikingo

Problem Statement

CrateDB has a user-friendly feature where it checks various ulimit settings at startup and logs an error if they have not been raised to values suitable for a scalable production database. Recently we had a case where we hit failures due to a too-low vm.max_map_count. For some reason this wasn't caught by the startup checks.

Possible Solutions

Add vm.max_map_count to the startup checks already performed for other ulimit and configuration values.

Considered Alternatives

No response

@mfussenegger
Member

It's already part of the startup checks:

@Override
public BootstrapCheckResult check(final Settings settings) {
    // we only enforce the check if a store is allowed to use mmap at all
    if (IndexModule.NODE_STORE_ALLOW_MMAP.getWithFallback(settings)) {
        long maxMapCount = getMaxMapCount();
        if (maxMapCount != -1 && maxMapCount < LIMIT) {
            final String message = String.format(
                Locale.ROOT,
                "max virtual memory areas vm.max_map_count [%d] is too low, " +
                "increase to at least [%d] by adding `vm.max_map_count = 262144` to `/etc/sysctl.conf` " +
                "or invoking `sysctl -w vm.max_map_count=262144`",
                maxMapCount,
                LIMIT);
            return BootstrapCheckResult.failure(message);
        } else {
            return BootstrapCheckResult.success();
        }
    } else {
        return BootstrapCheckResult.success();
    }
}

https://cratedb.com/docs/guide/admin/bootstrap-checks.html
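
For reference, the remediation quoted in that error message can be checked and applied from a shell. A minimal sketch (assumes root privileges for the write steps; the value 262144 is taken from the message above):

# read the currently active limit (the value the bootstrap check compares against LIMIT)
cat /proc/sys/vm/max_map_count

# raise it immediately, no reboot required
sysctl -w vm.max_map_count=262144

# persist it across reboots and apply the config file
echo 'vm.max_map_count = 262144' >> /etc/sysctl.conf
sysctl -p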

@mfussenegger
Member

Whoops, didn't mean to close.
If people managed to get past the check but still ran into the issue, it could be an indication that we need to bump the required value further. The downside is that this would be a breaking change for anyone already running CrateDB with a value that currently passes the check and is sufficient for their number of shards.

It would also help to know what the actual configured value was and how much they had to increase it to resolve the issue.

@henrikingo
Author

Summoning @romseygeek, who was also on the same call: do you happen to remember? (And if not, would you mind asking the customer in the Slack channel? I'm happy to do it myself; I'm just trying to avoid an unnecessary back and forth.)

I failed to google it, so just asking here: what is our max file size for the Lucene index files? It seems like we are trying to ensure a node can hold and mmap enough files so that 4 GB * "enough files" = about 10 TB? (Enough files = 262144.)

@henrikingo
Author

Also noting while we wait for a response to the above: at some point we also suspected vm.overcommit_memory to be the culprit, but concluded it was not. Just noting it for the record (given that max_map_count was already checked...).

@mfussenegger
Member
mfussenegger commented Dec 2, 2024

what is our max file size for the Lucene index files? It seems like we are trying to ensure a node can hold and mmap enough files so that 4 GB * "enough files" = about 10 TB? (Enough files = 262144.)

The number of mappings depends on the number of segments and their size.
Since Lucene 9.10 it mmaps Lucene indexes in chunks of 16 GB (before it was 1 GB).
(CrateDB 5.7 is the first release with Lucene 9.10.)

So in an ideal scenario:

>>> 262144 * 16 GB -> TB
= 4194.3 TB

Caveat: you need to subtract quite a bit from that for other files, and more importantly because not all segments will be at their maximum size, depending on how merges are going on the system. Merges can require additional files. The mapping is also not a clean 1:1: you could end up with 2 mapped files per segment, or, if transparent huge pages are enabled, the other way around. cat /proc/<pid>/maps gives some insight into that (see the sketch below).

But it should still be plenty to handle the typical hardware setup.
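
As a rough way to see how many mappings a running node actually uses, you can count the lines of that maps file, since each line is one mapping. A small sketch (the <pid> placeholder is the CrateDB process id; the grep path is only an illustration and depends on where your data directory lives):

# total number of memory mappings for the process, to compare against vm.max_map_count
wc -l < /proc/<pid>/maps

# mappings that point at files under the data directory (path is an assumption, adjust as needed)
grep -c '/data/' /proc/<pid>/maps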

@romseygeek
Contributor

IIRC, because they had created completely new installs, it was set to the system default of 65530; they had updated the value to 262144 in the various conf files but hadn't restarted the node, so the new value hadn't taken effect. Possibly this means that /proc/sys/vm/max_map_count was reporting the new value, so the bootstrap check didn't fail?

@mfussenegger
Member

chunks of 16 GB (before it was 1 GB).

Small correction, GiB not GB.

>>> 262144 * 16 GiB -> TB
= 4503.6 TB

they had updated the value to 262144 in the various conf files but hadn't restarted the node, so the new value hadn't taken. Possibly this means that /proc/sys/vm/max_map_count was reporting the new version, so the bootstrap check didn't fail?

If it was changed via sysctl -w vm.max_map_count=262144 it should take immediate effect, no reboot required.
And afaik /proc/sys/vm/max_map_count shows the currently active value, not a configured but unapplied value.
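
A quick way to tell the two apart (a sketch; the config file only shows what is requested, which may not have been applied yet via sysctl -p or a reboot):

# the live kernel value, which is what the bootstrap check sees
cat /proc/sys/vm/max_map_count
sysctl vm.max_map_count

# what the config file requests; this is not necessarily the active value
grep vm.max_map_count /etc/sysctl.conf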
