Don't perform memory check if client sets use_mmap true. by rick-github · Pull Request #8895 · ollama/ollama · GitHub

Don't perform memory check if client sets use_mmap true. #8895


Open · wants to merge 11 commits into main

Conversation

rick-github
Copy link
Collaborator
@rick-github rick-github commented Feb 6, 2025

If the client overrides use_mmap, don't prevent the model from loading due to apparent over-commit.

On Linux, an mmap'd file doesn't use swap backing store unless modified, so there's no need for the check. Windows has dynamic swap and so falls into the same bucket as darwin. Inference on deepseek-r1:671b-1.5b runs at ~0.15 t/s when the model requires swap on SSD, ~0.3 t/s with mmap instead of swap on the same SSD, and ~1.4 t/s when the model is mapped on an NVMe drive.

Also add OLLAMA_USE_MMAP for global configuration.

@DrShadow34
Copy link

Any chance that will be merged in one regular human lifetime?

@jmv2009
Copy link
jmv2009 commented Mar 20, 2025

I circumvent this by creating a large zram swap, which is useful anyway. On a live Linux system I normally load the models into a ramdrive, so the models are already in memory, and with mmap no duplication occurs. I actually need to modify line 213 as well to avoid getting no_mmap.

In this scenario line 213 behaves perversely: if the model is small enough to fit into memory a second time, the check passes and mmap is used, which in fact does not duplicate the data and does not consume that extra memory. If the model is too big to fit a second time, mmap is disabled, and, with the zram swap, it runs out of memory.
