llm-mlx

Support for MLX models in LLM.

Read my blog for background on this project.

Installation

Install this plugin in the same environment as LLM. This plugin likely only works on macOS.

llm install llm-mlx

Usage

To install an MLX model from Hugging Face, use the llm mlx download-model command. This example downloads 1.8GB of model weights from mlx-community/Llama-3.2-3B-Instruct-4bit:

llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit

Then run prompts like this:

llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Capital of France?' -s 'you are a pelican'

The mlx-community organization is a useful source for compatible models.
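The two commands above can also be driven from a script. This is a minimal sketch, assuming the `llm` executable is on your PATH with this plugin installed. The helper names `download_command`, `prompt_command` and `run_example` are hypothetical; they only assemble the flags already shown above (`-m` and `-s`).

```python
import subprocess

MODEL = "mlx-community/Llama-3.2-3B-Instruct-4bit"


def download_command(model_id):
    """Argument list for: llm mlx download-model <model_id>"""
    return ["llm", "mlx", "download-model", model_id]


def prompt_command(model_id, prompt, system=None):
    """Argument list for: llm -m <model_id> <prompt> [-s <system>]"""
    command = ["llm", "-m", model_id, prompt]
    if system is not None:
        command += ["-s", system]
    return command


def run_example():
    # Assumption: the llm CLI and this plugin are installed in the active environment.
    # The first call downloads the model weights, so it can take a while.
    subprocess.run(download_command(MODEL), check=True)
    subprocess.run(
        prompt_command(MODEL, "Capital of France?", system="you are a pelican"),
        check=True,
    )
```

Calling `run_example()` runs both commands in order. Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.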

Models to try

The following models all work well with this plugin:

mlx-community/Qwen2.5-0.5B-Instruct-4bit - 278MB
mlx-community/Mistral-7B-Instruct-v0.3-4bit - 4.08GB
mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit - 18.5GB
mlx-community/Llama-3.3-70B-Instruct-4bit - 40GB
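Because the download sizes span more than two orders of magnitude, it can help to pick a model by how much disk space you want to spend. The following sketch uses only the sizes listed above; the helper name `largest_model_within` is hypothetical, and the budget refers to download size only. It does not estimate how much memory a model needs at run time.

```python
# Download sizes in GB, copied from the list above (278MB written as 0.278).
MODEL_SIZES_GB = {
    "mlx-community/Qwen2.5-0.5B-Instruct-4bit": 0.278,
    "mlx-community/Mistral-7B-Instruct-v0.3-4bit": 4.08,
    "mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit": 18.5,
    "mlx-community/Llama-3.3-70B-Instruct-4bit": 40.0,
}


def largest_model_within(budget_gb, sizes=MODEL_SIZES_GB):
    """Return the biggest listed model whose download fits in budget_gb, or None."""
    candidates = [(size, name) for name, size in sizes.items() if size <= budget_gb]
    if not candidates:
        return None
    # Tuples compare by size first, so max() picks the largest download that fits.
    return max(candidates)[1]
```

For example, `largest_model_within(8)` selects the Mistral 7B model, since the next size up is 18.5GB.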

Model options

MLX models can use the following model options:

-o max_tokens INTEGER: Maximum number of tokens to generate in the completion (defaults to 1024)
-o unlimited 1: Generate an unlimited number of tokens in the completion
-o temperature FLOAT: Sampling temperature (defaults to 0.8)
-o top_p FLOAT: Sampling top-p (defaults to 0.9)
-o min_p FLOAT: Sampling min-p (defaults to 0.1)
-o min_tokens_to_keep INTEGER: Minimum tokens to keep for min-p sampling (defaults to 1)
-o seed INTEGER: Random number seed to use

For example:

llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Joke about pelicans' -o max_tokens 60 -o temperature 1.0
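If the sampling options are unfamiliar, this sketch illustrates what top-p and min-p filtering conventionally mean. It is not the code this plugin or MLX runs: the function names are hypothetical, and it works on a plain, non-empty dictionary of token probabilities rather than on model output.

```python
def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative probability reaches top_p."""
    kept = {}
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


def min_p_filter(probs, min_p, min_tokens_to_keep=1):
    """Keep tokens whose probability is at least min_p times the top probability."""
    # Assumes probs is non-empty.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    threshold = min_p * ranked[0][1]
    kept = [(token, p) for token, p in ranked if p >= threshold]
    # Never return fewer than min_tokens_to_keep candidates.
    if len(kept) < min_tokens_to_keep:
        kept = ranked[:min_tokens_to_keep]
    return dict(kept)


def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1 again."""
    total = sum(probs.values())
    return {token: p / total for token, p in probs.items()}
```

With probabilities `{"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}`, a `top_p` of 0.9 keeps `a`, `b` and `c`, because the first two only add up to 0.8. A `min_p` of 0.2 sets the cutoff at 0.2 × 0.5 = 0.1, which also drops only `d`.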

Using models from Python

You can use this plugin in Python like this:

from llm_mlx import MlxModel
model = MlxModel("mlx-community/Llama-3.2-3B-Instruct-4bit")
print(model.prompt("hi").text())
# Outputs: How can I assist you today?

Using MlxModel directly in this way avoids needing to first use the download-model command.

If you have already registered models with that command you can use them like this instead:

import llm
model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
print(model.prompt("hi").text())

The LLM Python API documentation has more details on how to use LLM models.
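The model options described earlier can be supplied from Python as well. This sketch assumes that LLM forwards keyword arguments passed to `prompt()` as model options; check the LLM Python API documentation to confirm this for your version. The helper names `prompt_with_options` and `run_example` are hypothetical.

```python
def prompt_with_options(model, text, **options):
    """Send text to model, forwarding options such as max_tokens or temperature."""
    # Assumption: the model accepts its options as keyword arguments to prompt().
    response = model.prompt(text, **options)
    return response.text()


def run_example():
    # Imported here so that defining the helper does not require LLM to be installed.
    import llm

    # Assumes the model was already registered with the download-model command.
    model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
    print(
        prompt_with_options(
            model, "Joke about pelicans", max_tokens=60, temperature=1.0
        )
    )
```

Calling `run_example()` is intended to mirror the command-line example that passes `-o max_tokens 60 -o temperature 1.0`.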

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd llm-mlx
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

llm install -e '.[test]'

To run the tests:

python -m pytest
