a tiny vision language model that kicks ass and runs anywhere
Example |
---|
Q: What is the girl doing? A: The girl is sitting at a table and eating a large hamburger. Q: What color is the girl's hair? A: The girl's hair is white. |
Q: What is this? A: This is a computer server rack, which is a device used to store and manage multiple computer servers. The rack is filled with various computer servers, each with their own dedicated space and power supply. The servers are connected to the rack via multiple cables, indicating that they are part of a larger system. The rack is placed on a carpeted floor, and there is a couch nearby, suggesting that the setup is in a living or entertainment area. Q: What is behind the stand? A: Behind the stand, there is a brick wall. |

Model | Precision | Download Size | Memory Usage | Best For | Download Link |
---|---|---|---|---|---|
Moondream 2B | int8 | 1,733 MiB | 2,624 MiB | General use, best quality | Download |
Moondream 0.5B | int8 | 593 MiB | 996 MiB | Edge devices, faster speed | Download |
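
Choosing between the two builds mostly comes down to available memory. As a rough illustration, here is a hypothetical helper (`pick_model` is not part of the moondream library) that returns the best-quality model whose listed memory usage fits a given budget. It uses the figures from the table above and assumes they are the only constraint, ignoring the OS and other processes.

```python
from typing import Optional

# Figures copied from the table above (int8 builds), best quality first.
# Each entry: (model name, download size in MiB, memory usage in MiB).
MODELS = [
    ("Moondream 2B", 1733, 2624),
    ("Moondream 0.5B", 593, 996),
]


def pick_model(memory_budget_mib: float) -> Optional[str]:
    """Return the best model whose listed memory usage fits the budget, or None."""
    # The list is ordered by quality, so the first model that fits wins.
    for name, _download_mib, memory_mib in MODELS:
        if memory_mib <= memory_budget_mib:
            return name
    return None


print(pick_model(4096))  # a 4 GiB budget fits the 2B model
print(pick_model(1024))  # a 1 GiB budget only fits the 0.5B model
```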
The recommended way to use the latest version of Moondream is through our Python client library. First, install it:
```bash
pip install moondream==0.0.5
```
Then load a model and run it on an image:
```python
import moondream as md
from PIL import Image

# Initialize with local model path. Can also read .mf.gz files, but we recommend decompressing
# up-front to avoid decompression overhead every time the model is initialized.
model = md.vl(model="path/to/moondream-2b-int8.mf")

# Load and process image
image = Image.open("path/to/image.jpg")
encoded_image = model.encode_image(image)

# Generate caption
caption = model.caption(encoded_image)["caption"]
print("Caption:", caption)

# Ask questions
answer = model.query(encoded_image, "What's in this image?")["answer"]
print("Answer:", answer)
```
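The client comment recommends decompressing `.mf.gz` files ahead of time rather than on every initialization. A minimal sketch using only Python's standard library (`gzip`, `shutil`); `decompress_model` is a hypothetical helper, not part of the moondream client, and it assumes the compressed file is named `<name>.mf.gz`.

```python
import gzip
import shutil
from pathlib import Path


def decompress_model(gz_path: str) -> Path:
    """Decompress '<name>.mf.gz' to '<name>.mf' next to it and return the new path.

    Hypothetical helper: does nothing if the decompressed file already exists.
    """
    src = Path(gz_path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    dst = src.with_suffix("")  # "model.mf.gz" -> "model.mf"
    if not dst.exists():
        # Stream in chunks so the whole model is never held in memory at once.
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return dst
```

The returned path can then be handed to the client, e.g. `md.vl(model=str(decompress_model("path/to/moondream-2b-int8.mf.gz")))`, so the decompression cost is paid only once.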
For complete documentation of the Python client, including cloud API usage and additional features, see the Python Client README.
For JavaScript/TypeScript developers, we offer a full-featured Node.js client library. See the Node.js Client README for installation and usage instructions.
The Hugging Face hub version tracks the last official release of the 2B model. While more stable, it doesn't include the latest features or support for the 0.5B model. Use this if you need GPU acceleration or prefer the transformers ecosystem:
First, install the required packages (`pillow` provides the `PIL` module used to load images):
```bash
pip install transformers torch einops pillow
```
```python
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True,
    # Uncomment to run on GPU.
    # device_map={"": "cuda"}
)

image = Image.open('<IMAGE_PATH>')
# Encode the image once; the encoding can be reused across multiple queries.
enc_image = model.encode_image(image)
# query() returns a dict; the generated text is under "answer".
print(model.query(enc_image, "Describe this image.")["answer"])
```