Swama is a high-performance machine learning runtime written in pure Swift, designed specifically for macOS and built on Apple's MLX framework. It provides a powerful and easy-to-use solution for local LLM (Large Language Model) and VLM (Vision Language Model) inference.
- **High Performance**: Built on Apple MLX framework, optimized for Apple Silicon
- **OpenAI Compatible API**: Standard `/v1/chat/completions` endpoint support
- **Menu Bar App**: Elegant macOS native menu bar integration
- **Command Line Tools**: Complete CLI support for model management and inference
- **Multimodal Support**: Support for both text and image inputs
- **Smart Model Management**: Automatic downloading, caching, and version management
- **Streaming Responses**: Real-time streaming text generation support
- **HuggingFace Integration**: Direct model downloads from HuggingFace Hub
Swama features a modular architecture:
- **SwamaKit**: Core framework library containing all business logic
- **Swama CLI**: Command-line tool providing complete model management and inference functionality
- **Swama.app**: macOS menu bar application with graphical interface and background services
- macOS 14.0 or later
- Apple Silicon (M1/M2/M3)
- Xcode 15.0+ (for compilation)
- Swift 6.1+
1. **Download the latest release**
   - Go to Releases
   - Download `Swama.zip` from the latest release
   - Extract the zip file

2. **Install the app**

   ```bash
   # Move to Applications folder
   mv Swama.app /Applications/

   # Launch the app
   open /Applications/Swama.app
   ```
Note: On first launch, macOS may show a security warning. If this happens:
- Go to System Settings > Privacy & Security
- Click "Open Anyway" next to the Swama app message
- Or right-click the app and select "Open" from the context menu
3. **Install CLI tools**
   - Open Swama from the menu bar
   - Click "Install Command Line Tool…" to add the `swama` command to your PATH
For developers who want to build from source:
```bash
# Clone the repository
git clone https://github.com/Trans-N-ai/swama.git
cd swama

# Build CLI tool
swift build -c release
sudo cp .build/release/swama /usr/local/bin/

# Build macOS app (requires Xcode)
cd swama-macos/Swama
xcodebuild -project Swama.xcodeproj -scheme Swama -configuration Release
```
After installing Swama.app, you can use either the menu bar app or the command line:

```bash
# Use short aliases instead of full model names - auto-downloads if needed!
swama run qwen3 "Hello, AI"
swama run llama3.2 "Tell me a joke"
swama run deepseek-r1 "Explain quantum computing"

# Traditional way (also works)
swama run mlx-community/Llama-3.2-1B-Instruct-4bit "Hello, how are you?"

# List downloaded models
swama list
```
**Smart Features:**
- **Model Aliases**: Use friendly names like `qwen3`, `llama3.2`, and `deepseek-r1` instead of long URLs
- **Auto-Download**: Models are automatically downloaded on first use - no need to `pull` first!
- **Cache Management**: Downloaded models are cached for future use
| Alias | Full Model Name | Description |
|---|---|---|
| `qwen3` | `mlx-community/Qwen3-8B-4bit` | Qwen3 8B (default) |
| `qwen3-1.7b` | `mlx-community/Qwen3-1.7B-4bit` | Qwen3 1.7B (lightweight) |
| `llama3.2` | `mlx-community/Llama-3.2-3B-Instruct-4bit` | Llama 3.2 3B (default) |
| `llama3.2-1b` | `mlx-community/Llama-3.2-1B-Instruct-4bit` | Llama 3.2 1B (fastest) |
| `deepseek-r1` | `mlx-community/DeepSeek-R1-0528-4bit` | DeepSeek R1 (reasoning) |
| `deepseek-coder` | `mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx` | DeepSeek Coder |
| `qwen2.5` | `mlx-community/Qwen2.5-7B-Instruct-4bit` | Qwen 2.5 7B |
```bash
# Start API server with a default model
swama serve --model qwen3

# Or start without specifying model (can switch via API)
swama serve --host 0.0.0.0 --port 28100

# Launch menu bar application
swama menubar
```
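Once the server is up, a quick way to confirm it is reachable is to list the served models with the OpenAI Python client (a minimal sketch, assuming the default host and port shown above and `pip install openai`):

```python
# Minimal health check: list models served by a local Swama instance.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:28100/v1", api_key="not-required")

# Iterate over the /v1/models response and print each model ID.
for model in client.models.list():
    print(model.id)
```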
Swama provides fully OpenAI-compatible API endpoints, allowing you to use it with existing tools and integrations:
```bash
# Get available models
curl http://localhost:28100/v1/models

# Chat completion using aliases (auto-downloads if needed)
curl -X POST http://localhost:28100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Streaming response with DeepSeek R1
curl -X POST http://localhost:28100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [
      {"role": "user", "content": "Solve this step by step: What is 15% of 240?"}
    ],
    "stream": true
  }'
```
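The same `stream` flag works from client libraries. Here is a minimal sketch using the OpenAI Python client that prints tokens as they arrive (assumes a server on the default port):

```python
# Stream a chat completion token-by-token from a local Swama server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:28100/v1", api_key="not-required")

stream = client.chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "Write a haiku about Apple Silicon."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; content may be None.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```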
Since Swama provides OpenAI-compatible endpoints, you can easily integrate it with popular community tools:
**AI Coding Assistants:**

Continue.dev (add to `config.json`):

```json
{
  "models": [{
    "title": "Swama Local",
    "provider": "openai",
    "model": "qwen3",
    "apiBase": "http://localhost:28100/v1"
  }]
}
```

Cursor (set a custom API endpoint):
- API Base URL: `http://localhost:28100/v1`
- Model: `qwen3` or `deepseek-coder`
**Chat Interfaces:**

- Open WebUI (formerly Ollama WebUI): add an OpenAI API connection with Base URL `http://localhost:28100/v1` and API Key `not-required`
- LibreChat: add to your `.env` file:

  ```bash
  OPENAI_API_KEY=not-required
  OPENAI_REVERSE_PROXY=http://localhost:28100/v1
  ```

- ChatBox: add an OpenAI API provider with Base URL `http://localhost:28100/v1`
**Development Tools:**

Python with the OpenAI library:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:28100/v1",
    api_key="not-required"  # Swama doesn't require API keys
)

response = client.chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "Hello from Python!"}]
)
print(response.choices[0].message.content)
```
Node.js with the OpenAI library:

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:28100/v1',
  apiKey: 'not-required'
});

const completion = await openai.chat.completions.create({
  model: 'deepseek-coder',
  messages: [{ role: 'user', content: 'Write a hello world function' }]
});
console.log(completion.choices[0].message.content);
```
**Popular Integrations:**
- **LangChain/LlamaIndex**: Use the OpenAI provider with a custom base URL (see the sketch below)
- **AutoGen**: Configure as an OpenAI endpoint for multi-agent conversations
- **Semantic Kernel**: Add as an OpenAI chat completion service
- **Flowise/Langflow**: Connect via the OpenAI node with a custom endpoint
- **Anything else**: Any tool supporting the OpenAI API can connect to Swama!
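For example, a LangChain connection is just the standard OpenAI chat model pointed at the local base URL. A sketch, assuming the `langchain-openai` package is installed (this is not an officially documented integration):

```python
# Hypothetical LangChain setup against a local Swama server.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="qwen3",                          # any Swama alias or full model name
    base_url="http://localhost:28100/v1",   # Swama's OpenAI-compatible endpoint
    api_key="not-required",                 # Swama doesn't check API keys
)
print(llm.invoke("Hello from LangChain!").content)
```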
```bash
# Download model (supports both aliases and full names)
swama pull qwen3                          # Using alias
swama pull mlx-community/Qwen3-8B-4bit    # Using full name

# List local models and available aliases
swama list [--format json]

# Run inference (auto-downloads if model not found locally)
swama run qwen3 "Your prompt here"                  # Using alias - downloads automatically!
swama run deepseek-coder "Write a Python function"  # Another alias
swama run <full-model-name> <prompt> [options]      # Using full name

# Start API server
swama serve [--host HOST] [--port PORT] [--model MODEL_ALIAS]

# Start menu bar app
swama menubar
```
Swama supports convenient aliases for popular models. Use these short names instead of full model URLs:
```bash
# Examples with different model families
swama run qwen3 "Explain machine learning"           # Qwen3 8B
swama run llama3.2-1b "Quick question: what is AI?"  # Llama 3.2 1B (fastest)
swama run deepseek-r1 "Think step by step: 2+2*3"    # DeepSeek R1 (reasoning)
```
Generation options for `swama run`:

- `--temperature <value>`: Sampling temperature (0.0-2.0)
- `--top-p <value>`: Nucleus sampling parameter (0.0-1.0)
- `--max-tokens <number>`: Maximum number of tokens to generate
- `--repetition-penalty <value>`: Repetition penalty factor
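When going through the API server rather than the CLI, the equivalent sampling controls are passed as standard OpenAI-style request fields. A minimal sketch (note: repetition penalty has no standard OpenAI request field, so it is omitted here):

```python
# Passing per-request sampling options via the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:28100/v1", api_key="not-required")

response = client.chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "Summarize MLX in one sentence."}],
    temperature=0.7,  # sampling temperature (0.0-2.0)
    top_p=0.9,        # nucleus sampling parameter (0.0-1.0)
    max_tokens=128,   # cap on generated tokens
)
print(response.choices[0].message.content)
```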
Swama supports vision language models and can process image inputs:
```bash
curl -X POST http://localhost:28100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/llava-v1.6-mistral-7b-hf-4bit",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What do you see in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ]
  }'
```
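The same multimodal request can be sent from Python; a minimal sketch mirroring the curl example above:

```python
# Send a text + image request to a vision model served by Swama.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:28100/v1", api_key="not-required")

response = client.chat.completions.create(
    model="mlx-community/llava-v1.6-mistral-7b-hf-4bit",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What do you see in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```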
- swift-nio - High-performance networking framework
- swift-argument-parser - Command-line argument parsing
- mlx-swift - Apple MLX Swift bindings
- mlx-swift-examples - MLX Swift examples and models
```bash
# Development build
swift build

# Release build
swift build -c release

# Run tests
swift test

# Open the package in Xcode (swift package generate-xcodeproj was removed from recent SwiftPM)
open Package.swift
```
We welcome community contributions! Please follow these steps:
1. Fork this repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
- Follow Swift coding style guidelines
- Add tests for new features
- Update relevant documentation
- Ensure all tests pass
This project is licensed under the MIT License - see the LICENSE file for details.
- Apple MLX team for the excellent machine learning framework
- Swift NIO for high-performance networking support
- All contributors and community members
- Issue Tracker
- Discussions
- TODO
Swama - Bringing the best local AI experience to macOS users