Swama


δΈ­ζ–‡ | ζ—₯本θͺž | English

Swama is a high-performance machine learning runtime written in pure Swift, designed specifically for macOS and built on Apple's MLX framework. It provides a powerful and easy-to-use solution for local LLM (Large Language Model) and VLM (Vision Language Model) inference.

✨ Features

  • πŸš€ High Performance: Built on Apple MLX framework, optimized for Apple Silicon
  • πŸ”Œ OpenAI Compatible API: Standard /v1/chat/completions endpoint support
  • πŸ“± Menu Bar App: Elegant macOS native menu bar integration
  • πŸ’» Command Line Tools: Complete CLI support for model management and inference
  • πŸ–ΌοΈ Multimodal Support: Support for both text and image inputs
  • πŸ“¦ Smart Model Management: Automatic downloading, caching, and version management
  • πŸ”„ Streaming Responses: Real-time streaming text generation support
  • 🌍 HuggingFace Integration: Direct model downloads from HuggingFace Hub

πŸ—οΈ Architecture

Swama features a modular architecture design:

  • SwamaKit: Core framework library containing all business logic
  • Swama CLI: Command-line tool providing complete model management and inference functionality
  • Swama.app: macOS menu bar application with graphical interface and background services

πŸ“‹ System Requirements

  • macOS 14.0 or later
  • Apple Silicon (M1/M2/M3)
  • Xcode 15.0+ (for compilation)
  • Swift 6.1+

πŸ› οΈ Installation

πŸ“± Download Pre-built App (Recommended)

  1. Download the latest release

    • Go to Releases
    • Download Swama.zip from the latest release
    • Extract the zip file
  2. Install the app

    # Move to Applications folder
    mv Swama.app /Applications/
    
    # Launch the app
    open /Applications/Swama.app

    Note: On first launch, macOS may show a security warning. If this happens:

    • Go to System Settings > Privacy & Security
    • Click "Open Anyway" next to the Swama app message
    • Or right-click the app and select "Open" from the context menu
  3. Install CLI tools

    • Open Swama from the menu bar
    • Click "Install Command Line Tool…" to add swama command to your PATH

πŸ”§ Build from Source (Advanced)

For developers who want to build from source:

# Clone the repository
git clone https://github.com/Trans-N-ai/swama.git
cd swama

# Build CLI tool
swift build -c release
sudo cp .build/release/swama /usr/local/bin/

# Build macOS app (requires Xcode)
cd swama-macos/Swama
xcodebuild -project Swama.xcodeproj -scheme Swama -configuration Release

πŸš€ Quick Start

After installing Swama.app, you can use either the menu bar app or command line:

1. Instant Inference with Model Aliases

# Use short aliases instead of full model names - auto-downloads if needed!
swama run qwen3 "Hello, AI"
swama run llama3.2 "Tell me a joke"
swama run deepseek-r1 "Explain quantum computing"

# Traditional way (also works)
swama run mlx-community/Llama-3.2-1B-Instruct-4bit "Hello, how are you?"

# List downloaded models
swama list

✨ Smart Features:

  • Model Aliases: Use friendly names like qwen3, llama3.2, deepseek-r1 instead of long URLs
  • Auto-Download: Models are automatically downloaded on first use - no need to pull first!
  • Cache Management: Downloaded models are cached for future use

2. Available Model Aliases

Alias Full Model Name Description
qwen3 mlx-community/Qwen3-8B-4bit Qwen3 8B (default)
qwen3-1.7b mlx-community/Qwen3-1.7B-4bit Qwen3 1.7B (lightweight)
llama3.2 mlx-community/Llama-3.2-3B-Instruct-4bit Llama 3.2 3B (default)
llama3.2-1b mlx-community/Llama-3.2-1B-Instruct-4bit Llama 3.2 1B (fastest)
deepseek-r1 mlx-community/DeepSeek-R1-0528-4bit DeepSeek R1 (reasoning)
deepseek-coder mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx DeepSeek Coder
qwen2.5 mlx-community/Qwen2.5-7B-Instruct-4bit Qwen 2.5 7B

3. Start API Service

# Start the API server without specifying a model (you can switch models via the API)
swama serve --host 0.0.0.0 --port 28100
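
Once the server is up, you can sanity-check it from any OpenAI client. A minimal sketch using the official openai Python package (it hits the same /v1/models route shown in the API section below; the api_key value is a placeholder, since Swama does not check keys):

# List the models the Swama server exposes
import openai

client = openai.OpenAI(
    base_url="http://localhost:28100/v1",
    api_key="not-required"  # placeholder; Swama does not check API keys
)

for model in client.models.list():
    print(model.id)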

4. Menu Bar App

# Launch menu bar application
swama menubar

5. API Usage

πŸ”Œ OpenAI Compatible API

Swama provides a fully OpenAI-compatible API endpoint, allowing you to use it with existing tools and integrations:

# Get available models
curl http://localhost:28100/v1/models

# Chat completion using aliases (auto-downloads if needed)
curl -X POST http://localhost:28100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Streaming response with DeepSeek R1
curl -X POST http://localhost:28100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [
      {"role": "user", "content": "Solve this step by step: What is 15% of 240?"}
    ],
    "stream": true
  }'
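
The same streaming flag works from code. A minimal sketch with the official openai Python package, mirroring the curl request above:

# Stream tokens as they are generated
import openai

client = openai.OpenAI(base_url="http://localhost:28100/v1", api_key="not-required")

stream = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Solve this step by step: What is 15% of 240?"}],
    stream=True
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)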

πŸ› οΈ Community Tool Integration

Since Swama provides OpenAI-compatible endpoints, you can easily integrate it with popular community tools:

πŸ€– AI Coding Assistants:

# Continue.dev - Add to config.json
{
  "models": [{
    "title": "Swama Local",
    "provider": "openai",
    "model": "qwen3",
    "apiBase": "http://localhost:28100/v1"
  }]
}

# Cursor - Set custom API endpoint
# API Base URL: http://localhost:28100/v1
# Model: qwen3 or deepseek-coder

πŸ’¬ Chat Interfaces:

# Open WebUI (formerly Ollama WebUI)
# Add OpenAI API connection:
# Base URL: http://localhost:28100/v1
# API Key: not-required

# LibreChat
# Add to .env file:
OPENAI_API_KEY=not-required
OPENAI_REVERSE_PROXY=http://localhost:28100/v1

# ChatBox
# Add OpenAI API provider with base URL: http://localhost:28100/v1

πŸ”§ Development Tools:

# Python with OpenAI library
import openai

client = openai.OpenAI(
    base_url="http://localhost:28100/v1",
    api_key="not-required"  # Swama doesn't require API keys
)

response = client.chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "Hello from Python!"}]
)
print(response.choices[0].message.content)

// Node.js with OpenAI library
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:28100/v1',
  apiKey: 'not-required'
});

const completion = await openai.chat.completions.create({
  model: 'deepseek-coder',
  messages: [{ role: 'user', content: 'Write a hello world function' }]
});
console.log(completion.choices[0].message.content);

πŸ“Š Popular Integrations:

  • Langchain/LlamaIndex: Use the OpenAI provider with a custom base URL (see the LangChain sketch after this list)
  • AutoGen: Configure as OpenAI endpoint for multi-agent conversations
  • Semantic Kernel: Add as OpenAI chat completion service
  • Flowise/Langflow: Connect via OpenAI node with custom endpoint
  • Anything else: any tool that supports the OpenAI API can connect to Swama!
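
As an example of the first item, here is a minimal LangChain sketch (it assumes the langchain-openai package; the model alias and port follow the examples above):

# Point LangChain's OpenAI chat model at the local Swama endpoint
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="qwen3",
    base_url="http://localhost:28100/v1",
    api_key="not-required"  # placeholder; Swama does not check API keys
)
print(llm.invoke("Hello from LangChain!").content)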

πŸ“š Command Reference

Model Management

# Download model (supports both aliases and full names)
swama pull qwen3                    # Using alias
swama pull mlx-community/Qwen3-8B-4bit  # Using full name

# List local models and available aliases
swama list [--format json]

# Run inference (auto-downloads if model not found locally)
swama run qwen3 "Your prompt here"              # Using alias - downloads automatically!
swama run deepseek-coder "Write a Python function"  # Another alias
swama run <full-model-name> <prompt> [options]      # Using full name

Server

# Start API server
swama serve [--host HOST] [--port PORT] [--model MODEL_ALIAS]

# Start menu bar app
swama menubar

Model Aliases

Swama supports convenient aliases for popular models. Use these short names instead of full model URLs:

# Examples with different model families
swama run qwen3 "Explain machine learning"           # Qwen3 8B
swama run llama3.2-1b "Quick question: what is AI?"  # Llama 3.2 1B (fastest)
swama run deepseek-r1 "Think step by step: 2+2*3"    # DeepSeek R1 (reasoning)

Options

  • --temperature <value>: Sampling temperature (0.0-2.0)
  • --top-p <value>: Nucleus sampling parameter (0.0-1.0)
  • --max-tokens <number>: Maximum number of tokens to generate
  • --repetition-penalty <value>: Repetition penalty factor
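
The first three options map onto the standard OpenAI request fields, so the same knobs are available over the API. A minimal sketch (temperature and max_tokens appear in the curl examples above; top_p is assumed to follow the standard OpenAI schema; repetition penalty is omitted, since it has no standard OpenAI field):

# Tune sampling parameters over the OpenAI-compatible API
import openai

client = openai.OpenAI(base_url="http://localhost:28100/v1", api_key="not-required")

response = client.chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "Your prompt here"}],
    temperature=0.7,   # sampling temperature (0.0-2.0)
    top_p=0.9,         # nucleus sampling (assumed standard field)
    max_tokens=256     # cap on generated tokens
)
print(response.choices[0].message.content)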

πŸ–ΌοΈ Multimodal Support

Swama supports vision language models and can process image inputs:

curl -X POST http://localhost:28100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/llava-v1.6-mistral-7b-hf-4bit",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What do you see in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ]
  }'
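
From code, a local image can be sent the same way by embedding it as a base64 data URL. A minimal sketch with the openai Python package (photo.jpg is a placeholder path; data-URL support is assumed, as in most OpenAI-compatible servers):

# Send a local image to a vision model as a base64-encoded data URL
import base64
import openai

client = openai.OpenAI(base_url="http://localhost:28100/v1", api_key="not-required")

with open("photo.jpg", "rb") as f:  # placeholder path
    data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="mlx-community/llava-v1.6-mistral-7b-hf-4bit",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What do you see in this image?"},
            {"type": "image_url", "image_url": {"url": data_url}}
        ]
    }]
)
print(response.choices[0].message.content)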

πŸ”§ Development

Dependencies

  • Apple MLX / mlx-swift: machine learning framework optimized for Apple Silicon
  • Swift NIO: high-performance networking

Building

# Development build
swift build

# Release build
swift build -c release

# Run tests
swift test

# Open the package in Xcode (swift package generate-xcodeproj has been removed from modern SwiftPM)
open Package.swift

🀝 Contributing

We welcome community contributions! Please follow these steps:

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow Swift coding style guidelines
  • Add tests for new features
  • Update relevant documentation
  • Ensure all tests pass

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Apple MLX team for the excellent machine learning framework
  • Swift NIO for high-performance networking support
  • All contributors and community members

πŸ“ž Support

πŸ—ΊοΈ Roadmap

  • TODO

Swama - Bringing the best local AI experience to macOS users πŸš€
