RLAMA is a powerful AI-driven question-answering tool for your documents, seamlessly integrating with your local Ollama models. It enables you to create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to your documentation needs.
- Vision & Roadmap
- Installation
- Available Commands
- rag - Create a RAG system
- crawl-rag - Create a RAG system from a website
- wizard - Create a RAG system with interactive setup
- watch - Set up directory watching for a RAG system
- watch-off - Disable directory watching for a RAG system
- check-watched - Check a RAG's watched directory for new files
- run - Use a RAG system
- api - Start API server
- list - List RAG systems
- delete - Delete a RAG system
- list-docs - List documents in a RAG
- list-chunks - Inspect document chunks
- view-chunk - View chunk details
- add-docs - Add documents to RAG
- crawl-add-docs - Add website content to RAG
- update-model - Change LLM model
- update - Update RLAMA
- version - Display version
- Uninstallation
- Supported Document Formats
- Troubleshooting
RLAMA aims to become the definitive tool for creating local RAG systems that work seamlessly for everyone—from individual developers to large enterprises. Here's our strategic roadmap:
- ✅ Basic RAG System Creation: CLI tool for creating and managing RAG systems
- ✅ Document Processing: Support for multiple document formats (.txt, .md, .pdf, etc.)
- ✅ Document Chunking: Basic text splitting with configurable size and overlap
- ✅ Vector Storage: Local storage of document embeddings
- ✅ Context Retrieval: Basic semantic search with configurable context size
- ✅ Ollama Integration: Seamless connection to Ollama models
- ✅ Cross-Platform Support: Works on Linux, macOS, and Windows
- ✅ Easy Installation: One-line installation script
- ✅ API Server: HTTP endpoints for integrating RAG capabilities in other applications
- ✅ Web Crawling: Create RAGs directly from websites
- ✅ Guided RAG Setup Wizard: Interactive interface for easy RAG creation
- Prompt Compression: Smart context summarization for limited context windows
- Adaptive Chunking: Dynamic content segmentation based on semantic boundaries
- Minimal Context Retrieval: Intelligent filtering to eliminate redundant content
- Parameter Optimization: Fine-tuned settings for different model sizes
- Multi-Model Embedding Support: Integration with various embedding models
- Hybrid Retrieval Techniques: Combining sparse and dense retrievers for better accuracy
- Embedding Evaluation Tools: Built-in metrics to measure retrieval quality
- Automated Embedding Cache: Smart caching to reduce computation for similar queries
- Lightweight Web Interface: Simple browser-based UI for the existing CLI backend
- Knowledge Graph Visualization: Interactive exploration of document connections
- Domain-Specific Templates: Pre-configured settings for different domains
- Multi-User Access Control: Role-based permissions for team environments
- Integration with Enterprise Systems: Connectors for SharePoint, Confluence, Google Workspace
- Knowledge Quality Monitoring: Detection of outdated or contradictory information
- System Integration API: Webhooks and APIs for embedding RLAMA in existing workflows
- AI Agent Creation Framework: Simplified system for building custom AI agents with RAG capabilities
- Multi-Step Retrieval: Using the LLM to refine search queries for complex questions
- Cross-Modal Retrieval: Support for image content understanding and retrieval
- Feedback-Based Optimization: Learning from user interactions to improve retrieval
- Knowledge Graphs & Symbolic Reasoning: Combining vector search with structured knowledge
RLAMA's core philosophy remains unchanged: to provide a simple, powerful, local RAG solution that respects privacy, minimizes resource requirements, and works seamlessly across platforms.
- Ollama installed and running
curl -fsSL https://raw.githubusercontent.com/dontizi/rlama/main/install.sh | sh
RLAMA is built with:
- Core Language: Go (chosen for performance, cross-platform compatibility, and single binary distribution)
- CLI Framework: Cobra (for command-line interface structure)
- LLM Integration: Ollama API (for embeddings and completions)
- Storage: Local filesystem-based storage (JSON files for simplicity and portability)
- Vector Search: Custom implementation of cosine similarity for embedding retrieval
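Because vector search is a custom cosine-similarity implementation rather than an external vector database, the core scoring step is easy to picture. Below is a minimal Go sketch of what such a function could look like; it is an illustration, not the actual code in pkg/vector:

```go
package vector

import "math"

// CosineSimilarity returns dot(a, b) / (|a| * |b|) for two embedding
// vectors. Values near 1 mean the two texts are semantically close;
// values near 0 mean they are largely unrelated.
func CosineSimilarity(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}
```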
RLAMA follows a clean architecture pattern with clear separation of concerns:
rlama/
├── cmd/ # CLI commands (using Cobra)
│ ├── root.go # Base command
│ ├── rag.go # Create RAG systems
│ ├── run.go # Query RAG systems
│ └── ...
├── internal/
│ ├── client/ # External API clients
│ │ └── ollama_client.go # Ollama API integration
│ ├── domain/ # Core domain models
│ │ ├── rag.go # RAG system entity
│ │ └── document.go # Document entity
│ ├── repository/ # Data persistence
│ │ └── rag_repository.go # Handles saving/loading RAGs
│ └── service/ # Business logic
│ ├── rag_service.go # RAG operations
│ ├── document_loader.go # Document processing
│ └── embedding_service.go # Vector embeddings
└── pkg/ # Shared utilities
└── vector/ # Vector operations
- Document Processing: Documents are loaded from the file system, parsed based on their type, and converted to plain text.
- Embedding Generation: Document text is sent to Ollama to generate vector embeddings.
- Storage: The RAG system (documents + embeddings) is stored in the user's home directory (~/.rlama).
- Query Process: When a user asks a question, it's converted to an embedding, compared against stored document embeddings, and relevant content is retrieved.
- Response Generation: Retrieved content and the question are sent to Ollama to generate a contextually-informed response.
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Documents  │────>│  Document   │────>│  Embedding  │
│   (Input)   │     │ Processing  │     │ Generation  │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Query     │────>│   Vector    │<────│ Vector Store│
│  Response   │     │   Search    │     │ (RAG System)│
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                   │
       │                   ▼
┌─────────────┐     ┌─────────────┐
│   Ollama    │<────│   Context   │
│    LLM      │     │  Building   │
└─────────────┘     └─────────────┘
RLAMA is designed to be lightweight and portable, focusing on providing RAG capabilities with minimal dependencies. The entire system runs locally, with the only external dependency being Ollama for LLM capabilities.
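To make steps 4 and 5 of the data flow concrete, here is a simplified, self-contained Go sketch of how retrieved chunks could be ranked and assembled into a prompt. The Chunk struct, its fields, and the prompt wording are illustrative assumptions, not RLAMA's internal types:

```go
package rag

import (
	"fmt"
	"sort"
	"strings"
)

// Chunk is a piece of a document together with its similarity score
// against the current query (computed with cosine similarity).
type Chunk struct {
	Source string
	Text   string
	Score  float64
}

// BuildPrompt keeps the contextSize highest-scoring chunks and combines
// them with the user's question into a single prompt for the LLM.
func BuildPrompt(chunks []Chunk, question string, contextSize int) string {
	sort.Slice(chunks, func(i, j int) bool { return chunks[i].Score > chunks[j].Score })
	if contextSize > len(chunks) {
		contextSize = len(chunks)
	}
	var b strings.Builder
	b.WriteString("Answer the question using only the context below.\n\n")
	for _, c := range chunks[:contextSize] {
		fmt.Fprintf(&b, "[%s]\n%s\n\n", c.Source, c.Text)
	}
	fmt.Fprintf(&b, "Question: %s\n", question)
	return b.String()
}
```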
You can get help on all commands by using:
rlama --help
These flags can be used with any command:
--host string Ollama host (default: localhost)
--port string Ollama port (default: 11434)
Creates a new RAG system by indexing all documents in the specified folder.
rlama rag [model] [rag-name] [folder-path]
Parameters:
- model: Name of the Ollama model to use (e.g., llama3, mistral, gemma).
- rag-name: Unique name to identify your RAG system.
- folder-path: Path to the folder containing your documents.
Example:
rlama rag llama3 documentation ./docs
Creates a new RAG system by crawling a website and indexing its content.
rlama crawl-rag [model] [rag-name] [website-url]
Parameters:
- model: Name of the Ollama model to use (e.g., llama3, mistral, gemma).
- rag-name: Unique name to identify your RAG system.
- website-url: URL of the website to crawl and index.
Options:
- --max-depth: Maximum crawl depth (default: 2)
- --concurrency: Number of concurrent crawlers (default: 5)
- --exclude-path: Paths to exclude from crawling (comma-separated)
- --chunk-size: Character count per chunk (default: 1000)
- --chunk-overlap: Overlap between chunks in characters (default: 200)
Example:
# Create a new RAG from a documentation website
rlama crawl-rag llama3 docs-rag https://docs.example.com
# Customize crawling behavior
rlama crawl-rag llama3 blog-rag https://blog.example.com --max-depth=3 --exclude-path=/archive,/tags
Provides an interactive step-by-step wizard for creating a new RAG system.
rlama wizard
The wizard guides you through:
- Naming your RAG
- Choosing an Ollama model
- Selecting document sources (local folder or website)
- Configuring chunking parameters
- Setting up file filtering
Example:
rlama wizard
# Follow the prompts to create your customized RAG
Configure a RAG system to automatically watch a directory for new files and add them to the RAG.
rlama watch [rag-name] [directory-path] [interval]
Parameters:
- rag-name: Name of the RAG system to watch.
- directory-path: Path to the directory to watch for new files.
- interval: Time in minutes between checks for new files (use 0 to check only when the RAG is used).
Example:
# Set up directory watching to check every 60 minutes
rlama watch my-docs ./watched-folder 60
# Set up directory watching to only check when the RAG is used
rlama watch my-docs ./watched-folder 0
# Customize what files to watch
rlama watch my-docs ./watched-folder 30 --exclude-dir=node_modules,tmp --process-ext=.md,.txt
Disable automatic directory watching for a RAG system.
rlama watch-off [rag-name]
Parameters:
- rag-name: Name of the RAG system for which to disable watching.
Example:
rlama watch-off my-docs
Manually check a RAG's watched directory for new files and add them to the RAG.
rlama check-watched [rag-name]
Parameters:
- rag-name: Name of the RAG system to check.
Example:
rlama check-watched my-docs
Starts an interactive session to interact with an existing RAG system.
rlama run [rag-name]
Parameters:
- rag-name: Name of the RAG system to use.
- --context-size: (Optional) Number of context chunks to retrieve (default: 20)
Example:
rlama run documentation
> How do I install the project?
> What are the main features?
> exit
Context Size Tips:
- Smaller values (5-15) for faster responses with key information
- Medium values (20-40) for balanced performance
- Larger values (50+) for complex questions needing broad context
- Consider your model's context window limits
rlama run documentation --context-size=50 # Use 50 context chunks
Starts an HTTP API server that exposes RLAMA's functionality through RESTful endpoints.
rlama api [--port PORT]
Parameters:
- --port: (Optional) Port number to run the API server on (default: 11249)
Example:
rlama api --port 8080
Available Endpoints:
- Query a RAG system - POST /rag

  curl -X POST http://localhost:11249/rag \
    -H "Content-Type: application/json" \
    -d '{
      "rag_name": "documentation",
      "prompt": "How do I install the project?",
      "context_size": 20
    }'

  Request fields:
  - rag_name (required): Name of the RAG system to query
  - prompt (required): Question or prompt to send to the RAG
  - context_size (optional): Number of chunks to include in context
  - model (optional): Override the model used by the RAG

- Check server health - GET /health

  curl http://localhost:11249/health
Integration Example:
// Node.js example
const response = await fetch('http://localhost:11249/rag', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
rag_name: 'my-docs',
prompt: 'Summarize the key features'
})
});
const data = await response.json();
console.log(data.response);
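The same request works from any language with an HTTP client. As a rough Go equivalent of the Node.js snippet above (assuming, as that snippet does, that the returned JSON carries the answer in a response field):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Same payload as the Node.js example: rag_name and prompt are required.
	payload, _ := json.Marshal(map[string]any{
		"rag_name": "my-docs",
		"prompt":   "Summarize the key features",
	})

	resp, err := http.Post("http://localhost:11249/rag", "application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out.Response)
}
```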
Displays a list of all available RAG systems.
rlama list
Permanently deletes a RAG system and all its indexed documents.
rlama delete [rag-name] [--force/-f]
Parameters:
- rag-name: Name of the RAG system to delete.
- --force or -f: (Optional) Delete without asking for confirmation.
Example:
rlama delete old-project
Or to delete without confirmation:
rlama delete old-project --force
Displays all documents in a RAG system with metadata.
rlama list-docs [rag-name]
Parameters:
- rag-name: Name of the RAG system
Example:
rlama list-docs documentation
List and filter document chunks in a RAG system with various options:
# Basic chunk listing
rlama list-chunks [rag-name]
# With content preview (shows first 100 characters)
rlama list-chunks [rag-name] --show-content
# Filter by document name/ID substring
rlama list-chunks [rag-name] --document=readme
# Combine options
rlama list-chunks [rag-name] --document=api --show-content
Options:
- --show-content: Display chunk content preview
- --document: Filter by document name/ID substring
Output columns:
- Chunk ID (use with view-chunk command)
- Document Source
- Chunk Position (e.g., "2/5" for second of five chunks)
- Content Preview (if enabled)
- Created Date
Display detailed information about a specific chunk.
rlama view-chunk [rag-name] [chunk-id]
Parameters:
- rag-name: Name of the RAG system
- chunk-id: Chunk identifier from list-chunks
Example:
rlama view-chunk documentation doc123_chunk_0
Add new documents to an existing RAG system.
rlama add-docs [rag-name] [folder-path] [flags]
Parameters:
- rag-name: Name of the RAG system
- folder-path: Path to the documents folder
Example:
rlama add-docs documentation ./new-docs --exclude-ext=.tmp
Add content from a website to an existing RAG system.
rlama crawl-add-docs [rag-name] [website-url]
Parameters:
- rag-name: Name of the RAG system
- website-url: URL of the website to crawl and add to the RAG
Options:
- --max-depth: Maximum crawl depth (default: 2)
- --concurrency: Number of concurrent crawlers (default: 5)
- --exclude-path: Paths to exclude from crawling (comma-separated)
- --chunk-size: Character count per chunk (default: 1000)
- --chunk-overlap: Overlap between chunks in characters (default: 200)
Example:
# Add blog content to an existing RAG
rlama crawl-add-docs my-docs https://blog.example.com
# Customize crawling behavior
rlama crawl-add-docs knowledge-base https://docs.example.com --max-depth=1 --exclude-path=/api
Update the LLM model used by a RAG system.
rlama update-model [rag-name] [new-model]
Parameters:
- rag-name: Name of the RAG system
- new-model: New Ollama model name
Example:
rlama update-model documentation deepseek-r1:7b-instruct
Checks if a new version of RLAMA is available and installs it.
rlama update [--force/-f]
Options:
- --force or -f: (Optional) Update without asking for confirmation.
Displays the current version of RLAMA.
rlama --version
or
rlama -v
To uninstall RLAMA:
If you installed via go install:
rlama uninstall
RLAMA stores its data in ~/.rlama. To remove it:
rm -rf ~/.rlama
RLAMA supports many file formats:
- Text: .txt, .md, .html, .json, .csv, .yaml, .yml, .xml, .org
- Code: .go, .py, .js, .java, .c, .cpp, .cxx, .h, .rb, .php, .rs, .swift, .kt, .ts, .tsx, .f, .F, .F90, .el, .svelte
- Documents: .pdf, .docx, .doc, .rtf, .odt, .pptx, .ppt, .xlsx, .xls, .epub
Installing dependencies via install_deps.sh is recommended to improve support for certain formats.
If you encounter connection errors to Ollama:
- Check that Ollama is running.
- By default, Ollama must be accessible at http://localhost:11434 or at the host and port specified by the OLLAMA_HOST environment variable.
- If your Ollama instance is running on a different host or port, use the --host and --port flags:
  rlama --host 192.168.1.100 --port 8000 list
  rlama --host my-ollama-server --port 11434 run my-rag
- Check Ollama logs for potential errors.
If you encounter problems with certain formats:
- Install dependencies via ./scripts/install_deps.sh.
- Verify that your system has the required tools (pdftotext, tesseract, etc.).
If the answers are not relevant:
- Check that the documents are properly indexed with rlama list.
- Make sure the content of the documents is properly extracted.
- Try rephrasing your question more precisely.
- Consider adjusting chunking parameters during RAG creation
For any other issues, please open an issue on the GitHub repository providing:
- The exact command used.
- The complete output of the command.
- Your operating system and architecture.
- The RLAMA version (rlama --version).
RLAMA provides multiple ways to connect to your Ollama instance:
- Command-line flags (highest priority):

  rlama --host 192.168.1.100 --port 8080 run my-rag

- Environment variable:

  # Format: "host:port" or just "host"
  export OLLAMA_HOST=remote-server:8080
  rlama run my-rag

- Default values (used if no other method is specified):
  - Host: localhost
  - Port: 11434
The precedence order is: command-line flags > environment variable > default values.
# Quick answers with minimal context
rlama run my-docs --context-size=10
# Deep analysis with maximum context
rlama run my-docs --context-size=50
# Balance between speed and depth
rlama run my-docs --context-size=30
rlama rag llama3 my-project ./code \
--exclude-dir=node_modules,dist \
--process-ext=.go,.ts \
--exclude-ext=.spec.ts
# List chunks with content preview
rlama list-chunks my-project --show-content
# Filter chunks from specific document
rlama list-chunks my-project --document=architecture
Get full command help:
rlama --help
Command-specific help:
rlama rag --help
rlama list-chunks --help
rlama update-model --help
All commands support the global --host and --port flags for custom Ollama connections.
The precedence order is: command-line flags > environment variable > default values.
RLAMA now supports using GGUF models directly from Hugging Face through Ollama's native integration:
# Search for GGUF models on Hugging Face
rlama hf-browse "llama 3"
# Open browser with search results
rlama hf-browse mistral --open
Before creating a RAG, you can test a Hugging Face model directly:
# Try a model in chat mode
rlama run-hf bartowski/Llama-3.2-1B-Instruct-GGUF
# Specify quantization
rlama run-hf mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF --quant Q5_K_M
Use Hugging Face models when creating RAG systems:
# Create a RAG with a Hugging Face model
rlama rag hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF my-rag ./docs
# Use specific quantization
rlama rag hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:Q5_K_M my-rag ./docs