Run RL Swarm Node

RL Swarm is a fully open-source framework developed by GensynAI for building reinforcement learning (RL) training swarms over the internet. This guide walks you through setting up an RL Swarm node and a web UI dashboard to monitor swarm activity.

Hardware Requirements

  • RAM: Minimum 16 GB (more recommended for larger models or datasets).
  • GPU (Optional): Supported CUDA devices for enhanced performance:
    • RTX 3090
    • RTX 4090
    • A100
    • H100
  • Note: You can run the node without a GPU using CPU-only mode (details in the docker-compose.yaml section).
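
  • Quick check: you can confirm available RAM and whether a CUDA-capable GPU is visible before installing anything (nvidia-smi is only present once the NVIDIA driver is installed):
free -h        # total and available memory
nvidia-smi     # lists NVIDIA GPUs, if any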

Install Dependencies

1. Update System Packages

sudo apt-get update && sudo apt-get upgrade -y

2. Install General Utilities and Tools

sudo apt install curl iptables build-essential git wget lz4 jq make gcc nano automake autoconf tmux htop nvme-cli libgbm1 pkg-config libssl-dev libleveldb-dev tar clang bsdmainutils ncdu unzip -y

3. Install Docker

# Remove old Docker installations
for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do sudo apt-get remove $pkg; done

# Add Docker repository
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Test Docker
sudo docker run hello-world
  • Tip: To run Docker without sudo, add your user to the Docker group:
sudo usermod -aG docker $USER
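
  • Group membership only takes effect in new sessions, so log out and back in (or start a shell with the docker group active) and confirm Docker runs without sudo:
newgrp docker
docker run hello-world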

4. Install Python

sudo apt-get install python3 python3-pip
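
  • Optional: confirm the interpreter and pip are available:
python3 --version
pip3 --version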

Clone the Repository

git clone https://github.com/gensyn-ai/rl-swarm/
cd rl-swarm

Create docker-compose.yaml

This file defines the services: the RL Swarm node, telemetry collector, and web UI.

  1. Rename the old file:
mv docker-compose.yaml docker-compose.yaml.old
  2. Create the new file:
nano docker-compose.yaml
  3. Paste the following configuration:
version: '3'

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.120.0
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP
      - "55679:55679"  # Prometheus metrics (optional)
    environment:
      - OTEL_LOG_LEVEL=DEBUG

  swarm_node:
    image: europe-docker.pkg.dev/gensyn-public-b7d9/public/rl-swarm:v0.0.1
    command: ./run_hivemind_docker.sh
    runtime: nvidia  # Enables GPU support; remove if no GPU is available
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      - PEER_MULTI_ADDRS=/ip4/38.101.215.13/tcp/30002/p2p/QmQ2gEXoPJg6iMBSUFWGzAabS2VhnzuS782Y637hGjfsRJ
      - HOST_MULTI_ADDRS=/ip4/0.0.0.0/tcp/38331
    ports:
      - "38331:38331"  # Exposes the swarm node's P2P port
    depends_on:
      - otel-collector

  fastapi:
    build:
      context: .
      dockerfile: Dockerfile.webserver
    environment:
      - OTEL_SERVICE_NAME=rlswarm-fastapi
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      - INITIAL_PEERS=/ip4/38.101.215.13/tcp/30002/p2p/QmQ2gEXoPJg6iMBSUFWGzAabS2VhnzuS782Y637hGjfsRJ
    ports:
      - "8080:8000"  # Maps port 8080 on the host to 8000 in the container
    depends_on:
      - otel-collector
      - swarm_node
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/healthz"]
      interval: 30s
      retries: 3
  • GPU/CPU Note: If you don't have an NVIDIA GPU or the NVIDIA Container Runtime, remove the runtime: nvidia line under swarm_node to run on CPU.
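
  • Runtime check: if you are unsure whether the NVIDIA runtime is registered with Docker, list the runtimes it knows about (the exact output varies by Docker version); if nvidia is not listed, install the NVIDIA Container Toolkit or remove the runtime: nvidia line:
docker info | grep -i runtimes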

What Each Service Does:

  • otel-collector: Gathers telemetry data (metrics, traces).
  • swarm_node: The core RL Swarm node connecting to the network.
  • fastapi: The web UI dashboard for monitoring.

Run RL Swarm Node + Web UI Dashboard

Start the services with:

docker compose up --build -d && docker compose logs -f
  • Note: The first run may take a while because the images have to be pulled and built. Look for a log line confirming that your node has joined the swarm:

(Screenshot: log output showing the node joining the swarm.)

  • Exit Logs: Press Ctrl+C
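
  • Ctrl+C only stops following the logs; because the stack was started with -d, the containers keep running in the background. To check their status or stop everything:
docker compose ps      # status of all services
docker compose down    # stop and remove the containers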

Check logs

  • RL Swarm node:
docker compose logs -f swarm_node
  • Web UI:
docker compose logs -f fastapi
  • Telemetry Collector:
docker compose logs -f otel-collector
  • All Logs: Use docker compose logs -f without a service name.
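
  • Tip: limit how much history is printed before following a single service:
docker compose logs --tail 100 -f swarm_node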

Access the Web UI Dashboard

  • VPS: http://<your-vps-ip>:8080/
  • Local PC: http://localhost:8080 or http://0.0.0.0:8080
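
  • You can also probe the dashboard from the command line; this hits the same /api/healthz path that the compose healthcheck uses, through the mapped host port (swap in your VPS IP if you are not on the machine itself):
curl -f http://localhost:8080/api/healthz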


Monitoring Your Node

  • The dashboard displays collective swarm data, not individual node stats. To track your node:

    1. Check the swarm_node logs for your node's unique ID (e.g., [F-d2042cff-01c9-4801-8ea7-1c1afc29c9b6]); see the grep sketch below.

    2. Search for this ID in the dashboard data to see your node's contributions.

  • Note: The dashboard monitors all peers together. The node ID in the logs appears to be your identifier, but I am still experimenting to learn more about per-node metrics.
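
  • A rough sketch for pulling your ID out of the logs with grep, assuming IDs follow the bracketed F-<uuid> format shown above (the pattern may need adjusting):
docker compose logs swarm_node | grep -oE '\[F-[0-9a-f-]+\]' | sort -u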
