Nebulous

A globally distributed container orchestrator

Think of it as a Kubernetes that can span clouds and regions, with a focus on accelerated compute.
Ships as a single binary; performant and lightweight thanks to Rust 🦀

Why not Kubernetes? See why_not_kube.md

Warning

Nebulous is in alpha; things may break.

Concepts

Cross-cloud Autoscaling

Nebulous finds accelerators wherever they live, across clouds or in your own datacenter, and scales those resources up and down based on usage.


Globally Segmented Networks

Nebulous connects resources across clouds using Tailnet. Every deployed container is connected to every other container in its segmented namespace, regardless of where they are running.


Decentralized Data Layer

Nebulous enables fast and resilient replication of data between nodes using Iroh p2p. Containers can subscribe to data resources in their namespace and have them lazily synced from peers on demand, regardless of geolocation.


Live Migration

Nebulous enables containers to be suspended and restored at any point in time using CRIU, including GPU operations. This enables forking containers in real time or migrating workloads seamlessly to cheaper resources.


Metering

Accelerated resources are expensive. Nebulous comes batteries-included with primitives for metered billing using OpenMeter.


Multi-tenant

Nebulous is multi-tenant from the ground up, providing strong isolation of workloads and robust authorization mechanisms.


Lightweight

Everything in Nebulous is built to be light as a feather; it should feel like the opposite of Kubernetes. You can spin it up easily on your local machine as a single process, while still scaling seamlessly to thousands of nodes in the cloud when needed.


Installation

curl -fsSL -H "Cache-Control: no-cache" https://raw.githubusercontent.com/agentsea/nebulous/main/remote_install.sh | bash

Note

Only macOS and Linux (arm64/amd64) are supported at this time.

Usage

Export the keys of your cloud providers.

export RUNPOD_API_KEY=...
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...

Run a local API server on Docker

neb serve --docker

Or optionally run on Kubernetes with our helm chart

Connect to the tailnet

neb connect

See what cloud platforms are currently supported.

neb get platforms

Tip

Prefer a Pythonic interface? Try nebulous-py

Prefer a higher level LLM interface? Try orign

Containers

Let's run our first container. We'll create a container on RunPod with 2 A100 GPUs that trains a model using TRL.

First, let's find what accelerators are available.

neb get accelerators

Now let's create a container.

kind: Container
metadata:
  name: trl-job
  namespace: training
image: "huggingface/trl-latest-gpu:latest"
platform: runpod
command: |
  source activate trl && trl sft --model_name_or_path $MODEL \
      --dataset_name $DATASET \
      --output_dir /output \
      --torch_dtype bfloat16 \
      --use_peft true
env:
  - key: MODEL
    value: Qwen/Qwen2.5-7B 
  - key: DATASET
    value: trl-lib/Capybara 
volumes:  
  - name: model-cache
    mount: /models
accelerators:
  - "2:A100_SXM"
restart: Never

To create the container

neb create container -f mycontainer.yaml

Tip

See our container examples for more.

List all containers

neb get containers

Get the container we just created.

neb get containers trl-job -n training

Exec a command in a container

neb exec trl-job -n training -c "echo hello"

Get logs from a container

neb logs trl-job -n training

Send an http request to a container

curl http://container-{id}:8000

Meters

Metered billing is supported through OpenMeter using the meters field.

meters:
  - cost: 0.1
    unit: second
    currency: USD
    metric: runtime 
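
To see where this sits in a full spec, here is a minimal sketch of a Container manifest carrying a meter. Only fields shown elsewhere in this README are used; the name and namespace are illustrative, and the placement of meters at the top level is an assumption based on the snippet above.

kind: Container
metadata:
  name: metered-job        # illustrative name
  namespace: billing-demo  # illustrative namespace
image: "huggingface/trl-latest-gpu:latest"
platform: runpod
accelerators:
  - "2:A100_SXM"
meters:
  - cost: 0.1              # 0.1 USD per second of runtime
    unit: second
    currency: USD
    metric: runtime

At 0.1 USD per second, a container that runs for 60 seconds accrues 60 × 0.1 = 6 USD.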

Tip

See container examples for more.

Secrets

Secrets are used to store sensitive information such as API keys and credentials. Secrets are AES-256 encrypted and stored in the database.

Create a secret

neb create secret my-secret --value $MY_SECRET_VALUE -n my-app

Get all secrets

neb get secrets -n my-app

Get a secret

neb get secrets my-secret -n my-app

Delete a secret

neb delete secrets my-secret -n my-app

Secrets can be used in container environment variables.

kind: Container
metadata:
  name: my-container
  namespace: my-app
env:
  - key: MY_SECRET
    secret_name: my-secret
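
Plain values and secret references can presumably be mixed in the same env list; a minimal sketch combining the two forms shown in this README (the LOG_LEVEL entry is illustrative):

kind: Container
metadata:
  name: my-container
  namespace: my-app
env:
  - key: LOG_LEVEL          # plain value, illustrative
    value: debug
  - key: MY_SECRET          # resolved from the my-secret secret at runtime
    secret_name: my-secret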

Namespaces

Namespaces provide a means to segment groups of resources across clouds.

kind: Container
metadata:
  name: llama-factory-server
  namespace: my-app

Resources within a given namespace are network-isolated using Tailnet and can be reached at http://{kind}-{id}, e.g. http://container-12345:8000.

Nebulous cloud provides a free hosted Headscale instance to connect your resources, or you can bring your own by setting the TAILSCALE_URL environment variable.

Tenancy

Nebulous is multi-tenant from the ground up. Tenancy happens at the namespace level: when creating a namespace, a user can set the owner to their user or to an organization they are a member of.

kind: Namespace
metadata:
  namespace: my-app
  owner: acme

Now all resources created in that namespace will be owned by acme.

The authorization hierarchy is

owners -> namespaces -> resources
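
Putting the pieces together, a sketch of the full chain, assuming resources inherit their owner from the namespace as described above. The container is illustrative, and whether a single file may hold both documents is an assumption; they can equally be applied separately.

kind: Namespace
metadata:
  namespace: my-app
  owner: acme
---
kind: Container
metadata:
  name: api                # owned by acme via the my-app namespace
  namespace: my-app
image: "corge/translator:latest"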

Processors

Processors are containers that work off real-time data streams and are autoscaled based on back-pressure. Streams are provided by Redis Streams.

Processors are best suited to bursty async jobs or low-latency stream processing.

kind: Processor
metadata:
  name: translator
  namespace: my-app
stream: my-app:workers:translator
container:
  image: corge/translator:latest
  command: "redis-cli XREAD COUNT 10 STREAMS my-app:workers:translator"
  platform: gce
  accelerators:
    - "1:A40"
min_workers: 1
max_workers: 10
scale:
  up:
    above_pressure: 100
    duration: 10s
  down:
    below_pressure: 10
    duration: 5m
  zero:
    duration: 10m

Create the processor

neb create processor -f examples/processors/translator.yaml

Processors can also scale to zero.

min_workers: 0
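
Combined with the zero block from the scale config above, this lets a worker pool drain completely when idle; a minimal sketch (the duration is illustrative):

min_workers: 0
scale:
  zero:
    duration: 10m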

Processors can enforce schemas.

schema:
  - name: text_to_translate
    type: string
    required: true
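
In a full manifest the schema block sits alongside the other top-level Processor fields; its exact placement here is an assumption based on the earlier example:

kind: Processor
metadata:
  name: translator
  namespace: my-app
stream: my-app:workers:translator
schema:
  - name: text_to_translate  # messages missing this field would presumably be rejected
    type: string
    required: true

A message such as {"text_to_translate": "Dlrow Olleh"}, as sent below, satisfies this schema.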

Send data to a processor stream

neb send processor translator --data '{"text_to_translate": "Dlrow Olleh"}' -n my-app

Read data from a processor stream

neb read processor translator --num 10

List all processors

neb get processors

Tip

See processor examples for more.

Services [in progress]

Services provide a means to expose containers on a stable IP address, and to balance traffic across multiple containers. Services auto-scale up and down as needed.

kind: Service
metadata:
  name: vllm-qwen
  namespace: inference
container:
  image: vllm/vllm-openai:latest
  command: |
    python -m vllm.entrypoints.api_server \
      --model Qwen/Qwen2-7B-Instruct \
      --tensor-parallel-size 1 \
      --port 8000
  accelerators:
    - "1:A100"
platform: gce
min_containers: 1
max_containers: 5
scale:
  up:
    above_latency: 100ms
    duration: 10s
  down:
    below_latency: 10ms
    duration: 5m
  zero:
    below_latency: 10ms
    duration: 10m

Create the service

neb create service -f examples/service/vllm-qwen.yaml

The IP will be returned in the status field.

neb get services vllm-qwen -n inference

Services can also scale to zero.

min_containers: 0

Services can perform metered billing, such as charging per token in the response. With the meter below, a response reporting 1,500 prompt tokens would bill 1,500 × 0.001 = 1.50 USD.

meters:
  - cost: 0.001
    unit: token
    currency: USD
    response_json_value: "$.usage.prompt_tokens"

Tip

See service examples for more.

SDK

๐Ÿ Python https://github.com/agentsea/neblous-py

๐Ÿฆ€ Rust https://crates.io/crates/neblous/versions

Roadmap

  • Support non-GPU containers
  • Processors
  • Support for Nebius Cloud
  • Support for AWS EC2
  • Services
  • Clusters
  • Support for GCE
  • Support for Azure
  • Support for Kubernetes

Contributing

Please open an issue or submit a PR.

Developing

Add all the environment variables shown in the .env_ file to your environment.

Run a Postgres and Redis instance locally. This can be done easily with Docker; note that the postgres image will not start without POSTGRES_PASSWORD set.

docker run -d --name redis -p 6379:6379 redis:latest
docker run -d --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres:latest

To configure the secrets store you will need an encryption key. This can be generated with the following command.

openssl rand -base64 32 | tr -dc '[:alnum:]' | head -c 32

Then set the NEB_ENCRYPTION_KEY environment variable to this value.

To optionally use OpenMeter for metered billing, you will need an account with their cloud offering or a running instance of their open-source version, and you must set the OPENMETER_API_KEY and OPENMETER_URL environment variables.

To optionally use Tailnet, you will need a Tailscale account or your own Headscale instance, and you must set the TAILSCALE_API_KEY and TAILSCALE_TAILNET environment variables.

Install locally

make install

Run the server

neb serve

Log in to the auth server. When you do, set the server to http://localhost:3000.

neb login

Now you can create resources

neb create container -f examples/containers/trl_small.yaml

When you make changes, simply run make install and neb serve again.

Inspiration
