GitHub - tensorpool/tensorpool: The easiest way to use GPUs.

tensorpool/tensorpool

TensorPool

TensorPool is the easiest way to use GPUs, at a fraction of the cost of traditional cloud providers.

Features

  • Zero Infra Setup: No GCP, no AWS, no Docker, no Kubernetes, no cloud configuration, no cloud accounts required
  • Optimization Priorities: Optimize your job for price or time, and we'll find the best GPU for it
  • >50% cheaper than traditional cloud providers: TensorPool aggregates GPUs from multiple cloud providers to get you the best price possible. We also use spot node recovery technology to reliably run all jobs on spot instances

Prerequisites

  1. Create an account at tensorpool.dev
  2. Get your API key from the dashboard
  3. Install the CLI:
pip install tensorpool

Quick Start

TensorPool offers two ways to get started:

  1. Manual Configuration:
tp config new
  2. Autogenerated Configuration:
tp config train my model for 100 epochs on a T4

Both approaches generate a tp.config.toml which you can review and modify before running your job.

Adjust the tp.config.toml to match your project. For example, the configuration for a simple Python project would look like:

commands = [
    "pip install -r requirements.txt",
    "python main.py --epochs 100",
]
optimization_priority = "PRICE"
gpu = "T4"

See Configuration for more details.

Now run your job!

tp run tp.config.toml

More examples can be found in tensorpool/examples

Core Commands

  • tp run - Execute a job
  • tp init - Create a new empty TensorPool configuration file
  • tp init <prompt> - Autogenerate a TensorPool configuration file from a natural language prompt
  • tp listen <job_id> [--pull] [--overwrite] - Attach to a running job to see its output.
    • You can leave & reattach at any time!
    • use --pull to download all outputs/changed files
    • use --overwrite to force replace existing files on pull
  • tp pull <job_id> [files...] [--overwrite] - Download outputs/changed files from your job.
    • Optionally provide a list of files to download
    • use --overwrite to force replace existing files
  • tp cancel <job_id(s)> - Cancel one or more running jobs
  • tp dashboard - View all your jobs and their outputs
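
For scripting around the CLI, the commands above can be assembled as argument lists and handed to `subprocess`. The following is a minimal sketch that uses only the flags listed above. The helper names and the job ID `abc123` are hypothetical, and it assumes `tp` is on your PATH and exits with a non-zero status on failure.

```python
import subprocess


def build_listen_command(job_id, pull=False, overwrite=False):
    """Build the argv for `tp listen <job_id> [--pull] [--overwrite]`."""
    cmd = ["tp", "listen", job_id]
    if pull:
        cmd.append("--pull")
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def build_pull_command(job_id, files=(), overwrite=False):
    """Build the argv for `tp pull <job_id> [files...] [--overwrite]`."""
    cmd = ["tp", "pull", job_id, *files]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def run_cli(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit (assumed behavior)."""
    return subprocess.run(cmd, check=True)


# Hypothetical job ID, for illustration only.
print(build_listen_command("abc123", pull=True))
print(build_pull_command("abc123", ["model.pt"], overwrite=True))
```

Passing a list instead of a shell string avoids quoting problems when file names contain spaces.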

Quick Demo Video

Check out a quick demo video

Configuration

The heart of TensorPool is the tp.config.toml file, which defines your job.

This file is generated by the tp config command and can be executed by the tp run command.

Here's a complete list of all fields supported in the tp.config.toml:

# List of commands to run as if you were starting from a fresh virtual environment
commands = [
    # For example:
    # "pip install -r requirements.txt",
    # "python main.py",
]

# The optimization priority for the job
optimization_priority = "PRICE"  # Either "PRICE" or "TIME"

ignore = [
    # List of files to ignore sending with your project
    # For example:
    # ".git",
    # ".DS_Store",
]

# The GPU you'd like to use
gpu = "auto" # Either "auto", "P4", "P100", "V100", "T4", "L4", "A100", "A100-80GB".
# Defaults to "auto", where TensorPool will select the best GPU based on your optimization priority

# Instance specification (optional)
# gpu_count = n  # Number of GPUs to use
# vcpus = x  # Number of vCPUs to use
# memory = y  # Amount of memory in GB to use
# See https://github.com/tensorpool/tensorpool/blob/main/docs/instances.md for all supported configurations

For all supported gpu_count, vcpus, and memory configurations see docs/instances
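
Before submitting a job, it can help to sanity-check the values against the options listed above. This is a rough sketch that works on a plain dictionary mirroring the fields of tp.config.toml. The function name is hypothetical, and the rule that `commands` must be a non-empty list of strings is an assumption rather than documented behavior.

```python
# Values taken from the field list above.
ALLOWED_PRIORITIES = {"PRICE", "TIME"}
ALLOWED_GPUS = {"auto", "P4", "P100", "V100", "T4", "L4", "A100", "A100-80GB"}


def validate_config(cfg):
    """Return a list of problems found in cfg; an empty list means none were found."""
    problems = []

    # Assumption: a job needs at least one command, and each one is a string.
    commands = cfg.get("commands")
    if not isinstance(commands, list) or not commands:
        problems.append("commands must be a non-empty list")
    elif not all(isinstance(command, str) for command in commands):
        problems.append("every entry in commands must be a string")

    # Only checked when present; the source does not state a default.
    if "optimization_priority" in cfg:
        if cfg["optimization_priority"] not in ALLOWED_PRIORITIES:
            problems.append("optimization_priority must be PRICE or TIME")

    # gpu defaults to "auto" according to the field list.
    if cfg.get("gpu", "auto") not in ALLOWED_GPUS:
        problems.append("gpu is not one of the supported values")

    return problems


example = {
    "commands": ["pip install -r requirements.txt", "python main.py"],
    "optimization_priority": "PRICE",
    "gpu": "T4",
}
print(validate_config(example))
```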

The beauty of the tp.config.toml is its simplicity and flexibility. It allows you to:

  • Run your job with a single command
  • Kick off several experiments with different configurations
  • Version control it with your code
  • Reuse it for future runs
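
One way to kick off several experiments is to generate one configuration file per variant. This is a sketch under assumptions: the `--lr` flag and the file naming scheme are hypothetical, and only the `commands`, `optimization_priority`, and `gpu` fields from the list above are written.

```python
import json


def render_config(commands, optimization_priority="PRICE", gpu="auto"):
    """Render a tp.config.toml-style document as text.

    json.dumps is used for quoting because its string escapes are also
    valid inside TOML basic strings.
    """
    lines = ["commands = ["]
    for command in commands:
        lines.append("    " + json.dumps(command) + ",")
    lines.append("]")
    lines.append("optimization_priority = " + json.dumps(optimization_priority))
    lines.append("gpu = " + json.dumps(gpu))
    return "\n".join(lines) + "\n"


def sweep_configs(learning_rates, gpu="T4"):
    """Map a file name to config text for each learning rate (hypothetical --lr flag)."""
    configs = {}
    for lr in learning_rates:
        name = f"tp.config.lr-{lr}.toml"
        configs[name] = render_config(
            [
                "pip install -r requirements.txt",
                f"python main.py --epochs 100 --lr {lr}",
            ],
            gpu=gpu,
        )
    return configs


for name, text in sweep_configs([0.01, 0.001]).items():
    print(f"# {name}")
    print(text)
```

Each generated file can be written to disk, committed alongside your code, and passed to tp run in place of tp.config.toml.
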
What does optimization_priority mean?

optimization_priority = "PRICE" means that TensorPool will execute your job for the lowest price possible. This doesn't always mean the cheapest GPU, but the best value (typically $/performance) GPU for your job.

optimization_priority = "TIME" means that TensorPool will search for the fastest instance types (best GPU) across all cloud providers.

TensorPool uses heuristics to find the best GPU for your job based on the optimization priority you set.
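
To make the difference between the two priorities concrete, here is a toy sketch. It is not TensorPool's actual heuristic, which is not described here. The instance names, hourly prices, and runtime estimates are invented placeholders chosen only to show that the lowest hourly price is not necessarily the lowest total cost.

```python
def pick_instance(candidates, optimization_priority):
    """Pick an instance name by total cost (PRICE) or by estimated runtime (TIME)."""
    if optimization_priority == "PRICE":
        # Total job cost, not the hourly rate, decides the winner.
        def score(candidate):
            return candidate["usd_per_hour"] * candidate["est_hours"]
    elif optimization_priority == "TIME":
        def score(candidate):
            return candidate["est_hours"]
    else:
        raise ValueError("optimization_priority must be PRICE or TIME")
    return min(candidates, key=score)["name"]


# Invented placeholder numbers, not real pricing or benchmarks.
candidates = [
    {"name": "gpu-small", "usd_per_hour": 0.5, "est_hours": 10.0},  # total 5.0
    {"name": "gpu-medium", "usd_per_hour": 1.0, "est_hours": 4.0},  # total 4.0
    {"name": "gpu-large", "usd_per_hour": 3.0, "est_hours": 2.0},   # total 6.0
]

print(pick_instance(candidates, "PRICE"))  # gpu-medium
print(pick_instance(candidates, "TIME"))   # gpu-large
```

In this made-up example the cheapest hourly option (gpu-small) loses under PRICE because it runs long enough to cost more overall.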

What cloud providers are supported?
Currently GCP, AWS, Azure, and Runpod are supported. More cloud providers are coming soon!

Best Practices

  • Run commands like you're starting from scratch
    • TensorPool executes your job in a fresh environment - include all setup commands like installing dependencies, downloading data, etc
  • Save your outputs
    • Always save your model weights and outputs to disk. You'll get them back at the end of the job!
    • Don't save files outside of your project directory, because you won't be able to get them back
  • Download datasets and big files within your script
    • All TensorPool machines are equipped with 10+ Gb/s networking, so large files can be downloaded faster if done within your script
  • Run from the root of your project
    • TensorPool will send your project directory to the cloud, so make sure you're in the right directory
    • Don't run from your home directory or a subdirectory!
  • Don't require stdin
    • TensorPool runs your job in the background, so don't require any input from stdin (like input() in Python)
    • If you need to pass arguments to your script, use command line arguments or environment variables
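
The practices above can be combined in a small entry-point script. The following is a minimal sketch, not a real training loop: the `--output-dir` flag, the `RUN_NAME` environment variable, and the fake loss values are all hypothetical. It takes its settings from command line arguments and an environment variable, never reads stdin, and writes its results to a directory inside the project.

```python
import argparse
import json
import os
from pathlib import Path


def parse_args(argv=None):
    """Read settings from the command line instead of prompting on stdin."""
    parser = argparse.ArgumentParser(description="Toy training entry point")
    parser.add_argument("--epochs", type=int, default=10)
    # A relative path keeps outputs inside the project directory.
    parser.add_argument("--output-dir", default="outputs")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Hypothetical environment variable used to label the run.
    run_name = os.environ.get("RUN_NAME", "default")

    output_dir = Path(args.output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Stand-in for real training: record a made-up loss per epoch.
    history = [{"epoch": epoch, "loss": 1.0 / (epoch + 1)} for epoch in range(args.epochs)]

    metrics_path = output_dir / f"{run_name}-metrics.json"
    metrics_path.write_text(json.dumps(history))
    return metrics_path


if __name__ == "__main__":
    main()
```

A command such as "python main.py --epochs 100" in tp.config.toml would then run without any interactive input.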

Getting Help

Why TensorPool?

  • Simplicity: Use GPUs without the hassle of cloud setup, machine configuration, or quotas
  • Flexibility: Change your job configuration on the fly, run multiple experiments, and version control your job configurations
  • Cost Effective: By aggregating excess GPU capacity from multiple cloud providers, TensorPool offers GPUs at a fraction of the cost

Get started today at tensorpool.dev!
