StarCoder 2

StarCoder2 is a family of code generation models (3B, 7B, and 15B) trained on 600+ programming languages from The Stack v2 and some natural language text such as Wikipedia, arXiv, and GitHub issues. The models use Grouped Query Attention and a context window of 16,384 tokens, with sliding window attention of 4,096 tokens. The 3B and 7B models were trained on 3+ trillion tokens, while the 15B model was trained on 4+ trillion tokens. For more details, check out the paper.
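To make the sliding window idea concrete, here is a minimal sketch of a causal attention mask in which each position only sees the most recent `window` positions. The function name is hypothetical, and counting the current token as part of the window is an assumption; the convention in the actual model implementation may differ by one.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j."""
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i              # never look at future tokens
            in_window = i - j < window   # only the most recent `window` positions
            row.append(causal and in_window)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Toy sizes so the pattern is visible; StarCoder2 uses a 4,096-token window.
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With a plain causal mask, the last row would allow every earlier position. Here, each row allows at most `window` positions, so the attention cost per token stays bounded as the sequence grows.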

Quickstart

StarCoder2 models are intended for code completion. They are not instruction models, and commands like "Write a function that computes the square root." do not work well.
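One way to work with a completion model is to phrase the task as the beginning of a source file and let the model continue it. The sketch below builds such a prefix from a function signature and a docstring. `build_completion_prompt` is a hypothetical helper for illustration, not part of this repository.

```python
def build_completion_prompt(signature: str, docstring: str) -> str:
    """Turn a task description into a code prefix for the model to continue."""
    # The model sees a function header and docstring, then completes the body.
    return f'{signature}\n    """{docstring}"""\n'


prompt = build_completion_prompt(
    "def square_root(x: float) -> float:",
    "Return the square root of x.",
)

if __name__ == "__main__":
    print(prompt)
```

The resulting string can be passed to the tokenizer in place of an instruction such as "Write a function that computes the square root."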

Installation

First, install all the libraries listed in requirements.txt:

pip install -r requirements.txt
# export your HF token, found here: https://huggingface.co/settings/tokens
export HF_TOKEN=xxx
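Before loading a model, it can help to confirm that the token is actually visible to the Python process. This is a small sketch with a hypothetical helper name; it assumes the Hugging Face libraries pick the token up from the `HF_TOKEN` environment variable, as the export above implies.

```python
import os


def hf_token_is_set(env=None) -> bool:
    """Return True when HF_TOKEN is present and not blank."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if __name__ == "__main__":
    if hf_token_is_set():
        print("HF_TOKEN is set.")
    else:
        print("HF_TOKEN is missing; export it before loading the model.")
```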

Model usage and memory footprint

Here are some examples of loading the model and generating code, with the memory footprint of the largest model, StarCoder2-15B. Ensure you've installed transformers from source (this should already be the case if you used requirements.txt):

pip install git+https://github.com/huggingface/transformers.git

Running the model on CPU/GPU/multi GPU

Using full precision

# pip install git+https://github.com/huggingface/transformers.git # TODO: merge PR to main
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# to use multiple GPUs, do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

Using torch.bfloat16

# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# for fp16 use `torch_dtype=torch.float16` instead
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
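As a rough guide to why the choice of dtype matters, the memory needed for the weights alone is approximately the parameter count times the bytes per parameter. The sketch below uses the nominal sizes from the model names (3B, 7B, 15B) as an assumption. The exact parameter counts differ, and activations and the KV cache add to the total, so treat these as back-of-the-envelope estimates rather than measured footprints.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "float16": 2.0}

# Nominal sizes taken from the model names; real parameter counts differ somewhat.
NOMINAL_PARAMS = {
    "starcoder2-3b": 3e9,
    "starcoder2-7b": 7e9,
    "starcoder2-15b": 15e9,
}


def estimate_weight_memory_gb(num_params: float, dtype: str) -> float:
    """Estimate memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, num_params in NOMINAL_PARAMS.items():
        for dtype in ("float32", "bfloat16"):
            gb = estimate_weight_memory_gb(num_params, dtype)
            print(f"{name:15s} {dtype:9s} ~{gb:.0f} GB")
```

Under these assumptions, switching from full precision to `torch.bfloat16` halves the weight memory, which is the main reason to prefer it when the hardware supports it.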

Quantized Versions through `bitsandbytes`
