Dolly

This fine-tunes the GPT-J 6B model on the Alpaca dataset using a Databricks notebook. Please note that while GPT-J 6B is Apache 2.0 licensed, the Alpaca dataset is licensed under Creative Commons NonCommercial (CC BY-NC 4.0).

Get Started Training

Install dependency

pip install -r requirements_dev.txt

Download dataset file and model file

wget https://cloud.tsinghua.edu.cn/seafhttp/files/beac2337-e002-403d-befa-4666db860078/parquet-train.arrow
mkdir ./model/ && cd ./model/
wget https://cloud.tsinghua.edu.cn/seafhttp/files/56f50dc0-4cbb-473d-ae95-6f9f87564b3c/vocab.json
wget https://cloud.tsinghua.edu.cn/seafhttp/files/22be3e9c-313d-4194-b513-dfda505aa7a0/tokenizer_config.json
wget https://cloud.tsinghua.edu.cn/seafhttp/files/83ce4dd2-7b00-42ec-b958-369f01573003/tokenizer.json
wget https://cloud.tsinghua.edu.cn/seafhttp/files/eb3f574f-a83e-45a5-b88e-f285d48f266f/special_tokens_map.json
wget https://cloud.tsinghua.edu.cn/seafhttp/files/69a4874c-e8b8-43f5-81a5-84cedc7a6f84/merges.txt
wget https://cloud.tsinghua.edu.cn/seafhttp/files/dc70c876-95f0-48c7-8082-26ca8f39ce06/config.json
wget https://cloud.tsinghua.edu.cn/seafhttp/files/4f79b941-5569-49b0-9b7b-6f6140788689/added_tokens.json
wget https://cloud.tsinghua.edu.cn/seafhttp/files/e5980d8d-1457-424f-a91d-906b8ed45aeb/pytorch_model.bin

Alternatively, you can download model checkpoint from huggingface

pip install transformers
transformers-cli download EleutherAI/gpt-j-6B --cache-dir ./model/
export TRANSFORMERS_CACHE=`pwd`/model

Train the model

Start a single-node cluster with node type having 8 A100 (40GB memory) GPUs (e.g. Standard_ND96asr_v4 or p4d.24xlarge).

export timestamp=`date +%Y-%m-%d_%H-%M-%S`
export model_name='dolly'
export checkpoint_dir_name="${model_name}__${timestamp}"
export deepspeed_config=`pwd`/config/ds_z3_bf16_config.json
export local_training_root='./'
export local_output_dir="${local_training_root}/${checkpoint_dir_name}"
export dbfs_output_dir=''
export te
5FEF
nsorboard_display_dir="${local_output_dir}/runs"
export DATASET_FILE_PATH=`pwd`/parquet-train.arrow
export MODEL_PATH=`pwd`/model/
deepspeed --num_gpus=8 \
    --module training.trainer \
    --deepspeed $deepspeed_config \
    --epochs 1 \
    --local-output-dir $local_output_dir \
    --dbfs-output-dir "" \
    --per-device-train-batch-size 8 \
    --per-device-eval-batch-size 8 \
    --lr 1e-5

Generate some sentences

python generate.py

(It is recommended to use ipython to interactively generate sentences to avoid loading models from disk again and again.)

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
config		config
test		test
training		training
.coveragerc		.coveragerc
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
__init__.py		__init__.py
generate.py		generate.py
pytest.ini		pytest.ini
requirements.txt		requirements.txt
requirements_dev.txt		requirements_dev.txt
run_pytest.sh		run_pytest.sh
train_dolly.py		train_dolly.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Dolly

Get Started Training

Install dependency

Download dataset file and model file

Train the model

Generate some sentences

About

Uh oh!

Releases

Packages

Languages

License

stemmons-william/dolly

Folders and files

Latest commit

History

Repository files navigation

Dolly

Get Started Training

Install dependency

Download dataset file and model file

Train the model

Generate some sentences

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages