Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

bigscience

Research workshop on large language models - The Summer of Language Models 21

At the moment we have 2 code repos:

  1. https://github.com/bigscience-workshop/Megatron-DeepSpeed - this is our flagship code base
  2. https://github.com/bigscience-workshop/bigscience - (this repo) for everything else - docs, experiments, etc.

Currently, the most active segments of this repo are:

  • JZ - Lots of information about our work environment which helps evaluate, plan and get things done
  • Experiments - many experiments are being run; documentation, result tables, scripts and logs are all there
  • Datasets info
  • Train - all the information about the current trainings (see below for the most important ones)

We have READMEs for specific aspects, such as:

Trainings

Train 1 - 13B - unmodified Megatron gpt2 - baseline

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
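
If you prefer Python, here is a rough equivalent of the one-liner above (a sketch, not part of the repo; the follow helper name and the 300-second interval are illustrative, and it assumes the third-party requests package and a server that honors HTTP Range requests):

import sys
import time

import requests  # third-party: pip install requests


def follow(url, interval=300):
    """Poll a remote log file and print only the newly appended bytes."""
    seen = 0
    while True:
        resp = requests.get(url, headers={"Range": f"bytes={seen}-"}, allow_redirects=True)
        if resp.status_code == 206:        # partial content: just the new bytes
            new = resp.content
        elif resp.status_code == 200:      # server ignored the Range header: whole file
            new = resp.content[seen:]
        else:                              # e.g. 416 when nothing new has been appended
            new = b""
        if new:
            sys.stdout.buffer.write(new)
            sys.stdout.buffer.flush()
            seen += len(new)
        time.sleep(interval)


if __name__ == "__main__":
    follow(sys.argv[1])

Run it with the same log URL as above, e.g. python follow_log.py <log-url>.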

Train 3

Architecture and scaling baseline runs: no fancy tricks, just GPT2. Here are links to the respective tensorboards:

Size                  1B3                    760M   350M   125M
C4 + low warmup       a                      b      c
OSCAR + low warmup    f
C4 + high warmup      e
OSCAR + high warmup   d (current baseline)   g      h      i
Pile + high warmup    m                      j      k      l

Train 8

This is the current main training:

104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9
