Small Conversational speech models that can run on device
uv pip install -e .
python demo.py
- Vui.BASE is base checkpoint trained on 40k hours of audio conversations
- Vui.ABRAHAM is a single speaker model that can reply with context awareness.
- Vui.COHOST is checkpoint with two speakers that can talk to each other.
You can clone with the base model quite well but it's not perfect as hasn't seen that much audio / wasn't trained for long
vui is a llama based transformer that predicts audio tokens.
fluac is a audio tokenizer based on descript-audio-codec which reduces the number of codes per second by 4 from 83.1hz to 21.53hz
- Was developed with on two 4090's https://x.com/harrycblum/status/1752698806184063153
- Hallucinations: yes the model does hallucinate, but this is the best I could do with limited resources! :(
- VAD does slow things down but needed to help remove areas of silence.
- Whisper - https://github.com/openai/whisper
- Audiocraft - https://github.com/facebookresearch/audiocraft
- Descript Audio Codec - https://github.com/descriptinc/descript-audio-codec
@software{vui_2025,
author = {Coultas Blum, Harry},
month = {01},
title = {{vui}},
url = {https://github.com/fluxions-ai/vui},
version = {1.0.0},
year = {2025}
}