kannachi323/dweller-speech

dweller-speech

This is the official repository for the speech engine powering Dweller. It includes both text-to-speech (TTS) and speech-to-text (STT) components, enabling voice interaction within the Dweller application.

Description

Before diving into any modifications of the TTS or STT components, I recommend checking out Dweller to better understand its architecture and how dweller-speech fits in.

In short, Dweller is a multi-modal AI system that integrates vision, hearing, voice, and other large models to function intelligently. This repository powers Dweller's hearing and voice, enabling it to listen and speak.

The speech-to-text component is based on Vosk, which provides offline, real-time transcription from an audio input (such as a microphone or speakers). The stt directory contains a few classes that buffer audio input and feed the live audio data into Vosk for transcription. For a list of supported languages, see the vosk-api documentation.
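
The buffering step described above can be sketched roughly as follows. This is a minimal illustration, not the code in the stt directory: the `AudioFeeder` class, its `push` and `finish` methods, and the `block_bytes` default are hypothetical names chosen for this example. The only Vosk-specific assumption is that the recognizer exposes `AcceptWaveform`, `Result`, and `FinalResult`, as `vosk.KaldiRecognizer` does, with results returned as JSON strings.

```python
import json
from typing import Protocol


class Recognizer(Protocol):
    """The subset of the vosk.KaldiRecognizer interface this sketch relies on."""

    def AcceptWaveform(self, data: bytes) -> bool: ...
    def Result(self) -> str: ...
    def FinalResult(self) -> str: ...


class AudioFeeder:
    """Hypothetical helper: buffers raw PCM bytes and feeds fixed-size blocks
    to a recognizer, collecting the text of each completed utterance."""

    def __init__(self, recognizer: Recognizer, block_bytes: int = 8000) -> None:
        self.recognizer = recognizer
        self.block_bytes = block_bytes  # assumed block size, not a Vosk requirement
        self._buffer = bytearray()

    def push(self, chunk: bytes) -> list[str]:
        """Append a chunk of audio and return any utterances it completed."""
        self._buffer.extend(chunk)
        texts: list[str] = []
        while len(self._buffer) >= self.block_bytes:
            block = bytes(self._buffer[: self.block_bytes])
            del self._buffer[: self.block_bytes]
            # AcceptWaveform returns True once the recognizer detects an utterance end.
            if self.recognizer.AcceptWaveform(block):
                text = json.loads(self.recognizer.Result()).get("text", "")
                if text:
                    texts.append(text)
        return texts

    def finish(self) -> str:
        """Flush leftover audio and return the final transcription."""
        if self._buffer:
            self.recognizer.AcceptWaveform(bytes(self._buffer))
            self._buffer.clear()
        return json.loads(self.recognizer.FinalResult()).get("text", "")
```

With Vosk installed and a model downloaded, the recognizer would presumably be built as `KaldiRecognizer(Model("path/to/model"), 16000)` and passed to `AudioFeeder`; the model path and sample rate here are placeholders.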

The text-to-speech component uses the Kokoro engine, a curated pipeline of pretrained voices fine-tuned for smoother, more expressive output. The tts directory contains code for synthesizing speech from raw text, selecting voices, and managing audio playback, which lets Dweller respond in a natural and flexible way. See kokoro and kokoro-onnx for more examples and documentation.
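
As a rough illustration of the text-to-audio path, the sketch below turns text and a voice name into WAV bytes using only the standard library. It does not call Kokoro: `placeholder_synth` is a stand-in that emits a sine tone, and `synthesize_to_wav`, the voice names, and the 24 kHz sample rate are all assumptions made for this example. A real backend would replace the placeholder with a function that returns float samples from the engine.

```python
import io
import math
import struct
import wave
from typing import Callable, Iterable

SAMPLE_RATE = 24_000  # assumed output rate for this sketch

# A synthesizer maps (text, voice) to mono float samples in [-1.0, 1.0].
Synthesizer = Callable[[str, str], Iterable[float]]


def placeholder_synth(text: str, voice: str) -> list[float]:
    """Stand-in for a real engine: 50 ms of sine tone per character.
    The hypothetical voice name only selects the pitch."""
    freq = 220.0 if voice == "low" else 440.0
    n = int(SAMPLE_RATE * 0.05) * len(text)
    return [0.3 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def synthesize_to_wav(
    text: str, voice: str, synth: Synthesizer = placeholder_synth
) -> bytes:
    """Run the synthesizer and encode its samples as 16-bit mono WAV bytes."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in synth(text, voice)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Keeping the synthesizer as a plain callable means the WAV encoding and any playback code stay the same when the voice engine changes.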

Purpose

The goal of dweller-speech is not to develop new speech-to-text (STT) or text-to-speech (TTS) models from scratch, but rather to leverage powerful, existing open-source solutions and make them easier to integrate into the Dweller ecosystem.

This repository provides a thin abstraction layer over tools like Vosk and Kokoro, offering:

  • Wrapper classes for simplified model usage
  • Local server hosting for self-contained, offline operation
  • WebSocket endpoints for real-time communication with Dweller or other clients

By wrapping these models in a modular interface, dweller-speech enables Dweller to speak and listen efficiently, while keeping the flexibility to swap or upgrade components as needed.
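
One way such a modular interface could look is sketched below. Everything here is hypothetical: the `SpeechToText` and `TextToSpeech` protocols, the `SpeechService` class, and the JSON message types (`transcribe`, `speak`, `transcript`, `audio`, `error`) are invented for illustration and are not the repository's actual API or wire format. The handler is transport-agnostic, so a WebSocket endpoint could pass it each text frame along with any accompanying binary payload.

```python
import json
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    def speak(self, text: str) -> bytes: ...


class SpeechService:
    """Hypothetical facade: routes requests to whichever backends it was given."""

    def __init__(self, stt: SpeechToText, tts: TextToSpeech) -> None:
        self.stt = stt
        self.tts = tts

    def handle(self, header: str, payload: bytes = b"") -> tuple[str, bytes]:
        """Process one request and return (JSON reply, binary reply)."""
        request = json.loads(header)
        kind = request.get("type")
        if kind == "transcribe":
            text = self.stt.transcribe(payload)
            return json.dumps({"type": "transcript", "text": text}), b""
        if kind == "speak":
            audio = self.tts.speak(request.get("text", ""))
            return json.dumps({"type": "audio"}), audio
        return json.dumps({"type": "error", "reason": f"unknown type: {kind!r}"}), b""
```

Because `SpeechService` depends only on the two protocols, a Vosk-backed or Kokoro-backed wrapper could be swapped for another engine without touching the message handling.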

Getting Started

Dependencies

Installation

Usage

Help

Authors

me :)

Version History

License

Acknowledgments
