A pattern for an always-on AI assistant powered by Deepseek-V3, RealtimeSTT, and Typer, built for engineering work.
Check out the demo, where we walk through using this always-on-ai-assistant.
```bash
cp .env.sample .env
```
- Update `.env` with your keys: `DEEPSEEK_API_KEY` and `ELEVEN_API_KEY`

```bash
uv sync
```
- (optional) install Python 3.11: `uv python install 3.11`
This Docker setup provides a consistent and isolated environment, which is especially useful for Windows users to avoid common encoding and audio issues.
- Environment Consistency: Docker containers ensure that the application runs the same way, regardless of the underlying operating system.
- UTF-8 Encoding: The container uses UTF-8 encoding by default, resolving character encoding issues encountered on Windows.
- Audio Support: Docker allows mapping the host's audio devices to the container, enabling seamless use of speech recognition and text-to-speech.
- Isolation: Unlike virtual environments, Docker containers provide full isolation, including the file system, processes, and network. This prevents conflicts with system-wide packages and settings.
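If you want to confirm the encoding behavior yourself, a quick check from any Python shell (host or container) looks like this:

```python
# Quick sanity check for the UTF-8 behavior described above.
import locale
import sys

print(sys.getdefaultencoding())       # 'utf-8' on Python 3
print(locale.getpreferredencoding())  # expect 'UTF-8' when LANG/LC_ALL=C.UTF-8
```

On a stock Windows host the second call often reports a legacy code page such as `cp1252`, which is exactly the class of issue the container sidesteps.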
1. Install Docker and Docker Compose:
   - Download and install Docker Desktop, which includes both Docker Engine and Docker Compose.
2. Copy and update environment variables:
   ```bash
   cp .env.sample .env # Edit .env to add your DEEPSEEK_API_KEY and ELEVEN_API_KEY
   ```
3. Build and run the container:
   ```bash
   docker-compose up --build
   ```
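If the container starts but the assistant can't authenticate, it is worth verifying the keys actually made it into the environment. A minimal sketch, assuming `python-dotenv` (an assumption here, not necessarily part of this repo's dependency set):

```python
# Hypothetical .env sanity check; python-dotenv is an assumption.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("DEEPSEEK_API_KEY", "ELEVEN_API_KEY"):
    print(key, "->", "set" if os.getenv(key) else "MISSING")
```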
Highlights of the Dockerfile:

- Base Image: Uses `python:3.11-slim` as the base image, providing a lightweight and consistent Python environment.
- Encoding: Sets the `LANG` and `LC_ALL` environment variables to `C.UTF-8` to enforce UTF-8 encoding within the container.
- Audio Device Mapping:
  - Maps the host's PulseAudio socket to the container, allowing the use of the host's audio system.
  - Uses `host.docker.internal` as the PulseAudio server address, enabling communication between the container and the host's PulseAudio server.
  - Maps `/dev/snd` to allow access to audio devices.
- Dependency Management:
  - Installs system dependencies, including `pulseaudio`, `alsa-utils`, and `portaudio19-dev`, for audio support.
  - Uses `pip` directly to install Python dependencies, since the container itself provides an isolated environment, eliminating the need for a separate virtual environment.
- User Configuration:
  - Creates a non-root user (`appuser`) within the container for improved security.
- Entrypoint:
  - Uses `python main_typer_assistant.py` as the entrypoint, directly invoking the Python script without a virtual environment.
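To verify the audio wiring from inside a running container, a hedged sanity check along these lines can help (`pyaudio` is assumed to be importable because `portaudio19-dev` is installed; the exact `PULSE_SERVER` value depends on your compose file):

```python
# Illustrative in-container check of the audio mapping described above.
import os
import pathlib

print("PULSE_SERVER:", os.environ.get("PULSE_SERVER"))       # e.g. pointing at host.docker.internal
print("/dev/snd present:", pathlib.Path("/dev/snd").exists())

try:
    import pyaudio
    pa = pyaudio.PyAudio()
    print("PortAudio sees", pa.get_device_count(), "device(s)")
    pa.terminate()
except Exception as exc:
    print("PortAudio not usable:", exc)
```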
Docker containers already provide an isolated environment similar to virtual environments. Using a virtual environment inside a container adds unnecessary overhead and complexity. The Dockerfile installs all dependencies directly into the container's system Python, ensuring a clean and efficient setup.
See `main_base_assistant.py` for more details. Start a conversational chat session with the base assistant:

```bash
# Local
uv run python main_base_assistant.py chat

# Docker
docker-compose run assistant python main_base_assistant.py chat
```
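Under the hood, a chat session of this kind boils down to a loop over an OpenAI-compatible chat-completions call against DeepSeek's endpoint. The sketch below is illustrative only; see `main_base_assistant.py` for the repo's actual implementation:

```python
# Minimal illustrative chat loop; not the repo's actual code.
import os
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API.
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")

history = [{"role": "system", "content": "You are a helpful engineering assistant."}]
while True:
    history.append({"role": "user", "content": input("> ")})
    reply = client.chat.completions.create(model="deepseek-chat", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print(answer)
```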
See `main_typer_assistant.py`, `modules/typer_agent.py`, and `commands/template.py` for more details.

- `--typer-file`: file containing the Typer commands the assistant can run (a minimal sketch of such a file follows this list)
- `--scratchpad`: active memory for you and your assistant
- `--mode`: determines what the assistant does with the command (`default`, `execute`, `execute-no-scratch`)
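For orientation, here is a hypothetical minimal file of the shape `--typer-file` expects; `commands/template.py` is the real reference:

```python
# Hypothetical minimal Typer command file; the actual commands live in
# commands/template.py.
import typer

app = typer.Typer()

@app.command()
def ping_server(wait: bool = True):
    """Ping the server, optionally waiting for a response."""
    typer.echo(f"pinging server (wait={wait})...")

if __name__ == "__main__":
    app()
```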
- Awaken the assistant:

  ```bash
  # Local
  uv run python main_typer_assistant.py awaken --typer-file commands/template.py --scratchpad scratchpad.md --mode execute

  # Docker (this is the default command)
  docker-compose up
  ```

- Speak to the assistant. Try this: "Hello! Ada, ping the server and wait for a response" (be sure to pronounce 'ada' clearly).
- See the command in the scratchpad: open `scratchpad.md` to see the command that was generated.
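Conceptually, the scratchpad is just an append-only markdown log shared between you and the assistant. A hypothetical sketch of the write path (the real format lives in `modules/typer_agent.py` and may differ):

```python
# Illustrative only; the assistant's actual scratchpad format may differ.
from pathlib import Path

def log_to_scratchpad(command: str, scratchpad: str = "scratchpad.md") -> None:
    """Append a generated command to the shared scratchpad file."""
    with Path(scratchpad).open("a", encoding="utf-8") as f:
        f.write(f"\n- `{command}`\n")

log_to_scratchpad("uv run python commands/template.py ping-server")
```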
See `assistant_config.yml` for more details.
- 🧠 Brain: `Deepseek V3`
- 📝 Job (Prompt(s)): `prompts/typer-commands.xml`
- 💻 Active Memory (Dynamic Variables): `scratchpad.txt`
- 👂 Ears (STT): `RealtimeSTT`
- 🎤 Mouth (TTS): `ElevenLabs`
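The configuration is plain YAML, so inspecting or tweaking it programmatically is straightforward. A minimal sketch, assuming PyYAML; the actual key names are defined by the repo's file, not shown here:

```python
# Hedged sketch: load and inspect assistant_config.yml with PyYAML.
import yaml

with open("assistant_config.yml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

print(config)  # inspect which brain / STT / TTS backends are wired up
```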
See `assistant_config.yml` for more details.
- 🧠 Brain: `ollama:phi4`
- 📝 Job (Prompt(s)): None
- 💻 Active Memory (Dynamic Variables): None
- 👂 Ears (STT): `RealtimeSTT`
- 🎤 Mouth (TTS): local
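Swapping the brain to a local model amounts to pointing the chat call at Ollama instead of DeepSeek. A hedged sketch using the `ollama` Python client (assumes `ollama pull phi4` has been run on the host; the repo's actual wiring may differ):

```python
# Illustrative local-brain call via the ollama Python client.
import ollama

response = ollama.chat(
    model="phi4",
    messages=[{"role": "user", "content": "Ada, ping the server."}],
)
print(response["message"]["content"])
```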
- Local speech-to-text (RealtimeSTT): https://github.com/KoljaB/RealtimeSTT
- faster-whisper (the backend RealtimeSTT builds on): https://github.com/SYSTRAN/faster-whisper
- whisper: https://github.com/openai/whisper
- RealtimeSTT examples: https://github.com/KoljaB/RealtimeSTT/blob/master/tests/realtimestt_speechendpoint_binary_classified.py
- ElevenLabs voice models: https://elevenlabs.io/docs/developer-guides/models#older-models
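As a starting point with the first link, the basic RealtimeSTT usage pattern documented in that repository looks like this:

```python
# Minimal RealtimeSTT loop, per the project's documented usage.
from RealtimeSTT import AudioToTextRecorder

def on_text(text: str) -> None:
    print("heard:", text)

if __name__ == "__main__":
    recorder = AudioToTextRecorder()  # loads a faster-whisper model on first run
    while True:
        recorder.text(on_text)  # blocks until an utterance is transcribed
```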