Converser

A conversational agent following an automated customer support workflow.

System Architecture

stateDiagram-v2
  Greet --> CollectEmail: Ask initial question
  CollectEmail --> ValidateEmail: Ask for email
  ValidateEmail --> CollectDeviceInfo: Ask about device
  CollectDeviceInfo --> CollectIssueDetails: Get problem details
  CollectDeviceInfo --> CollectDeviceInfo: Keep collecting
  CollectIssueDetails --> ProvideSolutions: Look for solutions
  ProvideSolutions --> CheckSatisfaction
  ProvideSolutions --> ProvideSolutions: Keep trying
  CheckSatisfaction --> Farewell
  CheckSatisfaction --> CheckSatisfaction: Propose more solutions
  Farewell --> [*]

Converser is built around a state machine that guides the conversation flow, with specialized agents handling different stages of the customer support interaction. Each state in the diagram represents a conversation node where a specific agent collects or provides information.
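As a rough illustration of the diagram above, the allowed transitions can be written as a plain lookup table. This is a minimal sketch using only the standard library, not the Pydantic Graph definition in graph.py; the `Stage` enum, `TRANSITIONS` table, and `advance` helper are hypothetical names introduced for this example.

```python
from enum import Enum, auto


class Stage(Enum):
    """Conversation stages, named after the states in the diagram."""
    GREET = auto()
    COLLECT_EMAIL = auto()
    VALIDATE_EMAIL = auto()
    COLLECT_DEVICE_INFO = auto()
    COLLECT_ISSUE_DETAILS = auto()
    PROVIDE_SOLUTIONS = auto()
    CHECK_SATISFACTION = auto()
    FAREWELL = auto()
    END = auto()


# Allowed next stages for each stage. Self-loops mirror the
# "Keep collecting" / "Keep trying" / "Propose more solutions" edges.
TRANSITIONS: dict[Stage, set[Stage]] = {
    Stage.GREET: {Stage.COLLECT_EMAIL},
    Stage.COLLECT_EMAIL: {Stage.VALIDATE_EMAIL},
    Stage.VALIDATE_EMAIL: {Stage.COLLECT_DEVICE_INFO},
    Stage.COLLECT_DEVICE_INFO: {Stage.COLLECT_DEVICE_INFO, Stage.COLLECT_ISSUE_DETAILS},
    Stage.COLLECT_ISSUE_DETAILS: {Stage.PROVIDE_SOLUTIONS},
    Stage.PROVIDE_SOLUTIONS: {Stage.PROVIDE_SOLUTIONS, Stage.CHECK_SATISFACTION},
    Stage.CHECK_SATISFACTION: {Stage.CHECK_SATISFACTION, Stage.FAREWELL},
    Stage.FAREWELL: {Stage.END},
    Stage.END: set(),
}


def advance(current: Stage, proposed: Stage) -> Stage:
    """Return `proposed` if the diagram allows the move, else raise ValueError."""
    if proposed not in TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.name} -> {proposed.name}")
    return proposed
```

In the real application the agent at each node decides which edge to take; a table like this would only validate that decision.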

The conversation follows these main steps:

Greeting: Initial contact with the customer
Issue Collection: Gathering preliminary information about the customer's technical problem
Customer Identification: Collecting and validating customer email
Device Information: Identifying the device type, brand, and model
Problem Details: Getting comprehensive information about the technical issue
Solution Proposal: Providing troubleshooting steps to the customer
Satisfaction Check: Determining if the problem was resolved or needs escalation
Farewell: Concluding the conversation appropriately

Each node in the graph contains an OpenAI voice pipeline built with the Agents SDK. The graph itself is built with Pydantic Graph. Most agent-orchestration frameworks use a graph of agents to handle a single user request. Our use case instead requires a graph whose nodes represent different stages of the conversation, where each node can be either a single agent or a "swarm".

A notable challenge is getting agents to produce structured (Pydantic/JSON Schema) output while also streaming dialogue. Waiting for the entire output object adds too much latency. This is why we implement a partial JSON parser: it allows the "dialogue" field to be streamed out first, after which the rest of the object is formed.

Features

Voice-based conversational interface with text-to-speech
Multi-agent architecture with specialized roles
Customer sentiment tracking
State persistence and conversation history
Support for multiple languages
Clear conversation flow with automated handoffs

Setup Instructions

Prerequisites

Python 3.12 or higher
Poetry (dependency management)
Valid OpenAI API key

Installation

Clone the repository

git clone https://github.com/yourusername/converser.git
cd converser

Install dependencies using Poetry
```
poetry install
```
Create a .env file in the project root with your API key:
```
OPENAI_API_KEY=your_api_key_here
```

Running the Application

Start a conversation session:

poetry run conversation

Press space to start recording your message, and press space again to stop recording and send it. Note that this is a toggle, not push-to-talk.

To save conversation history:

poetry run conversation --directory data/conversations

To append conversation state to a TSV file:

poetry run conversation --tsv data/states.tsv

Development

Project Structure

.
├── data/                  # Saved conversations and state data
├── src/
│   └── converser/         # Main package
│       ├── agent.py       # Agent definitions and prompt templates
│       ├── audio.py       # Voice interface utilities
│       ├── graph.py       # Conversation flow graph definition
│       ├── main.py        # CLI entry point
│       └── parser.py      # Utilities for parsing agent responses
└── tests/                 # Unit tests

Running Tests

poetry run pytest

Further Improvements

Audio handling can be much improved. A continuous stream should be implemented instead of taking turns. OpenAI's Realtime API could be used for this.
Much refinement could be done on the agent prompts and the conversation graph.
A current limitation is that all nodes must follow the pattern [User message]->[Agent response]. Some parts would be improved if the Agent were allowed to talk first.
The user should be able to switch the conversation language. This is especially important at the beginning.
More work is needed to reduce overall latency.
Eventually, the application should be made deployable and usable over WebSockets.

License

MIT

Credits

Built with OpenAI Agents and Pydantic Graph.
