A sophisticated AI chat application featuring voice interaction, persistent memory storage, and multi-modal communication capabilities. Built with Node.js, Express, MongoDB, and modern web technologies.
- Structured session handling using persistent IDs
- Format: `{type}-{id}-{version}` (e.g., `global-persistent-storage-001-v1`)
- Session validation and verification
- Support for multiple session types (global, user, admin)
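As a rough illustration of the session ID format, the sketch below parses and validates an ID of the form `{type}-{id}-{version}`. The function name `parseSessionId`, the pattern, and the assumption that versions look like `v1`, `v2`, and so on are hypothetical; the application's actual validation may differ.

```javascript
// Hypothetical helper: the README only specifies the {type}-{id}-{version} shape.
// Session types are taken from the list above.
const SESSION_TYPES = ['global', 'user', 'admin'];

// Assumption: the version segment looks like "v1", "v2", ... as in the example ID.
// The middle segment may itself contain hyphens (e.g. "persistent-storage-001").
const SESSION_ID_PATTERN = /^([a-z]+)-(.+)-(v\d+)$/;

function parseSessionId(sessionId) {
  if (typeof sessionId !== 'string') return null;

  const match = SESSION_ID_PATTERN.exec(sessionId);
  if (!match) return null;

  const [, type, id, version] = match;
  // Reject IDs whose type is not one of the supported session types.
  if (!SESSION_TYPES.includes(type)) return null;

  return { type, id, version };
}

// Example: split the sample ID into its three parts.
console.log(parseSessionId('global-persistent-storage-001-v1'));
```

Returning `null` for anything malformed keeps the caller's check to a single truthiness test, which is one simple way to implement invalid session detection.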
- Short-term memory (SLM) for immediate context
- localStorage for quick client-side access
- MongoDB for persistent long-term storage
- Automatic data synchronization between layers
- Personal Information (name, preferences)
- Secret Words
- Favorites
- General Memories
- Timestamps and versioning for all stored data
- Real-time voice input processing
- Continuous listening mode
- Command recognition
- Error handling and recovery
- Azure TTS integration
- Multiple voice options
- Queue-based audio playback
- Interrupt and resume capabilities
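Queue-based playback with interrupt and resume could be structured roughly as follows. This is a minimal sketch, not the application's code: the `AudioQueue` class and the `playFn(clip, onDone)` callback are hypothetical stand-ins for whatever actually plays synthesized audio. In this simplified version, `interrupt()` only holds back clips that are still queued; stopping a clip mid-playback would depend on the audio API in use.

```javascript
// Hypothetical queue. `playFn(clip, onDone)` is an assumed callback that
// starts playing one clip and calls `onDone` when that clip has finished.
class AudioQueue {
  constructor(playFn) {
    this.playFn = playFn;
    this.items = [];      // clips waiting to be played, in order
    this.playing = false; // true while a clip is being played
    this.paused = false;  // true between interrupt() and resume()
  }

  // Add a clip and start playback if the queue is idle.
  enqueue(clip) {
    this.items.push(clip);
    this.playNext();
  }

  // Hold back any queued clips until resume() is called.
  interrupt() {
    this.paused = true;
  }

  // Continue with the clips that were held back.
  resume() {
    this.paused = false;
    this.playNext();
  }

  playNext() {
    if (this.playing || this.paused || this.items.length === 0) return;
    const clip = this.items.shift();
    this.playing = true;
    this.playFn(clip, () => {
      this.playing = false;
      this.playNext(); // move on to the next queued clip, if any
    });
  }
}
```

Clips are played strictly in arrival order, so responses that arrive while another is still being spoken do not overlap.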
- Inactivity detection and timeout
- Conversation mode toggle
- Context preservation
- Exit command handling
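Inactivity detection can be sketched as a small monitor that records the time of the last user activity and reports when a timeout has elapsed. The `InactivityMonitor` class and the 30-second default are assumptions for illustration; the README does not state the actual timeout.

```javascript
// Hypothetical monitor. The 30-second default is an assumption,
// not a value taken from this README.
class InactivityMonitor {
  // `now` is injectable so the logic can be exercised without real delays.
  constructor(timeoutMs = 30000, now = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastActivity = now();
  }

  // Call whenever the user speaks or otherwise interacts.
  touch() {
    this.lastActivity = this.now();
  }

  // True once at least `timeoutMs` has passed since the last activity.
  isExpired() {
    return this.now() - this.lastActivity >= this.timeoutMs;
  }
}
```

A conversation loop could call `touch()` on each recognized utterance and leave conversation mode when `isExpired()` returns true.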
- Clean, responsive design
- Real-time status updates
- Audio controls
- Model selection
- Voice selection
- Microphone toggle
- OpenAI GPT integration

- Azure Speech Services
- Google Image Search
- Bing Search capabilities
PersonalInfo Schema:
{
userId: String,
sessionId: String,
sessionType: String,
sessionVersion: String,
type: String,
value: String,
timestamp: Date,
created: Date,
updated: Date
}
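To make the shape of a stored record concrete, the sketch below builds a plain object with the fields listed in the PersonalInfo schema. The helper name `buildPersonalInfo` and its validation rule (every string field must be non-empty) are assumptions; the application itself presumably defines this through a Mongoose schema.

```javascript
// Field names come from the PersonalInfo schema; the validation rule
// (non-empty strings) is an assumption made for this sketch.
const STRING_FIELDS = [
  'userId', 'sessionId', 'sessionType', 'sessionVersion', 'type', 'value',
];

function buildPersonalInfo(fields, now = new Date()) {
  const record = {};
  for (const name of STRING_FIELDS) {
    if (typeof fields[name] !== 'string' || fields[name] === '') {
      throw new TypeError(`PersonalInfo.${name} must be a non-empty string`);
    }
    record[name] = fields[name];
  }
  // A new record gets the same instant for all three date fields.
  record.timestamp = now;
  record.created = now;
  record.updated = now;
  return record;
}

// Example record for a stored secret word (all values are illustrative).
console.log(buildPersonalInfo({
  userId: 'user-1',
  sessionId: 'global-persistent-storage-001-v1',
  sessionType: 'global',
  sessionVersion: 'v1',
  type: 'secretWord',
  value: 'nebula',
}));
```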
Memory Keywords:
Secret Word: /(?:the )?secret word (?:is|=) (.+)/i
Favorites: /(?:my )?favorite (\w+) (?:is|=) (.+)/i
Remember: /remember (?:that )?(.+)/i
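The three patterns above can be applied to an utterance in order, returning the first match. The sketch below is one possible arrangement; the function name `extractMemory`, the `kind` labels, and the shape of the returned object are hypothetical.

```javascript
// The regular expressions are copied from the Memory Keywords list;
// the `kind` labels and result shape are assumptions for illustration.
const MEMORY_PATTERNS = [
  { kind: 'secretWord', regex: /(?:the )?secret word (?:is|=) (.+)/i },
  { kind: 'favorite', regex: /(?:my )?favorite (\w+) (?:is|=) (.+)/i },
  { kind: 'memory', regex: /remember (?:that )?(.+)/i },
];

// Returns a description of what to store, or null if nothing matched.
function extractMemory(utterance) {
  for (const { kind, regex } of MEMORY_PATTERNS) {
    const match = regex.exec(utterance);
    if (!match) continue;

    if (kind === 'favorite') {
      // Favorites capture both a category ("color") and a value ("blue").
      return { kind, category: match[1].toLowerCase(), value: match[2].trim() };
    }
    return { kind, value: match[1].trim() };
  }
  return null;
}

console.log(extractMemory('The secret word is nebula'));
console.log(extractMemory('My favorite color is blue'));
console.log(extractMemory('What is the secret word?')); // a question, so no match
```

Because the patterns require "is" or "=" after the keyword, retrieval questions such as "What is my favorite color?" do not match and can be routed to a lookup instead.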
- Speech recognition error recovery
- Network failure handling
- Invalid session detection
- Data validation
- Audio playback error management
GET /api/personal-info/:type
POST /api/personal-info
GET /api/personal-info/all
GET /api/google-image-search
POST /api/bing-search
Required environment variables:
MONGODB_URI
SPEECH_API_KEY
OPENAI_API_KEY
GOOGLE_API_KEY
GOOGLE_SEARCH_ENGINE_ID
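A startup check can catch a missing variable before the first API call fails. The following is a minimal sketch; the helper name `findMissingEnvVars` is hypothetical, and it assumes the variables have already been loaded into `process.env` (for example by `dotenv`, which is listed among the dependencies).

```javascript
// Variable names are taken from the list of required environment variables.
const REQUIRED_ENV_VARS = [
  'MONGODB_URI',
  'SPEECH_API_KEY',
  'OPENAI_API_KEY',
  'GOOGLE_API_KEY',
  'GOOGLE_SEARCH_ENGINE_ID',
];

// Returns the names of required variables that are unset or empty.
// `env` defaults to process.env but can be replaced for testing.
function findMissingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}

// Example with a partial environment: reports the four variables not set.
console.log(findMissingEnvVars({ MONGODB_URI: 'mongodb://localhost/chat' }));
```

At startup, the server could log the returned names and exit if the array is not empty.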
// Store a secret word
"The secret word is nebula"
// Set user name
"My name is Paul"
// Store a favorite
"My favorite color is blue"
// Get secret word
"What is the secret word?"
// Get name
"What is my name?"
// Get a favorite
"What is my favorite color?"
The application maintains various states:
- Audio playback state
- Speech recognition state
- Processing state
- Conversation mode state
- Session state
- Memory state
- Session validation
- Input sanitization
- API key protection
- Error message sanitization
- Rate limiting (TODO)
- User authentication
- Multiple session support
- Enhanced error recovery
- Data migration tools
- Memory expiration
- Rate limiting
- Enhanced security features
- express
- cors
- mongoose
- axios
- microsoft-cognitiveservices-speech-sdk
- openai
- dotenv
Install dependencies
npm install
Start the server
npm start
Default port: 3335
- Chrome (recommended)
- Firefox
- Safari
- Edge
- Speech recognition requires HTTPS in production
- Some features require specific browser permissions
- Local storage must be enabled
- Stable internet connection required for API features