Aria - LLM Pipeline

Input Sources
INPUT
Web UI (Voice)
Microphone → VAD → Speaker ID → STT (Whisper)
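The voice chain above can be sketched as a small pipeline. Everything here is a hypothetical stand-in: the real pipeline uses Whisper for STT, while `vad`, `identify_speaker`, and `transcribe` below operate on toy per-frame dicts purely to show the ordering VAD → Speaker ID → STT.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str
    text: str

def vad(frames, threshold=0.5):
    """Energy-based voice activity detection: keep frames above the threshold."""
    return [f for f in frames if f["energy"] > threshold]

def identify_speaker(frames):
    """Hypothetical speaker ID: majority vote over per-frame labels."""
    labels = [f["speaker"] for f in frames]
    return max(set(labels), key=labels.count)

def transcribe(frames):
    """Stand-in for Whisper STT: join per-frame words into text."""
    return " ".join(f["word"] for f in frames)

def voice_input(frames):
    speech = vad(frames)                       # VAD
    speaker = identify_speaker(speech)         # Speaker ID
    return Utterance(speaker, transcribe(speech))  # STT
```

The key design point is that speaker identification runs on the VAD-trimmed frames, so silence never dilutes the speaker vote or the transcript.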
INPUT
Discord Text
Message + chat history sync
INPUT
Discord Voice
Audio stream → VAD → STT
GATE
Awareness LLM
Discord only — decides whether a message is directed at Aria, or whether she should chime in unprompted
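One way the gate could be layered, as a sketch: cheap deterministic checks (a name mention, a reply to the bot) short-circuit before paying for an LLM call, and the LLM only judges the ambiguous chime-in case. The `llm_judge` callable is hypothetical, not an API from the source.

```python
import re

NAME = re.compile(r"\baria\b", re.IGNORECASE)

def awareness_gate(message: str, is_reply_to_bot: bool, llm_judge=None) -> bool:
    """Return True if Aria should respond to this Discord message."""
    # Deterministic fast path: directly addressed.
    if is_reply_to_bot or NAME.search(message):
        return True
    # Ambiguous case: defer to a (hypothetical) LLM yes/no judgment.
    return llm_judge(message) if llm_judge else False
```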
ROUTING
Channel Manager
Identity resolution • Speaker profile & permissions • Memory context switch • Chat history sync
Thought Processor
MEMORY
Fact Extraction
Extracts preferences and profile info from the input
MEMORY
Memory Recall
Semantic search for relevant long-term memories
MEMORY
File Discoveries
Matches input to recently explored files
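The recall step above boils down to embedding the input and ranking stored memories by similarity. A minimal sketch, assuming memories are stored as `(text, embedding)` pairs and using cosine similarity with a relevance cutoff; the threshold and `top_k` values are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recall(query_vec, memories, top_k=3, min_score=0.3):
    """Return the top_k memory texts most similar to the query embedding."""
    scored = [(cosine(query_vec, emb), text) for text, emb in memories]
    scored = [pair for pair in scored if pair[0] >= min_score]  # drop weak matches
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]
```

The `min_score` cutoff matters as much as the ranking: without it, semantic search always returns *something*, and irrelevant memories leak into the prompt.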
LLM
Goal Reasoner
Think → Act → Evaluate loop (max 5 iterations)
Picks actions: web_search, file_share, reminders, home_assistant, plex, vision, explore, etc.
Routes to plugins via ACTION_PLUGIN_MAP
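The Think → Act → Evaluate loop can be sketched as follows. `think` stands in for the LLM call and is assumed to return `(action, args, done)`; the plugin dict mirrors the role of `ACTION_PLUGIN_MAP` from the diagram, though its real shape is not shown in the source.

```python
MAX_ITERATIONS = 5  # loop cap from the diagram

def goal_reasoner(task, think, plugin_map):
    """Think -> Act -> Evaluate loop, bounded at MAX_ITERATIONS."""
    result = None
    for _ in range(MAX_ITERATIONS):
        action, args, done = think(task, result)   # Think: pick next action
        if done:
            break
        plugin = plugin_map.get(action)            # route via the plugin map
        result = plugin(args) if plugin else f"unknown action: {action}"  # Act
        # Evaluate: the next think() call sees `result` and decides whether
        # the goal is met or another action is needed.
    return result
```

Feeding the previous `result` back into `think` is what makes this an evaluate loop rather than a one-shot tool call; the iteration cap keeps a confused model from looping forever.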
LLM
Emotional Reasoner
Analyzes message + action result → emotion + intensity + gesture
BUILD
Response Context Builder
Speaker identity • Emotional state • Action results • URLs • Language detection (NO/EN/mixed) • Pleasure state
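A sketch of how the context builder might assemble those pieces into prompt text, including a crude NO/EN/mixed detector. The stopword lists and field names are illustrative assumptions, not the real implementation:

```python
NO_WORDS = {"og", "ikke", "jeg", "det", "er", "på"}   # tiny Norwegian sample
EN_WORDS = {"and", "not", "the", "is", "you", "what"} # tiny English sample

def detect_language(text: str) -> str:
    """Crude stopword overlap; returns 'no', 'en', or 'mixed'."""
    words = set(text.lower().split())
    no_hits = len(words & NO_WORDS)
    en_hits = len(words & EN_WORDS)
    if no_hits and en_hits:
        return "mixed"
    return "no" if no_hits else "en"

def build_context(speaker, emotion, message, action_result=None, urls=()):
    """Assemble the context block handed to the response LLM."""
    parts = [
        f"Speaker: {speaker}",
        f"Emotional state: {emotion}",
        f"Reply language: {detect_language(message)}",
    ]
    if action_result:
        parts.append(f"Action result: {action_result}")
    if urls:
        parts.append("URLs: " + ", ".join(urls))
    return "\n".join(parts)
```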
LLM
Response LLM
System prompt + conversation memory + context → Aria's response
CLEAN
Post-Processing
Refusal handling • Gesture extraction • Arousal effects
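Gesture extraction could look like the sketch below, assuming gestures are embedded in the response as `*asterisk*` markers — a common roleplay convention, though the source does not specify the actual format:

```python
import re

GESTURE = re.compile(r"\*([^*]+)\*")

def extract_gestures(text: str):
    """Strip *gesture* markers from the response; return (clean_text, gestures)."""
    gestures = GESTURE.findall(text)
    clean = GESTURE.sub("", text)
    clean = re.sub(r"\s{2,}", " ", clean).strip()  # collapse leftover gaps
    return clean, gestures
```

The cleaned text goes on to TTS while the gesture list can be emitted separately (e.g. as WebSocket events for the avatar).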
MEMORY
Memory Storage
LLM extracts learnable facts → long-term memory with embeddings
GATE
Value Gate
Discord only (when not directly addressed) — an LLM checks whether the response adds value; if not, Aria reacts with an emoji instead of posting
Output
Voice (Web UI / Discord)
LLM
Language Tagger
Tags <en> / <no> segments for bilingual TTS
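Downstream, the tagged output has to be split back into per-language segments so each can be routed to the right TTS model. A minimal parser for the `<en>` / `<no>` tag format, as a sketch:

```python
import re

# Capture a language tag and everything up to the next tag (or end of string).
TAG = re.compile(r"<(en|no)>(.*?)(?=<(?:en|no)>|$)", re.DOTALL)

def parse_segments(tagged: str):
    """Split a tagged response into (lang, text) segments for per-segment TTS."""
    return [(lang, seg.strip()) for lang, seg in TAG.findall(tagged) if seg.strip()]
```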
OUTPUT
GPT-SoVITS TTS
Voice cloning • Per-segment language model • Streamed audio
Text (Discord / other)
OUTPUT
Text Response
Split to fit platform character limits • Sent to the channel
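Splitting for platform limits (Discord caps messages at 2000 characters) can be done greedily on word boundaries. A sketch; it does not handle single words longer than the limit:

```python
def split_for_platform(text: str, limit: int = 2000):
    """Greedily pack words into chunks no longer than `limit` characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word  # start a new chunk with the overflowing word
    if current:
        chunks.append(current)
    return chunks
```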
Frontend
Web UI (avatar + audio playback) • Discord channel • WebSocket events (emotion, gestures)