Aria - LLM Pipeline

Input Sources
INPUT
Web UI (Voice)
Microphone → VAD → Speaker ID → STT (Whisper)
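The voice chain above can be sketched as a small pipeline. Everything here is a hypothetical stand-in: the real pipeline uses Whisper for STT, while `vad`, `identify_speaker`, and `transcribe` below operate on toy per-frame dicts purely to show the ordering VAD → Speaker ID → STT.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str
    text: str

def vad(frames, threshold=0.5):
    """Energy-based voice activity detection: keep frames above the threshold."""
    return [f for f in frames if f["energy"] > threshold]

def identify_speaker(frames):
    """Hypothetical speaker ID: majority vote over per-frame labels."""
    labels = [f["speaker"] for f in frames]
    return max(set(labels), key=labels.count)

def transcribe(frames):
    """Stand-in for Whisper STT: join per-frame words into text."""
    return " ".join(f["word"] for f in frames)

def voice_input(frames):
    speech = vad(frames)                       # VAD
    speaker = identify_speaker(speech)         # Speaker ID
    return Utterance(speaker, transcribe(speech))  # STT
```

The key design point is that speaker identification runs on the VAD-trimmed frames, so silence never dilutes the speaker vote or the transcript.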
INPUT
Discord Text
Message + chat history sync
INPUT
Discord Voice
Audio stream → VAD → STT
GATE
Awareness LLM
Discord only — decides whether a message is directed at Aria, or whether she should chime in unprompted
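One way the gate could be layered, as a sketch: cheap deterministic checks (a name mention, a reply to the bot) short-circuit before paying for an LLM call, and the LLM only judges the ambiguous chime-in case. The `llm_judge` callable is hypothetical, not an API from the source.

```python
import re

NAME = re.compile(r"\baria\b", re.IGNORECASE)

def awareness_gate(message: str, is_reply_to_bot: bool, llm_judge=None) -> bool:
    """Return True if Aria should respond to this Discord message."""
    # Deterministic fast path: directly addressed.
    if is_reply_to_bot or NAME.search(message):
        return True
    # Ambiguous case: defer to a (hypothetical) LLM yes/no judgment.
    return llm_judge(message) if llm_judge else False
```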
ROUTING
Channel Manager
Identity resolution • Speaker profile & permissions • Memory context switch • Chat history sync
Thought Processor
MEMORY
Fact Extraction
Extracts preferences and profile info from the input
MEMORY
Memory Recall
Semantic search for relevant long-term memories
MEMORY
File Discoveries
Matches input to recently explored files
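The recall step above boils down to embedding the input and ranking stored memories by similarity. A minimal sketch, assuming memories are stored as `(text, embedding)` pairs and using cosine similarity with a relevance cutoff; the threshold and `top_k` values are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recall(query_vec, memories, top_k=3, min_score=0.3):
    """Return the top_k memory texts most similar to the query embedding."""
    scored = [(cosine(query_vec, emb), text) for text, emb in memories]
    scored = [pair for pair in scored if pair[0] >= min_score]  # drop weak matches
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]
```

The `min_score` cutoff matters as much as the ranking: without it, semantic search always returns *something*, and irrelevant memories leak into the prompt.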
LLM
Goal Reasoner
Think → Act → Evaluate loop (max 5 iterations)
Picks actions: web_search, file_share, reminders, home_assistant, plex, vision, explore, etc.
Routes to plugins via ACTION_PLUGIN_MAP
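The Think → Act → Evaluate loop can be sketched as follows. `think` stands in for the LLM call and is assumed to return `(action, args, done)`; the plugin dict mirrors the role of `ACTION_PLUGIN_MAP` from the diagram, though its real shape is not shown in the source.

```python
MAX_ITERATIONS = 5  # loop cap from the diagram

def goal_reasoner(task, think, plugin_map):
    """Think -> Act -> Evaluate loop, bounded at MAX_ITERATIONS."""
    result = None
    for _ in range(MAX_ITERATIONS):
        action, args, done = think(task, result)   # Think: pick next action
        if done:
            break
        plugin = plugin_map.get(action)            # route via the plugin map
        result = plugin(args) if plugin else f"unknown action: {action}"  # Act
        # Evaluate: the next think() call sees `result` and decides whether
        # the goal is met or another action is needed.
    return result
```

Feeding the previous `result` back into `think` is what makes this an evaluate loop rather than a one-shot tool call; the iteration cap keeps a confused model from looping forever.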
LLM
Emotional Reasoner
Analyzes message + action result → emotion + intensity + gesture
BUILD
Response Context Builder
Speaker identity • Emotional state • Action results • URLs • Language detection (NO/EN/mixed) • Pleasure state
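A sketch of how the context builder might assemble those pieces into prompt text, including a crude NO/EN/mixed detector. The stopword lists and field names are illustrative assumptions, not the real implementation:

```python
NO_WORDS = {"og", "ikke", "jeg", "det", "er", "på"}   # tiny Norwegian sample
EN_WORDS = {"and", "not", "the", "is", "you", "what"} # tiny English sample

def detect_language(text: str) -> str:
    """Crude stopword overlap; returns 'no', 'en', or 'mixed'."""
    words = set(text.lower().split())
    no_hits = len(words & NO_WORDS)
    en_hits = len(words & EN_WORDS)
    if no_hits and en_hits:
        return "mixed"
    return "no" if no_hits else "en"

def build_context(speaker, emotion, message, action_result=None, urls=()):
    """Assemble the context block handed to the response LLM."""
    parts = [
        f"Speaker: {speaker}",
        f"Emotional state: {emotion}",
        f"Reply language: {detect_language(message)}",
    ]
    if action_result:
        parts.append(f"Action result: {action_result}")
    if urls:
        parts.append("URLs: " + ", ".join(urls))
    return "\n".join(parts)
```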
LLM
Response LLM
System prompt + conversation memory + context → Aria's response
CLEAN
Post-Processing
Refusal handling • Gesture extraction • Arousal effects
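Gesture extraction could look like the sketch below, assuming gestures are embedded in the response as `*asterisk*` markers — a common roleplay convention, though the source does not specify the actual format:

```python
import re

GESTURE = re.compile(r"\*([^*]+)\*")

def extract_gestures(text: str):
    """Strip *gesture* markers from the response; return (clean_text, gestures)."""
    gestures = GESTURE.findall(text)
    clean = GESTURE.sub("", text)
    clean = re.sub(r"\s{2,}", " ", clean).strip()  # collapse leftover gaps
    return clean, gestures
```

The cleaned text goes on to TTS while the gesture list can be emitted separately (e.g. as WebSocket events for the avatar).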
MEMORY
Memory Storage
LLM extracts learnable facts → long-term memory with embeddings
GATE
Value Gate
Discord only (when not directly addressed) — an LLM checks whether the response adds value; if not, Aria reacts with an emoji instead of posting
Output
Voice (Web UI / Discord)
LLM
Language Tagger
Tags <en> / <no> segments for bilingual TTS
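Downstream, the tagged output has to be split back into per-language segments so each can be routed to the right TTS model. A minimal parser for the `<en>` / `<no>` tag format, as a sketch:

```python
import re

# Capture a language tag and everything up to the next tag (or end of string).
TAG = re.compile(r"<(en|no)>(.*?)(?=<(?:en|no)>|$)", re.DOTALL)

def parse_segments(tagged: str):
    """Split a tagged response into (lang, text) segments for per-segment TTS."""
    return [(lang, seg.strip()) for lang, seg in TAG.findall(tagged) if seg.strip()]
```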
OUTPUT
GPT-SoVITS TTS
Voice cloning • Per-segment language model • Streamed audio
Text (Discord / other)
OUTPUT
Text Response
Split to fit platform character limits • Sent to the channel
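Splitting for platform limits (Discord caps messages at 2000 characters) can be done greedily on word boundaries. A sketch; it does not handle single words longer than the limit:

```python
def split_for_platform(text: str, limit: int = 2000):
    """Greedily pack words into chunks no longer than `limit` characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word  # start a new chunk with the overflowing word
    if current:
        chunks.append(current)
    return chunks
```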
Frontend
Web UI (avatar + audio playback) • Discord channel • WebSocket events (emotion, gestures)