An interactive map of all the moving pieces in a production agentic AI system.
Agent-driven tool invocation — the agent intelligently chooses which tools to call and when.
Developer-wired tool invocation — tools are pre-configured at design time, not selected by the model.
Two-way data flow — the connection carries data in both directions.
Data flows through a pipeline — post-processed and indexed by a separate system, not the agent.
The most important insight hidden by linear diagrams: agent execution is iterative. After each tool result, the agent returns to the model for another reasoning step, so the loop (reason → act → observe → reason) may execute many times per request. This iterative loop is a defining characteristic of agentic, as opposed to merely automated, systems.
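The loop above can be sketched in a few lines. This is a minimal illustration, not any framework's API: `call_model` and `run_tool` are hypothetical stand-ins for a real LLM client and tool executor, stubbed here so the control flow is runnable.

```python
def call_model(messages):
    # Hypothetical stand-in for an LLM API call. A real model would
    # return either a tool request or a final answer; this stub
    # finishes immediately so the sketch runs end to end.
    return {"type": "final", "content": "done"}

def run_tool(name, args):
    # Hypothetical tool executor.
    return f"result of {name}"

def agent_loop(user_prompt, max_steps=10):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        step = call_model(messages)                    # reason
        if step["type"] == "final":
            return step["content"]
        observation = run_tool(step["tool"], step.get("args", {}))  # act
        messages.append({"role": "tool", "content": observation})   # observe
    return "step budget exhausted"
```

Note the `max_steps` budget: because the loop is model-driven, production agents cap iterations to bound cost and latency.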
The connection from the agent layer to tools is not uniform. Agents autonomously choose tools (non-deterministic). Pipelines and workflows use tools that are wired in at design time (deterministic). This is arguably the defining line between “agentic” and “automated” AI systems.
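The distinction can be made concrete. In this illustrative sketch (the tool registry and `choose_tool` callback are hypothetical), the agentic path lets the model pick a tool at runtime, while the pipeline path has its tool sequence fixed at design time.

```python
TOOLS = {
    "search": lambda q: f"results for {q}",
    "summarize": lambda t: f"summary of {t}",
}

def agentic(prompt, choose_tool):
    # Non-deterministic: the model (via choose_tool) selects the
    # tool and its argument at runtime.
    name, arg = choose_tool(prompt, list(TOOLS))
    return TOOLS[name](arg)

def pipeline(doc):
    # Deterministic: the developer wired this sequence in advance;
    # the model never decides which tool runs next.
    hits = TOOLS["search"](doc)
    return TOOLS["summarize"](hits)
```

The same tools serve both paths; what differs is who holds the control flow.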
The architecture captures four distinct memory patterns:
In production, inference requests often route through a model gateway (OpenRouter, LiteLLM) before reaching the model endpoint. The gateway handles model selection, load balancing, failover, rate limiting, and cost tracking. A self-hosted LiteLLM instance can unify access to both cloud APIs and local model servers.
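The failover behavior a gateway provides can be sketched in plain Python. This is a toy illustration of the routing pattern, not LiteLLM's actual API: backends are tried in priority order, and any failure falls through to the next.

```python
def route(prompt, backends):
    # Try each (name, callable) backend in priority order;
    # fail over to the next on any error.
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all backends failed: {errors}")

def flaky_cloud(prompt):
    # Simulates an unavailable primary endpoint.
    raise TimeoutError("upstream timeout")

backends = [
    ("openai/gpt-4o", flaky_cloud),                  # primary (down)
    ("ollama/llama3", lambda p: f"local: {p}"),      # local fallback
]
```

A real gateway layers model selection, rate limiting, and cost tracking onto this same try-next-backend core.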
Safety guardrails apply at three points, not just around the agent: input filtering on prompts (injection detection), constraints on agent behavior, and tool-level restrictions at the MCP layer (preventing unauthorized actions).
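A minimal sketch of those three checkpoints, with hypothetical injection patterns, tool allow-list, and call budget; production systems would use far richer detection than a regex list.

```python
import re

INJECTION_PATTERNS = [r"ignore (all )?previous instructions"]  # illustrative only
ALLOWED_TOOLS = {"search", "read_file"}   # tool-level (MCP) allow-list
MAX_TOOL_CALLS = 20                       # agent behavior constraint

def check_input(prompt):
    # Point 1: input filtering on prompts.
    for pat in INJECTION_PATTERNS:
        if re.search(pat, prompt, re.IGNORECASE):
            raise ValueError("possible prompt injection")
    return prompt

def check_tool_call(name, calls_so_far):
    # Points 2 and 3: behavior budget plus tool restrictions.
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} not permitted")
    if calls_so_far >= MAX_TOOL_CALLS:
        raise RuntimeError("tool-call budget exceeded")
```

Placing checks at all three points means a prompt that slips past input filtering can still be stopped before an unauthorized tool runs.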
Human-in-the-loop patterns differ: per-action approval for agents (human approves each tool call), stage-gate for pipelines (review output before next stage), and exception-based for workflows (human notified only on low confidence).
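The three patterns differ only in *when* a human is consulted, which a short sketch makes clear. `approve` is a hypothetical stand-in for a real review UI or approval queue; here it auto-approves so the flow is runnable.

```python
def approve(item):
    # Placeholder for a human decision (review UI, approval queue).
    return True

def per_action(tool_call):
    # Agents: a human approves every tool call before it executes.
    return approve(tool_call)

def stage_gate(stage_output):
    # Pipelines: a human reviews output before the next stage runs.
    return approve(stage_output)

def exception_based(result, confidence, threshold=0.8):
    # Workflows: a human is consulted only when confidence is low;
    # high-confidence results pass through unreviewed.
    return approve(result) if confidence < threshold else True
```

The trade-off is throughput versus oversight: per-action is safest but slowest, exception-based scales but depends on calibrated confidence scores.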
High-value data pathways that are simple to build but deliver outsized returns, and are typically invisible in architecture diagrams.
Good prompts are discovered through use, not designed upfront. Capturing and curating prompts that work well in production creates reusable institutional knowledge — “this is how we talk to models about X.”
User/System Prompts → Stored Prompts → Prompt Library
New agents and workflows can bootstrap from proven prompts instead of starting from scratch. It’s a form of organizational memory.
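The prompt-library pathway reduces to capture-and-retrieve. A minimal in-memory sketch (the schema and scoring field are hypothetical; production would back this with a database):

```python
class PromptLibrary:
    def __init__(self):
        self._prompts = []  # in production: a database table

    def capture(self, text, tags, score):
        # Store a prompt that performed well, with topic tags and
        # a quality score from production feedback.
        self._prompts.append({"text": text, "tags": set(tags), "score": score})

    def best(self, tag):
        # Retrieve the highest-scoring prompt for a topic, so new
        # agents bootstrap from proven prompts.
        matches = [p for p in self._prompts if tag in p["tags"]]
        if not matches:
            return None
        return max(matches, key=lambda p: p["score"])["text"]
```

The curation step, deciding which prompts are worth capturing, is where the institutional knowledge actually accrues.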
Conversations themselves are mined for context about the user — preferences, patterns, domain knowledge. A dedicated agentic workflow extracts, categorizes, and indexes this into the vector store.
User Prompt → Conversations (DB) → [Mining Workflow] → Vector Store → Context (RAG) → Future Prompts
This creates a second virtuous cycle distinct from generic context-mining:
The mining step is itself agentic — it uses LLMs to extract structured insights, not just raw embeddings.
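The mining flow above can be sketched end to end. `extract_insights` stands in for the agentic step; in production it would be an LLM prompt returning structured insights, but it is stubbed here with a rule so the pipeline is runnable. The embedding step is elided.

```python
def extract_insights(conversation):
    # Hypothetical stub for the LLM-driven extraction step:
    # pull out structured, categorized insights, not raw text.
    return [
        {"category": "preference", "text": turn}
        for turn in conversation
        if turn.lower().startswith("i prefer")
    ]

def mine(conversations, vector_store):
    # Conversations (DB) → mining workflow → vector store.
    # Embedding before indexing is elided in this sketch.
    for convo in conversations:
        for insight in extract_insights(convo):
            vector_store.append(insight)
    return vector_store
```

Indexed insights then surface via RAG as context for future prompts, closing the virtuous cycle.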
Every agent interaction potentially generates organizational knowledge. Without this route, that knowledge is trapped in individual conversation threads and lost. Agent outputs (research summaries, analyses, recommendations) flow to wikis and knowledge bases for organizational use.