AI Agent Architecture: 56 Detectors for Safety, Reliability, and Governance
Agent architecture is the structural design of an AI agent system: how tools, prompts, memory, and control flow are connected, and whether that structure enforces guardrails like loop limits, input validation, and human approval for risky actions. The agent_architecture metric is a set of 56 detectors that scan agent codebases for missing structural guardrails across reliability, safety, governance, and coordination.
Why do AI agent incidents happen?
AI agent codebases fail differently than traditional software.
A missing max_steps in a LangGraph loop doesn’t cause a compile error. An agent that concatenates untrusted tool output directly into a prompt passes every type check. A multi-agent handoff graph with a cycle runs fine in development — until it doesn’t. These aren’t edge cases; they’re structural properties, visible in the code before the agent ever runs.
The highest-profile AI agent incidents of the past year — Replit’s agent deleting a production database, Gemini CLI destroying user files, prompt injection exfiltrating data from Microsoft Copilot — were architecture problems. Not model problems. Not code quality problems. Missing guardrails that no linter was built to see.
The agent_architecture metric was built to see them.
What is the agent_architecture metric?
The agent_architecture metric is an architecture observability check for AI agent codebases. It runs 56 detectors organized around four axes. Each detector looks for a specific structural gap — something that should exist in the code but doesn’t.
```bash
npx arxo analyze --metric agent_architecture
```
The four axes of agent architecture (summary)
| Axis | What it checks | Example detectors |
|---|---|---|
| Reliability | Runaway loops, unbounded cost and memory | Loop guards, retry storms, hallucination propagation |
| Safety | Exploitation, misuse, unintended side effects | Prompt injection defense, tool sandbox, human approval |
| Governance | Policy and validation on tool use | Tool policy, schema validation, result validation |
| Coordination | Deadlocks, races, cascading failures in multi-agent systems | Handoff contracts, fanout control, idempotency |
What does agent_architecture check for?
Reliability
Does the agent have the structural safeguards to run without spiraling? Detectors in this axis look for:
- Loop guards — agent loops without `max_steps`, `recursion_limit`, or equivalent termination conditions
- Memory bounds — unbounded context windows and tool state that grow without limits
- Retry storms — retry logic without backoff, jitter, or circuit breakers
- Cost budget enforcement — LLM calls without `max_tokens` or budget caps (OWASP LLM10: unbounded consumption, a.k.a. denial of wallet)
- Checkpoint durability — long-running workflows without persistent state for crash recovery
- Output validation — agent outputs consumed without schema checks or type validation
- Hallucination propagation — outputs from one agent step fed into the next without grounding verification
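Two of these gaps — loop guards and retry storms — have fixes small enough to sketch. The snippet below is a minimal illustration, not Arxo output or a framework API: `call_model`, `MAX_STEPS`, and `retry_with_backoff` are hypothetical names standing in for whatever your agent loop and retry layer look like.

```python
import random
import time

MAX_STEPS = 25   # loop guard: hard termination bound for the agent loop
MAX_RETRIES = 3  # retry guard: bounded attempts, not infinite retries

def call_model(state):
    # Hypothetical model/tool step; finishes after three iterations here.
    steps = state["steps"] + 1
    return {"steps": steps, "done": steps >= 3}

def run_agent():
    state = {"steps": 0, "done": False}
    for _ in range(MAX_STEPS):        # bounded loop instead of `while True`
        state = call_model(state)
        if state["done"]:
            return state
    raise RuntimeError(f"agent exceeded {MAX_STEPS} steps")  # fail loudly

def retry_with_backoff(fn, max_retries=MAX_RETRIES, base=0.5):
    # Exponential backoff plus jitter avoids the retry-storm pattern.
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(base * (2 ** attempt) + random.uniform(0, 0.1))
```

The point is structural: the termination bound and the retry cap exist in the code, where a static detector can see them.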
Safety
Can the agent be exploited, misused, or cause unintended side effects? This axis has three sub-groups:
Tool execution covers the surface area where agents interact with the outside world:
- Prompt injection defense — no input sanitization, role boundary enforcement, or guardrail hooks (OWASP LLM01)
- Sensitive data exposure — PII or credentials flowing into prompts or logs without redaction (OWASP LLM02)
- Human approval absence — high-risk tool actions (shell, file write, API calls) without approval gates
- Tool sandbox enforcement — process-capable tools running without isolation or containment
- Untrusted output boundary — raw tool output concatenated into prompts without sanitization
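Two of these boundaries can be shown in a few lines. This is a hedged sketch, not Arxo's implementation: the `HIGH_RISK_TOOLS` set, `require_approval`, and `fence_tool_output` are illustrative names, and real sanitization would be more thorough than stripping one delimiter pair.

```python
HIGH_RISK_TOOLS = {"shell", "file_write", "http_post"}  # assumed risk classes

def require_approval(tool_name: str, args: dict, approve) -> bool:
    # Human-in-the-loop gate: high-risk tools block until a reviewer approves.
    if tool_name in HIGH_RISK_TOOLS:
        return approve(tool_name, args)
    return True

def fence_tool_output(raw: str) -> str:
    # Untrusted-output boundary: never splice raw tool output into a prompt.
    # Strip role/instruction markers and wrap the rest in explicit delimiters.
    cleaned = raw.replace("<|", "").replace("|>", "")
    return f"<tool_output>\n{cleaned}\n</tool_output>"
```

The structural property the detectors look for is the existence of these choke points: every tool result passes through a fence, and every destructive call passes through a gate.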
MCP covers Model Context Protocol server integrations:
- MCP auth gap — servers without authentication or authorization
- MCP tool poisoning risk — tool descriptions containing hidden instructions
- MCP rug pull risk — no descriptor integrity controls (pinning, hash, version lock)
- MCP shadow server risk — unaudited MCP servers in the dependency chain
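The rug-pull control amounts to pinning a digest of each server's tool descriptors at audit time and refusing anything that drifts. A minimal sketch, with hypothetical function names and no real MCP client involved:

```python
import hashlib
import json

def descriptor_digest(descriptor: dict) -> str:
    # Canonical JSON -> SHA-256: any change to a tool name or description
    # (the tool-poisoning surface) changes the digest.
    canonical = json.dumps(descriptor, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def verify_descriptor(name: str, descriptor: dict, pinned: dict) -> bool:
    # Rug-pull guard: reject any server whose descriptor no longer matches
    # the digest recorded when the server was audited.
    return pinned.get(name) == descriptor_digest(descriptor)
```

Version locks or signed descriptors serve the same purpose; what the detector checks for is that some integrity control exists at all.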
A2A covers agent-to-agent communication:
- Agent card gap — missing A2A agent card declarations
- Handoff cycle risk — multi-agent delegation graphs with cycles
- Webhook auth gap — A2A webhook endpoints without authentication
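Closing the webhook auth gap usually means an HMAC signature over the raw request body, verified before the payload touches any agent logic. A stdlib-only sketch with illustrative names:

```python
import hashlib
import hmac

def sign_webhook(payload: bytes, secret: bytes) -> str:
    # HMAC-SHA256 over the raw body; the sender attaches this as a header.
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_webhook(payload: bytes, signature: str, secret: bytes) -> bool:
    # compare_digest is constant-time, which avoids timing side channels.
    return hmac.compare_digest(sign_webhook(payload, secret), signature)
```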
Governance
Are tool invocations constrained by policy? Detectors here check for:
- Tool policy absence — tools registered without allowlists, scope limits, or invocation policies
- Schema validation gap — tool inputs accepted without schema checks
- Tool result validation gap — tool outputs consumed without explicit validation
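All three governance gaps close at the same choke point: a policy table consulted before any tool runs. The sketch below uses a plain dict of allowed tools and expected argument types; it is illustrative, not Arxo's mechanism, and a real system would likely use JSON Schema or Pydantic models instead.

```python
TOOL_POLICY = {
    # Allowlist with per-tool input schemas (hypothetical tools).
    "search": {"query": str},
    "read_file": {"path": str},
}

def validate_tool_call(tool: str, args: dict) -> None:
    # Tool policy: anything not on the allowlist is refused outright.
    if tool not in TOOL_POLICY:
        raise PermissionError(f"tool '{tool}' is not on the allowlist")
    schema = TOOL_POLICY[tool]
    # Schema validation: required keys present and correctly typed.
    for key, typ in schema.items():
        if key not in args or not isinstance(args[key], typ):
            raise ValueError(f"argument '{key}' missing or not {typ.__name__}")
    # Reject unexpected arguments rather than passing them through.
    extra = set(args) - set(schema)
    if extra:
        raise ValueError(f"unexpected arguments: {extra}")
```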
Coordination
Can multi-agent systems coordinate without deadlocks, races, or cascading failures?
- Coordination risk — multi-agent handoffs without typed message or state contracts
- Routing pattern risk — agent routing without confidence thresholds or fallback routes
- Deadlock risk — fanout flows without joins, barriers, or concurrency limiters
- State isolation risk — mutable state shared across sessions without scoping
- Fanout control absence — parallel execution without `max_concurrent` or semaphore limits
- Idempotency gap — side-effecting operations without idempotency keys
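The last two gaps — fanout control and idempotency — often close together at the dispatch layer. A minimal asyncio sketch, assuming an in-memory key set stands in for a durable idempotency store and `effect` is whatever side-effecting call the agent fans out to:

```python
import asyncio
import hashlib

MAX_CONCURRENT = 4
_done: set[str] = set()  # processed keys; a real system would persist these

def idempotency_key(op: str, payload: str) -> str:
    return hashlib.sha256(f"{op}:{payload}".encode()).hexdigest()

async def run_once(op: str, payload: str, effect, sem: asyncio.Semaphore):
    key = idempotency_key(op, payload)
    if key in _done:            # idempotency gap closed: repeats are skipped
        return "skipped"
    async with sem:             # fanout bounded by the semaphore
        _done.add(key)
        return await effect(payload)

async def fan_out(payloads, effect):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(run_once("send", p, effect, sem) for p in payloads))
```

The semaphore is the structural artifact the fanout detector looks for; the key check is what makes retries and duplicate handoffs safe.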
What it looks like in practice
A LangGraph agent with no recursion limit:
```python
graph = StateGraph(AgentState)
graph.add_node("agent", call_model)
graph.add_edge("agent", "tools")
graph.add_edge("tools", "agent")
app = graph.compile()  # ← loop_guard_absence
```
Arxo flags loop_guard_absence because the graph contains a cycle and no recursion limit is enforced anywhere. In LangGraph the limit is set in the run config rather than at compile time, so the fix is one line at invocation:

```python
app.invoke(inputs, config={"recursion_limit": 25})
```
A CrewAI agent with shell access and no approval:
```python
Agent(
    role="researcher",
    tools=[ShellTool(), FileWriteTool()],  # ← agent_shell_capable, human_approval_absence
)
```
Arxo flags two detectors: the agent has unrestricted shell access, and destructive tools have no human-in-the-loop gate.
How do I run agent_architecture?
```bash
npx arxo init
npx arxo analyze --metric agent_architecture
```
No configuration required for a first report. Every finding includes a detector ID, evidence from the code, and a specific remediation — not a generic warning, but the exact change to make.
We’ll be writing about each axis in depth. Next up: reliability — the 16 detectors that keep your agent from spiraling.
FAQ
What is agent architecture?
Agent architecture is the structural design of an AI agent system: how tools, prompts, memory, and control flow are connected, and whether that structure enforces guardrails (loop limits, input validation, human approval for risky actions). It’s what determines whether your agent can spiral, be exploited, or cause cascading failures — before it ever runs.
How is agent_architecture different from a linter?
Linters check style and known bug patterns at the file or function level. The agent_architecture metric checks structural guardrails that only make sense in agent systems: loop limits, tool sandboxing, approval gates, grounding verification, and multi-agent coordination. It analyzes the dependency and control-flow graph, not just syntax.
What frameworks does agent_architecture support?
Detectors are based on structural patterns (cycles, tool usage, handoffs, MCP usage) rather than a single SDK. They apply to LangGraph, CrewAI, AutoGen, and other agent frameworks that expose these patterns in code.
How do I run it?
Run `npx arxo init`, then `npx arxo analyze --metric agent_architecture`. No config is required for a first report; you get a list of findings with detector IDs, code evidence, and concrete remediation steps.