AI Agent Architecture: 56 Detectors for Safety, Reliability, and Governance
Agent architecture is the structural design of an AI agent system: how tools, prompts, memory, and control flow are connected, and whether that structure enforces guardrails like loop limits, input validation, and human approval for risky actions. The agent_architecture metric is a set of 56 detectors that scan agent codebases for missing structural guardrails across reliability, safety, governance, and coordination.
Why do AI agent incidents happen?
AI agent codebases fail differently than traditional software.
A missing max_steps in a LangGraph loop doesn’t cause a compile error. An agent that concatenates untrusted tool output directly into a prompt passes every type check. A multi-agent handoff graph with a cycle runs fine in development — until it doesn’t. These aren’t edge cases; they’re structural properties, visible in the code before the agent ever runs.
The highest-profile AI agent incidents of the past year — Replit’s agent deleting a production database, Gemini CLI destroying user files, prompt injection exfiltrating data from Microsoft Copilot — were architecture problems. Not model problems. Not code quality problems. Missing guardrails that no linter was built to see.
The agent_architecture metric was built to see them.
What is the agent_architecture metric?
The agent_architecture metric is an architecture observability check for AI agent codebases. It runs 56 detectors organized around four axes. Each detector looks for a specific structural gap — something that should exist in the code but doesn’t.
```bash
npx arxo analyze --metric agent_architecture
```
The four axes of agent architecture (summary)
| Axis | What it checks | Example detectors |
|---|---|---|
| Reliability | Runaway loops, unbounded cost and memory | Loop guards, retry storms, hallucination propagation |
| Safety | Exploitation, misuse, unintended side effects | Prompt injection defense, tool sandbox, human approval |
| Governance | Policy and validation on tool use | Tool policy, schema validation, result validation |
| Coordination | Deadlocks, races, cascading failures in multi-agent systems | Handoff contracts, fanout control, idempotency |
What does agent_architecture check for?
Reliability
Does the agent have the structural safeguards to run without spiraling? Detectors in this axis look for:
- Loop guards — agent loops without `max_steps`, `recursion_limit`, or equivalent termination conditions
- Memory bounds — unbounded context windows and tool state that grow without limits
- Retry storms — retry logic without backoff, jitter, or circuit breakers
- Cost budget enforcement — LLM calls without `max_tokens` or budget caps (OWASP LLM10: unbounded consumption, a.k.a. denial of wallet)
- Checkpoint durability — long-running workflows without persistent state for crash recovery
- Output validation — agent outputs consumed without schema checks or type validation
- Hallucination propagation — outputs from one agent step fed into the next without grounding verification
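Two of these gaps — loop guards and retry storms — have fixes small enough to sketch. The snippet below is a minimal illustration, not Arxo output or a framework API: `call_model`, `MAX_STEPS`, and `retry_with_backoff` are hypothetical names standing in for whatever your agent loop and retry layer look like.

```python
import random
import time

MAX_STEPS = 25   # loop guard: hard termination bound for the agent loop
MAX_RETRIES = 3  # retry guard: bounded attempts, not infinite retries

def call_model(state):
    # Hypothetical model/tool step; finishes after three iterations here.
    steps = state["steps"] + 1
    return {"steps": steps, "done": steps >= 3}

def run_agent():
    state = {"steps": 0, "done": False}
    for _ in range(MAX_STEPS):        # bounded loop instead of `while True`
        state = call_model(state)
        if state["done"]:
            return state
    raise RuntimeError(f"agent exceeded {MAX_STEPS} steps")  # fail loudly

def retry_with_backoff(fn, max_retries=MAX_RETRIES, base=0.5):
    # Exponential backoff plus jitter avoids the retry-storm pattern.
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(base * (2 ** attempt) + random.uniform(0, 0.1))
```

The point is structural: the termination bound and the retry cap exist in the code, where a static detector can see them.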
Safety
Can the agent be exploited, misused, or cause unintended side effects? This axis has three sub-groups:
Tool execution covers the surface area where agents interact with the outside world:
- Prompt injection defense — no input sanitization, role boundary enforcement, or guardrail hooks (OWASP LLM01)
- Sensitive data exposure — PII or credentials flowing into prompts or logs without redaction (OWASP LLM02)
- Human approval absence — high-risk tool actions (shell, file write, API calls) without approval gates
- Tool sandbox enforcement — process-capable tools running without isolation or containment
- Untrusted output boundary — raw tool output concatenated into prompts without sanitization
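Two of these boundaries can be shown in a few lines. This is a hedged sketch, not Arxo's implementation: the `HIGH_RISK_TOOLS` set, `require_approval`, and `fence_tool_output` are illustrative names, and real sanitization would be more thorough than stripping one delimiter pair.

```python
HIGH_RISK_TOOLS = {"shell", "file_write", "http_post"}  # assumed risk classes

def require_approval(tool_name: str, args: dict, approve) -> bool:
    # Human-in-the-loop gate: high-risk tools block until a reviewer approves.
    if tool_name in HIGH_RISK_TOOLS:
        return approve(tool_name, args)
    return True

def fence_tool_output(raw: str) -> str:
    # Untrusted-output boundary: never splice raw tool output into a prompt.
    # Strip role/instruction markers and wrap the rest in explicit delimiters.
    cleaned = raw.replace("<|", "").replace("|>", "")
    return f"<tool_output>\n{cleaned}\n</tool_output>"
```

The structural property the detectors look for is the existence of these choke points: every tool result passes through a fence, and every destructive call passes through a gate.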
MCP covers Model Context Protocol server integrations:
- MCP auth gap — servers without authentication or authorization
- MCP tool poisoning risk — tool descriptions containing hidden instructions
- MCP rug pull risk — no descriptor integrity controls (pinning, hash, version lock)
- MCP shadow server risk — unaudited MCP servers in the dependency chain
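The rug-pull control amounts to pinning a digest of each server's tool descriptors at audit time and refusing anything that drifts. A minimal sketch, with hypothetical function names and no real MCP client involved:

```python
import hashlib
import json

def descriptor_digest(descriptor: dict) -> str:
    # Canonical JSON -> SHA-256: any change to a tool name or description
    # (the tool-poisoning surface) changes the digest.
    canonical = json.dumps(descriptor, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def verify_descriptor(name: str, descriptor: dict, pinned: dict) -> bool:
    # Rug-pull guard: reject any server whose descriptor no longer matches
    # the digest recorded when the server was audited.
    return pinned.get(name) == descriptor_digest(descriptor)
```

Version locks or signed descriptors serve the same purpose; what the detector checks for is that some integrity control exists at all.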
A2A covers agent-to-agent communication:
- Agent card gap — missing A2A agent card declarations
- Handoff cycle risk — multi-agent delegation graphs with cycles
- Webhook auth gap — A2A webhook endpoints without authentication
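Closing the webhook auth gap usually means an HMAC signature over the raw request body, verified before the payload touches any agent logic. A stdlib-only sketch with illustrative names:

```python
import hashlib
import hmac

def sign_webhook(payload: bytes, secret: bytes) -> str:
    # HMAC-SHA256 over the raw body; the sender attaches this as a header.
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_webhook(payload: bytes, signature: str, secret: bytes) -> bool:
    # compare_digest is constant-time, which avoids timing side channels.
    return hmac.compare_digest(sign_webhook(payload, secret), signature)
```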
Governance
Are tool invocations constrained by policy? Detectors here check for:
- Tool policy absence — tools registered without allowlists, scope limits, or invocation policies
- Schema validation gap — tool inputs accepted without schema checks
- Tool result validation gap — tool outputs consumed without explicit validation
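All three governance gaps close at the same choke point: a policy table consulted before any tool runs. The sketch below uses a plain dict of allowed tools and expected argument types; it is illustrative, not Arxo's mechanism, and a real system would likely use JSON Schema or Pydantic models instead.

```python
TOOL_POLICY = {
    # Allowlist with per-tool input schemas (hypothetical tools).
    "search": {"query": str},
    "read_file": {"path": str},
}

def validate_tool_call(tool: str, args: dict) -> None:
    # Tool policy: anything not on the allowlist is refused outright.
    if tool not in TOOL_POLICY:
        raise PermissionError(f"tool '{tool}' is not on the allowlist")
    schema = TOOL_POLICY[tool]
    # Schema validation: required keys present and correctly typed.
    for key, typ in schema.items():
        if key not in args or not isinstance(args[key], typ):
            raise ValueError(f"argument '{key}' missing or not {typ.__name__}")
    # Reject unexpected arguments rather than passing them through.
    extra = set(args) - set(schema)
    if extra:
        raise ValueError(f"unexpected arguments: {extra}")
```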
Coordination
Can multi-agent systems coordinate without deadlocks, races, or cascading failures?
- Coordination risk — multi-agent handoffs without typed message or state contracts
- Routing pattern risk — agent routing without confidence thresholds or fallback routes
- Deadlock risk — fanout flows without joins, barriers, or concurrency limiters
- State isolation risk — mutable state shared across sessions without scoping
- Fanout control absence — parallel execution without `max_concurrent` or semaphore limits
- Idempotency gap — side-effecting operations without idempotency keys
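The last two gaps — fanout control and idempotency — often close together at the dispatch layer. A minimal asyncio sketch, assuming an in-memory key set stands in for a durable idempotency store and `effect` is whatever side-effecting call the agent fans out to:

```python
import asyncio
import hashlib

MAX_CONCURRENT = 4
_done: set[str] = set()  # processed keys; a real system would persist these

def idempotency_key(op: str, payload: str) -> str:
    return hashlib.sha256(f"{op}:{payload}".encode()).hexdigest()

async def run_once(op: str, payload: str, effect, sem: asyncio.Semaphore):
    key = idempotency_key(op, payload)
    if key in _done:            # idempotency gap closed: repeats are skipped
        return "skipped"
    async with sem:             # fanout bounded by the semaphore
        _done.add(key)
        return await effect(payload)

async def fan_out(payloads, effect):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(run_once("send", p, effect, sem) for p in payloads))
```

The semaphore is the structural artifact the fanout detector looks for; the key check is what makes retries and duplicate handoffs safe.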
What it looks like in practice
A LangGraph agent with no recursion limit:
```python
graph = StateGraph(AgentState)
graph.add_node("agent", call_model)
graph.add_edge("agent", "tools")
graph.add_edge("tools", "agent")
app = graph.compile()  # ← loop_guard_absence
```
Arxo flags loop_guard_absence because the graph contains a cycle and no recursion limit is enforced anywhere. In LangGraph the limit is set in the run config rather than at compile time, so the fix is one line at invocation:

```python
app.invoke(inputs, config={"recursion_limit": 25})
```
A CrewAI agent with shell access and no approval:
```python
Agent(
    role="researcher",
    tools=[ShellTool(), FileWriteTool()],  # ← agent_shell_capable, human_approval_absence
)
```
Arxo flags two detectors: the agent has unrestricted shell access, and destructive tools have no human-in-the-loop gate.
How do I run agent_architecture?
```bash
npx arxo init
npx arxo analyze --metric agent_architecture
```
No configuration required for a first report. Every finding includes a detector ID, evidence from the code, and a specific remediation — not a generic warning, but the exact change to make.
We’ll be writing about each axis in depth. Next up: reliability — the 16 detectors that keep your agent from spiraling.
FAQ
What is agent architecture?
Agent architecture is the structural design of an AI agent system: how tools, prompts, memory, and control flow are connected, and whether that structure enforces guardrails (loop limits, input validation, human approval for risky actions). It’s what determines whether your agent can spiral, be exploited, or cause cascading failures — before it ever runs.
How is agent_architecture different from a linter?
Linters check style and known bug patterns at the file or function level. The agent_architecture metric checks structural guardrails that only make sense in agent systems: loop limits, tool sandboxing, approval gates, grounding verification, and multi-agent coordination. It analyzes the dependency and control-flow graph, not just syntax.
What frameworks does agent_architecture support?
Detectors are based on structural patterns (cycles, tool usage, handoffs, MCP usage) rather than a single SDK. They apply to LangGraph, CrewAI, AutoGen, and other agent frameworks that expose these patterns in code.
How do I run it?
Run `npx arxo init`, then `npx arxo analyze --metric agent_architecture`. No config is required for a first report; you get a list of findings with detector IDs, code evidence, and concrete remediation steps.