Three Agent Patterns I Use in Every Production System
ReAct, hierarchical orchestration, and reflection loops — the patterns that survived contact with production and why they work.
Most agent tutorials show you a chatbot that can call a function. That’s not a production system — it’s a demo. The moment you deploy an agent into a real workflow, a different set of problems appears: wasted tool calls, context limits, subtle output errors that compound silently.
These three patterns are what I actually deploy. Each one solves a specific failure mode. I’ll show you the architecture, the code, and — just as importantly — when each pattern breaks down.
Pattern 1 — ReAct: Reason Before You Act
The problem it solves: agents that execute blindly. Without an explicit reasoning step, an agent will call the first tool that seems relevant, get a result it doesn’t know how to interpret, call another tool, and spiral into a chain of wasted operations. I’ve watched agents burn through 30 API calls to solve a problem that needed three — because they never stopped to think about what they actually needed.
How it works: ReAct (Reason + Act) interleaves a reasoning step before every action. The agent explicitly thinks about what information it has, what it’s missing, and which tool will get it closer to the goal. Then it acts. Then it observes the result and reasons again.
In LangGraph, this maps cleanly to a three-node cycle:
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage

class ReActState(TypedDict):
    messages: list
    goal: str
    step_count: int

def reason(state: ReActState) -> dict:
    """Think about what we know and what to do next."""
    response = llm.invoke([
        SystemMessage(content=f"Goal: {state['goal']}"),
        *state["messages"],
        HumanMessage(content="Reason about what you know and what tool to call next. "
                             "If you have enough information, respond with DONE."),
    ])
    return {"messages": [response], "step_count": state["step_count"] + 1}
def act(state: ReActState) -> dict:
    """Execute the tool call from the reasoning step."""
    last = state["messages"][-1]
    result = execute_tool(last.tool_calls[0])
    return {"messages": [ToolMessage(content=result, tool_call_id=last.tool_calls[0]["id"])]}
def should_continue(state: ReActState) -> str:
    last_ai = [m for m in state["messages"] if isinstance(m, AIMessage)][-1]
    if not last_ai.tool_calls:
        return "done"
    if state["step_count"] > 15:
        return "done"  # circuit breaker
    return "act"
graph = StateGraph(ReActState)
graph.add_node("reason", reason)
graph.add_node("act", act)
graph.set_entry_point("reason")
graph.add_conditional_edges("reason", should_continue, {"act": "act", "done": END})
graph.add_edge("act", "reason")  # observe is implicit in the next reason step
app = graph.compile()
Notice the circuit breaker at 15 steps. Without it, a confused agent will loop indefinitely. Every ReAct implementation needs a hard limit.
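Stripped of the LangGraph scaffolding, the same reason/act/observe cycle with its circuit breaker fits in a few lines of plain Python. This is a sketch, not a real API: `reason_fn` and `act_fn` are placeholders for your LLM reasoning step and tool executor, and the toy reasoner at the bottom exists only to show the loop terminating.

```python
def react_loop(goal, reason_fn, act_fn, max_steps=15):
    """Minimal ReAct cycle: reason, act, observe, repeat."""
    observations = []
    for step in range(max_steps):  # circuit breaker: hard step limit
        thought = reason_fn(goal, observations)
        if thought.get("done"):
            return {"answer": thought["answer"], "steps": step}
        observation = act_fn(thought["tool"], thought["args"])
        observations.append(observation)  # observe feeds the next reason step
    return {"answer": None, "steps": max_steps}  # budget exhausted

# Toy reasoner: look up two numbers, then stop and add them.
def demo_reason(goal, obs):
    if len(obs) < 2:
        return {"tool": "lookup", "args": len(obs)}
    return {"done": True, "answer": sum(obs)}

def demo_act(tool, args):
    return [40, 2][args]

result = react_loop("add the numbers", demo_reason, demo_act)
# result == {"answer": 42, "steps": 2}
```

The important property is that the loop can only exit two ways: the reasoner declares itself done, or the budget runs out. There is no path where a confused reasoner loops forever.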
When it breaks down: ReAct is greedy — it optimises the next step, not the whole plan. For tasks requiring 10+ coordinated steps with dependencies between them, ReAct will wander. It’ll solve step 3 in a way that makes step 7 impossible, because it never looked ahead. That’s when you need the next pattern.
Where I use it: incident investigation agents. The agent reasons about what it knows from the alert, queries logs, checks recent deployments, examines metrics — each step informed by what the previous step revealed. The non-linear, exploratory nature of debugging maps perfectly to ReAct.
Pattern 2 — Hierarchical Orchestration
The problem it solves: single-agent systems that try to do everything. As task complexity grows, a single agent hits context window limits, loses coherence, and starts forgetting earlier steps. The agent equivalent of a developer trying to hold an entire monolith in their head.
How it works: a planner agent decomposes the goal into discrete sub-tasks. Each sub-task has a clear contract: typed input, expected output, and success criteria. Specialist worker agents handle each sub-task independently. The planner reviews results and decides whether to proceed, retry, or escalate.
The critical design decision is the task interface. This is the contract between the planner and its workers:
from typing import Literal

from pydantic import BaseModel

class AgentTask(BaseModel):
    task_id: str
    description: str
    input_data: dict
    expected_output: str          # what success looks like
    success_criteria: list[str]   # verifiable checks
    assigned_to: str              # which specialist

class TaskResult(BaseModel):
    task_id: str
    status: Literal["success", "failed", "needs_review"]
    output: dict
    reasoning: str                # why the agent thinks this is correct
    criteria_met: dict[str, bool] # each criterion checked
def orchestrate(goal: str) -> dict:
    # Step 1: Planner decomposes the goal
    tasks = planner_agent.decompose(goal)
    results = []
    for task in tasks:
        # Step 2: Route to specialist
        worker = get_worker(task.assigned_to)
        result = worker.execute(task)
        # Step 3: Planner reviews
        if result.status == "failed":
            # Retry with adjusted instructions, or reassign
            result = handle_failure(task, result)
        results.append(result)
    return {"goal": goal, "results": results}
The success_criteria field is what makes this work. Without it, the planner has no way to verify whether a worker actually succeeded — it just has to trust the output. With explicit criteria, the planner can programmatically check results before moving to the next task.
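That programmatic check can be very small. Here is a sketch using a plain dict in place of the Pydantic `TaskResult` so it stands alone; the return values (`"retry"`, `"needs_review"`, `"proceed"`) are illustrative names, not part of any library:

```python
def verify_result(result: dict) -> str:
    """Gate a worker's result on its success criteria.

    Returns the action the planner should take next.
    """
    if result["status"] == "failed":
        return "retry"
    # Criteria the worker claims are unmet, or that a checker marked False.
    unmet = [name for name, ok in result["criteria_met"].items() if not ok]
    if unmet:
        # The output may look plausible, but the contract says otherwise.
        return "needs_review: " + ", ".join(unmet)
    return "proceed"

result = {
    "status": "success",
    "criteria_met": {"tests_pass": True, "no_secrets_in_code": False},
}
# verify_result(result) == "needs_review: no_secrets_in_code"
```

The point is that the worker's self-reported "success" is never the final word; the planner trusts the criteria, not the vibes.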
When it breaks down: tightly coupled tasks that can’t be cleanly decomposed. If worker A’s output fundamentally depends on a decision worker B hasn’t made yet, you’ll spend more time coordinating than executing. This is the exact same trade-off as microservices — don’t decompose until complexity demands it. For simple sequential workflows, a single ReAct agent is usually better.
Where I use it: feature development agents. A planner breaks “implement user authentication” into research (what auth patterns exist in the codebase), implementation (write the code), and review (validate against security criteria). Each worker has a focused context window and a clear deliverable.
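Concretely, the planner's decomposition of that goal might emit tasks shaped like the following. The field names mirror the `AgentTask` contract above; every value here is illustrative, not taken from a real run:

```python
tasks = [
    {
        "task_id": "auth-1",
        "description": "Survey existing auth patterns in the codebase",
        "input_data": {"repo_paths": ["src/"]},
        "expected_output": "Summary of current session/token handling",
        "success_criteria": ["names the files that implement login"],
        "assigned_to": "research",
    },
    {
        "task_id": "auth-2",
        "description": "Implement the login endpoint using the chosen pattern",
        "input_data": {"depends_on": "auth-1"},
        "expected_output": "Working endpoint with tests",
        "success_criteria": ["tests pass", "passwords are hashed"],
        "assigned_to": "implementation",
    },
    {
        "task_id": "auth-3",
        "description": "Review the implementation against the security checklist",
        "input_data": {"depends_on": "auth-2"},
        "expected_output": "Signed-off review or a list of findings",
        "success_criteria": ["no plaintext credentials", "rate limiting present"],
        "assigned_to": "review",
    },
]
```

Each task is small enough for one focused context window, and the `depends_on` links make the sequencing explicit instead of implicit.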
Pattern 3 — The Reflection Loop
The problem it solves: first-pass LLM outputs that are “close enough” but contain subtle errors. A code agent writes a function that handles 9 out of 10 edge cases. A data agent generates a query that’s syntactically correct but logically wrong. These near-misses are dangerous because they look right at a glance and only fail in production.
How it works: after generating output, a separate evaluation pass critiques it against explicit criteria. The agent revises based on the critique. One reflection pass catches most issues. Two passes hit diminishing returns. Three is almost never worth the latency.
The key to effective reflection is a structured critique prompt. Vague instructions like “review this for quality” produce vague feedback. Instead, give the critic specific criteria to check:
def reflect(state: ReflectState) -> dict:
    critique = llm.invoke([
        SystemMessage(content="""Review the output against these criteria:
1. COMPLETENESS — does it address every requirement?
2. CORRECTNESS — are there logical errors or edge cases missed?
3. FORMAT — does it match the expected output schema?
For each criterion, respond with PASS or FAIL and a one-line explanation.
If all pass, respond with APPROVED.
If any fail, explain what needs to change."""),
        HumanMessage(content=f"Requirements: {state['requirements']}\n\n"
                             f"Output to review:\n{state['output']}"),
    ])
    if "APPROVED" in critique.content:
        return {"status": "approved", "output": state["output"]}
    # Revise with the critique as context
    revised = llm.invoke([
        SystemMessage(content="Revise the output based on this feedback. "
                              "Address every FAIL criterion."),
        HumanMessage(content=f"Original: {state['output']}\n\n"
                             f"Critique: {critique.content}"),
    ])
    return {"status": "revised", "output": revised.content,
            "revision_count": state["revision_count"] + 1}
When reflection helps vs. hurts: high-stakes outputs benefit enormously. Code generation, data transformations, legal document drafting — anything where a subtle error has real consequences. For low-stakes outputs like draft summaries or brainstorming lists, reflection adds latency without meaningful quality improvement. Measure the error rate with and without reflection for your specific use case. If it doesn’t measurably improve output quality, skip it.
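Measuring whether reflection earns its latency can be as simple as an A/B error-rate count over a labeled sample. A minimal sketch, where `evaluate` is a stand-in for whatever correctness check your domain supports (test suite, schema validation, human label):

```python
def error_rate(outputs, evaluate):
    """Fraction of outputs that fail the correctness check."""
    failures = sum(1 for output in outputs if not evaluate(output))
    return failures / len(outputs)

# Run the same inputs through both pipelines, then compare:
#   baseline  = error_rate(run_without_reflection(samples), evaluate)
#   reflected = error_rate(run_with_reflection(samples), evaluate)
# Keep reflection only if the gap justifies the added latency.

error_rate([1, 0, 1, 1], bool)  # → 0.25
```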
The cap matters. I hard-limit reflection to two passes. After two revisions, if the output still isn’t passing criteria, the problem is usually in the prompt or the task definition — not something another revision will fix. Escalate to a human instead of looping.
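The capped loop with escalation looks like this in outline. The helper names (`generate`, `critique`, `revise`, `escalate`) are placeholders for your own LLM calls and hand-off mechanism, not a real API:

```python
MAX_REVISIONS = 2  # past this, fix the prompt or task definition, not the output

def reflect_loop(requirements, generate, critique, revise, escalate):
    """Generate once, then critique and revise up to MAX_REVISIONS times."""
    output = generate(requirements)
    for attempt in range(MAX_REVISIONS):
        feedback = critique(requirements, output)
        if feedback["approved"]:
            return {"output": output, "revisions": attempt}
        output = revise(output, feedback)
    # One final check on the last revision before giving up.
    if critique(requirements, output)["approved"]:
        return {"output": output, "revisions": MAX_REVISIONS}
    return escalate(requirements, output)  # hand off to a human
```

The loop structure guarantees every output that reaches a caller was either approved by the critic or explicitly escalated; nothing slips through after a silent third retry.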
Where I use it: every code-generation pipeline. The agent writes the code, then a reflection pass checks it compiles, handles the specified edge cases, and follows the project’s conventions. The revision rate on first pass is typically around 30% — meaning reflection catches real issues nearly a third of the time.
These three patterns handle 80% of the production agent work I encounter. ReAct for exploratory tasks where the path isn’t known upfront. Hierarchical orchestration for complex tasks that exceed a single agent’s context. Reflection for high-stakes outputs where “close enough” isn’t good enough.
The remaining 20% is domain-specific glue — custom tools, data connectors, business logic. But the architectural bones are always one of these three, or a combination.
Start with ReAct. It’s the simplest and solves the most common problem (agents acting without thinking). Add hierarchical orchestration when a single agent can’t hold the full context of what it’s doing. Layer in reflection for outputs where errors have consequences. Resist the urge to over-architect. The simplest pattern that works is the right one.