GraphLab — A visual lab for LLM graphs by Anthony DiBenedetto

Lesson overview

Agent Loop: Plan, Act, Critique

Extend RAG into a real agent. Add a planner, a critic, and a router that lets the agent revise its own work until it passes review.

The problem we are solving

You have a working RAG pipeline. It answers questions well. Now your PM asks for something different: "I want a system that drafts our weekly product update from the changelog, our customer feedback, and our roadmap. Send it to me when it is good."

That word — good — is the entire problem. Drafts are easy. Drafts that are good require evaluation, and evaluation requires a second pass. A single LLM call cannot reliably do both creative work and quality control on its own output.

The pattern that solves this is the planner-critic loop: one model plans and writes, a second model judges, a router decides whether to ship or revise. It generalizes to almost everything agents do: code (write, run tests, fix), research (search, summarize, verify), customer support (draft response, check policy, send).
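
The loop described above can be sketched in plain Python before any framework is involved. This is a minimal illustration, not the lesson's implementation: `LoopState`, `run_loop`, `toy_writer`, and `toy_critic` are hypothetical names, and the two toy functions stand in for real model calls so the example runs without a provider key.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    task: str
    draft: str = ""
    critiques: List[str] = field(default_factory=list)
    revisions: int = 0
    approved: bool = False


def run_loop(
    task: str,
    write: Callable[[str, List[str]], str],
    judge: Callable[[str, str], str],
    max_revisions: int = 3,
) -> LoopState:
    """Write, judge, then ship or revise, with at most max_revisions revisions."""
    state = LoopState(task=task)
    while True:
        state.draft = write(state.task, state.critiques)  # planner/writer role
        feedback = judge(state.task, state.draft)         # critic role
        if feedback == "":                                # empty feedback means approve
            state.approved = True
            return state
        state.critiques.append(feedback)
        if state.revisions >= max_revisions:              # router: stop revising
            return state
        state.revisions += 1


# Stand-in roles (assumptions) so the sketch runs without an LLM.
def toy_writer(task: str, critiques: List[str]) -> str:
    return f"{task} (v{len(critiques) + 1})"


def toy_critic(task: str, draft: str) -> str:
    return "" if draft.endswith("(v3)") else "needs another pass"


result = run_loop("Weekly product update", toy_writer, toy_critic)
print(result.approved, result.revisions, result.draft)
```

With `max_revisions=0` the same function performs a single write-and-judge pass and returns, which is the no-loop case.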

Why this graph shape

Why this exact topology? Three reasons.

Separation of roles. Planner and Critic are different prompts, often different models. Mixing them in one node ("write a draft and then judge it") invites a well-known failure mode: LLMs tend to go easy on their own output. Splitting the roles usually improves quality right away.

Explicit cycle. The Router is a separate node (a conditional edge in LangGraph code) because the decision to loop is the most important decision in the system. You want it visible on the canvas, in the logs, and in your evals. Inline loops disguised as "the model will know when to stop" do not survive contact with real users.
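
One way to keep the loop decision visible is to write the router as a small pure function over state. The sketch below is illustrative only: `AgentState`, `route_after_critic`, and the step names `"planner"` and `"end"` are hypothetical. In LangGraph, a function of this shape is the kind of callable passed to `add_conditional_edges` as the routing function.

```python
from typing import Literal, TypedDict


class AgentState(TypedDict):
    draft: str
    verdict: Literal["approve", "revise"]


def route_after_critic(state: AgentState) -> Literal["planner", "end"]:
    """Return the name of the next step based only on the critic's verdict."""
    next_step = "end" if state["verdict"] == "approve" else "planner"
    # Printing (or logging) the decision keeps the loop visible in traces and evals.
    print(f"router: verdict={state['verdict']} -> {next_step}")
    return next_step
```

Because the function reads only from state and has no side effects beyond the log line, it can be unit-tested without running a model.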

Convergence to RAG when k=0. With k as the number of revision loops, the agent graph contains the RAG graph as a special case: run once, never loop. This is not an accident; it is the right way to evolve a system. Start with the smallest pipeline that produces output, then add loops only where the data shows you need them.

Prerequisites

Completed RAG 101 or equivalent familiarity with the basic LangGraph state pattern
Comfort writing typed Python (we use Pydantic for structured outputs)
Conceptual familiarity with tool-calling LLMs (OpenAI function-calling, Anthropic tool use)
An LLM provider key for following along (Anthropic, OpenAI, Bedrock, or a local Ollama model)

What you will learn

Explain the difference between a DAG pipeline and a cyclic agent graph
Build a planner-critic loop in LangGraph with explicit conditional edges
Use structured output (Pydantic) to make critic verdicts machine-readable
Add iteration caps, recursion limits, and oscillation detection
Stream graph events to a UI and trace runs in LangSmith
Recognize when to use the ReAct, planner-critic, or supervisor pattern

Common pitfalls

Critic loops can run forever

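The critic keeps finding new problems, the writer keeps making new mistakes, and you pay per iteration. Always set both a recursion_limit for the graph run AND an iteration counter in state. In LangGraph, recursion_limit counts graph steps rather than plan-critique cycles, so it has to be larger than the number of steps your capped loop can take. Force-route to END after 3 to 5 iterations regardless of verdict.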
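
A capped router might look like the sketch below. The names `LoopState`, `route_with_cap`, `MAX_ITERATIONS`, and `STEPS_PER_ITERATION` are hypothetical, and the recursion-limit arithmetic assumes a loop with two nodes per iteration (writer and critic); adjust it to the real graph.

```python
from typing import Literal, TypedDict

MAX_ITERATIONS = 3        # assumed cap; the text suggests 3 to 5
STEPS_PER_ITERATION = 2   # assumption: one writer step plus one critic step


class LoopState(TypedDict):
    verdict: Literal["approve", "revise"]
    iteration: int  # incremented by the writer node on every pass


def route_with_cap(state: LoopState) -> Literal["revise", "end"]:
    """Stop on approval, or when the iteration counter reaches the cap."""
    if state["verdict"] == "approve":
        return "end"
    if state["iteration"] >= MAX_ITERATIONS:
        return "end"  # forced exit regardless of verdict
    return "revise"


# Backstop in case the counter logic has a bug: a step budget for the whole run,
# with a little slack. A dict like this is what would be passed as the run config.
config = {"recursion_limit": MAX_ITERATIONS * STEPS_PER_ITERATION + 2}
```

The counter gives a clean, expected exit; the recursion limit is the safety net that raises an error if the counter is never checked.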

Self-critique without role separation

Asking one LLM call to both write and judge is the most common anti-pattern. The model defends its own output and approves things it should not. Split into two model invocations with different system prompts. Even using the same model with different prompts is a massive improvement.
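
As a rough sketch of the split, the two roles can be two functions that call a model with different system prompts. Everything here is an assumption made for illustration: `ModelFn` stands in for whichever provider client is used, the prompt wording is invented, and `fake_model` only records calls so the example runs offline.

```python
from typing import Callable, List, Tuple

# (system_prompt, user_message) -> reply; stands in for any provider client.
ModelFn = Callable[[str, str], str]

WRITER_SYSTEM = "You draft the weekly product update from the supplied sources."
CRITIC_SYSTEM = (
    "You review a draft written by someone else. "
    "List concrete problems. Do not rewrite the draft."
)


def write_draft(model: ModelFn, sources: str) -> str:
    return model(WRITER_SYSTEM, sources)


def critique_draft(model: ModelFn, draft: str) -> str:
    # In this sketch the critic receives only the draft, not the writer's prompt.
    return model(CRITIC_SYSTEM, f"Draft to review:\n{draft}")


# Fake model that records each call so the sketch runs without a provider key.
calls: List[Tuple[str, str]] = []


def fake_model(system: str, user: str) -> str:
    calls.append((system, user))
    return f"reply #{len(calls)}"


draft = write_draft(fake_model, "changelog, feedback, roadmap")
review = critique_draft(fake_model, draft)
```

The same `model` callable is passed to both roles here, which matches the case of one model with two prompts; passing a different callable to `critique_draft` gives the two-model variant.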

Free-text verdicts the router cannot parse

If your critic returns prose like 'this looks pretty good but you might want to...' the router has to do its own NLP to decide whether that is approve or revise. Use structured output (Pydantic, with_structured_output) so the verdict is a strict Literal['approve', 'revise']. The savings in debugging time are enormous.
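
To show what a strict verdict buys the router, here is a dependency-free stand-in for the validation a Pydantic model would perform. It is a sketch under assumptions: the `Critique` shape, the `issues` field, and `parse_critique` are hypothetical, and it assumes the critic has been asked to reply in JSON.

```python
import json
from dataclasses import dataclass
from typing import List, Literal, get_args

Verdict = Literal["approve", "revise"]


@dataclass(frozen=True)
class Critique:
    verdict: Verdict
    issues: List[str]


def parse_critique(raw: str) -> Critique:
    """Parse the critic's JSON reply, rejecting anything outside the schema."""
    data = json.loads(raw)  # prose replies fail here (JSONDecodeError is a ValueError)
    if not isinstance(data, dict):
        raise ValueError("critique must be a JSON object")
    verdict = data.get("verdict")
    if verdict not in get_args(Verdict):
        raise ValueError(f"verdict must be one of {get_args(Verdict)}, got {verdict!r}")
    issues = data.get("issues", [])
    if not isinstance(issues, list) or not all(isinstance(i, str) for i in issues):
        raise ValueError("issues must be a list of strings")
    return Critique(verdict=verdict, issues=issues)
```

With this in place the router compares `critique.verdict` against two known strings, and a malformed reply surfaces as an exception at the critic node rather than as a wrong routing decision later.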

Critique oscillation

The critic says 'be more concise', the writer cuts a sentence, the critic says 'now it is missing detail', and you are stuck. Pass previous critiques into the critic prompt so it can see what it already asked for and is less likely to contradict itself, or force it to acknowledge improvement on each pass.
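
Both ideas can be sketched in a few lines. The prompt wording, `build_critic_prompt`, and `is_oscillating` are hypothetical, and the detector uses a deliberately crude heuristic (the newest draft exactly repeats an earlier one after normalizing case and whitespace) that is an assumption of this example, not a method from the lesson.

```python
from typing import List


def build_critic_prompt(draft: str, previous_critiques: List[str]) -> str:
    """Include earlier critiques so the critic can see what it already asked for."""
    if previous_critiques:
        history = "\n".join(f"{i}. {c}" for i, c in enumerate(previous_critiques, 1))
    else:
        history = "(none)"
    return (
        "Your earlier critiques of this document, oldest first:\n"
        f"{history}\n\n"
        "Do not ask for a change that reverses an earlier request. "
        "State whether the latest draft addresses the most recent critique.\n\n"
        f"Latest draft:\n{draft}"
    )


def is_oscillating(drafts: List[str]) -> bool:
    """Flag the loop when the newest draft repeats any earlier draft."""
    normalized = [" ".join(d.lower().split()) for d in drafts]
    return len(normalized) > 1 and normalized[-1] in normalized[:-1]
```

A router could call `is_oscillating` on the draft history and force an exit, or escalate to a human, when it returns True.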

Tools that are not idempotent inside loops

If Act calls `send_email` or `charge_card` and the loop retries, you have sent two emails or charged twice. Wrap non-idempotent tools in an outer guard that records which tools have run for which iteration, or only place such tools after the loop terminates.
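
A minimal version of such a guard is sketched below. `OnceGuard`, the key format, and the toy `send_email` are hypothetical, and the record lives in memory only; a real system would need to persist it so that a restarted run does not repeat the side effect.

```python
from typing import Any, Callable, Dict, List, Tuple


class OnceGuard:
    """Run a side-effecting tool at most once per idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run(self, key: str, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if key in self._results:
            # Retry: reuse the recorded result instead of repeating the side effect.
            return self._results[key]
        result = tool(*args, **kwargs)
        self._results[key] = result
        return result


sent: List[Tuple[str, str]] = []


def send_email(to: str, body: str) -> str:
    sent.append((to, body))  # stands in for the real side effect
    return f"sent to {to}"


guard = OnceGuard()
key = "run-42:send_email:weekly-update"  # assumed key: run id, tool name, purpose
first = guard.run(key, send_email, "pm@example.com", "Weekly update")
second = guard.run(key, send_email, "pm@example.com", "Weekly update")  # loop retry
print(len(sent))
```

The scope of the key decides what counts as a duplicate: including the iteration number allows one send per iteration, while leaving it out allows one send per run.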

Context accumulation past the window limit

Each loop iteration appends to state. By iteration 10, state can exceed your model's context window; depending on the provider and framework, the call then either fails outright or important history is silently truncated. Either summarize state between iterations or set strict per-field size limits and reject inputs that exceed them.
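
Both mitigations can be sketched briefly. The limits, `check_field`, and `compact_critiques` are hypothetical, sizes are measured in characters as a rough proxy for tokens, and `summarize` is a placeholder for whatever summarization call is available.

```python
from typing import Callable, List, Tuple

MAX_FIELD_CHARS = 4000  # assumed per-field budget, in characters
KEEP_RECENT = 2         # assumed number of critiques kept verbatim


def check_field(name: str, value: str, limit: int = MAX_FIELD_CHARS) -> str:
    """Reject oversized input instead of letting it be truncated downstream."""
    if len(value) > limit:
        raise ValueError(f"{name} is {len(value)} chars; limit is {limit}")
    return value


def compact_critiques(
    summary: str,
    critiques: List[str],
    summarize: Callable[[str], str],
    keep_recent: int = KEEP_RECENT,
) -> Tuple[str, List[str]]:
    """Fold all but the most recent critiques into a running summary."""
    if len(critiques) <= keep_recent:
        return summary, critiques
    cut = len(critiques) - keep_recent
    old, recent = critiques[:cut], critiques[cut:]
    # The previous summary is folded in too, so nothing older is dropped outright.
    new_summary = summarize("\n".join([summary, *old]).strip())
    return new_summary, recent
```

Calling `compact_critiques` at the end of each iteration keeps the critique history bounded, while `check_field` turns an oversized draft or input into an explicit error.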