Extend RAG into a real agent. Add a planner, a critic, and a router that lets the agent revise its own work until it passes review.
You have a working RAG pipeline. It answers questions well. Now your PM asks for something different: "I want a system that drafts our weekly product update from the changelog, our customer feedback, and our roadmap. Send it to me when it is good."
That word — good — is the entire problem. Drafts are easy. Drafts that are good require evaluation, and evaluation requires a second pass. A single LLM call cannot reliably do both creative work and quality control on its own output.
The pattern that solves this is the planner-critic loop: one model plans and writes, a second model judges, a router decides whether to ship or revise. It generalizes to almost everything agents do: code (write, run tests, fix), research (search, summarize, verify), customer support (draft response, check policy, send).
Why this exact topology? Three reasons.
Separation of roles. Planner and Critic are different prompts, often different models. Mixing them in one node — "write a draft and then judge it" — produces the well-known failure mode of LLMs being unable to honestly critique their own output. Split the roles and quality goes up immediately.
Explicit cycle. The Router is a separate node (a conditional edge in LangGraph code) because the decision to loop is the most important decision in the system. You want it visible on the canvas, in the logs, and in your evals. Inline loops disguised as "the model will know when to stop" do not survive contact with real users.
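In LangGraph the router is an ordinary function handed to `add_conditional_edges`, which keeps the branch visible and unit-testable. A sketch, with illustrative node names (`planner`, `critic`):

```python
from typing import Literal

def route(state: dict) -> Literal["approve", "revise"]:
    # The loop decision lives in plain code: it shows up on the
    # graph canvas, in logs, and can be asserted on in evals.
    return "approve" if state["verdict"] == "approve" else "revise"

# Wiring sketch (not executed here; builder is a StateGraph):
# builder.add_conditional_edges(
#     "critic", route, {"approve": END, "revise": "planner"}
# )
```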
Convergence to RAG when k=0. The agent graph contains the RAG graph as a special case (run once, never loop). This is not an accident — it is the right way to evolve a system. Start with the smallest pipeline that produces output, then add loops only where the data shows you need them.
These loops fail in predictable ways. The most common: the critic keeps finding new problems, the writer keeps making new mistakes, and you pay per iteration. Always set both a `recursion_limit` on the graph AND a per-iteration counter in state. Force-route to END at 3-5 iterations regardless of verdict.
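A sketch of the double guard. `MAX_ITERATIONS` and the state shape are illustrative; `recursion_limit` is LangGraph's standard invocation-config key:

```python
MAX_ITERATIONS = 4  # hard cap, independent of the critic's verdict

def guarded_route(state: dict) -> str:
    # Belt: a per-iteration counter carried in state.
    if state["iterations"] >= MAX_ITERATIONS:
        return "approve"  # force-route to END even on a "revise" verdict
    return state["verdict"]

# Suspenders: the graph-level recursion guard, set at invocation time,
# catches the case where the counter logic itself has a bug:
# graph.invoke({"task": "weekly update"}, config={"recursion_limit": 12})
```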
Asking one LLM call to both write and judge is the most common anti-pattern. The model defends its own output and approves things it should not. Split into two model invocations with different system prompts. Even using the same model with different prompts is a massive improvement.
If your critic returns prose like 'this looks pretty good but you might want to...', the router has to do its own NLP to decide whether that means approve or revise. Use structured output (Pydantic via `with_structured_output`) so the verdict is a strict `Literal['approve', 'revise']`. The savings in debugging time are enormous.
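A sketch of the verdict schema. With a LangChain chat model it would be bound via `model.with_structured_output(Review)`; the `model` variable is assumed and not shown here:

```python
from typing import Literal
from pydantic import BaseModel

class Review(BaseModel):
    verdict: Literal["approve", "revise"]  # no prose verdicts allowed
    critique: str = ""                     # actionable feedback when revising

# Binding sketch: critic = model.with_structured_output(Review)
# review = critic.invoke(critic_prompt)

review = Review.model_validate({"verdict": "revise", "critique": "cite the changelog"})
```

Anything outside the `Literal` fails validation at the model boundary instead of leaking ambiguity into the router.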
The critic says 'be more concise', the writer cuts a sentence, the critic says 'now it is missing detail', and you are stuck. Pass previous critiques into the critic prompt so it cannot contradict itself, or force it to acknowledge improvement on each pass.
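One way to give the critic memory, sketched with a hypothetical prompt builder (the prompt wording is illustrative):

```python
def build_critic_prompt(draft: str, past_critiques: list[str]) -> str:
    # Include the critique history so the critic cannot demand the
    # opposite of what it asked for on the previous pass.
    history = "\n".join(f"- {c}" for c in past_critiques) or "- (none)"
    return (
        "You previously requested these changes:\n"
        f"{history}\n"
        "Do not contradict earlier requests. Acknowledge what improved, "
        "then review the draft:\n\n"
        f"{draft}"
    )

prompt = build_critic_prompt("Our weekly update...", ["be more concise"])
```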
If an action node calls `send_email` or `charge_card` and the loop retries, you have sent two emails or charged twice. Wrap non-idempotent tools in an outer guard that records which side-effecting calls have already run, or only place such tools after the loop terminates.
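A sketch of the outer guard, with `send_email` as a hypothetical non-idempotent tool. The key identifies the side effect itself, not the iteration, so retries hit the cache instead of re-firing:

```python
class ToolGuard:
    """Run each side-effecting call at most once per key."""

    def __init__(self) -> None:
        self._done: dict[str, object] = {}

    def run_once(self, key: str, tool, *args):
        # key should name the side effect, e.g. "send_email:weekly-42",
        # so a retried loop iteration cannot trigger it again.
        if key not in self._done:
            self._done[key] = tool(*args)
        return self._done[key]

sent = []
def send_email(to: str) -> str:  # hypothetical non-idempotent tool
    sent.append(to)
    return "sent"

guard = ToolGuard()
guard.run_once("send_email:weekly-42", send_email, "pm@example.com")
guard.run_once("send_email:weekly-42", send_email, "pm@example.com")  # cached
print(len(sent))  # the email went out exactly once
```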
Each loop iteration appends to state. By iteration 10, state can exceed your model's context window and the agent silently truncates important history. Either summarize state between iterations or set strict per-field size limits and reject inputs that exceed them.
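A sketch of the size-limit option (the caps are illustrative, and this variant trims rather than rejects); summarizing old critiques with a cheap model between iterations is the alternative:

```python
MAX_CRITIQUES = 3        # keep only the most recent critiques
MAX_DRAFT_CHARS = 8_000  # bound the draft field (illustrative limit)

def trim_state(state: dict) -> dict:
    # Bound each field so state cannot grow past the context window;
    # dropped critiques could instead be folded into a summary field.
    return {
        **state,
        "critiques": state["critiques"][-MAX_CRITIQUES:],
        "draft": state["draft"][:MAX_DRAFT_CHARS],
    }

state = {
    "draft": "x" * 10_000,
    "critiques": [f"c{i}" for i in range(6)],
    "iterations": 6,
}
state = trim_state(state)
```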