Agentic Coding Works Better as a Doer/Reviewer Pipeline

A lot of AI coding talk still revolves around the individual assistant: the model in the editor, the chat window beside the codebase, the prompt that gets the next patch unstuck.

Spencer Graham's fireside pointed at a more useful pattern. His setup is less like asking one assistant to build a feature and more like running a small production pipeline. A coordinator receives the feature or spec. A planning agent works through the implementation plan. A reviewer checks that plan. Once the plan passes, a separate implementation agent does the work. Another reviewer loop checks the implementation. Then a final comprehensive review catches what the narrower loops missed.

That structure matters because it changes the shape of the work. No single agent both does the work and judges it. The system separates doing from judgment.

[Figure: a feature request moving through coordinator, planning, plan review, implementation, code review, and final check stages.]

From assistant to pipeline

The simplest version of AI coding is conversational: ask for a change, get a patch, inspect the result. That can be useful, but it puts too much pressure on one loop. The same interaction has to produce the idea, write the code, notice the edge cases, remember the constraints, and judge whether the result is good enough.

Spencer described a more staged workflow. Claude Code acts as a coordinator, while Codex is used non-interactively inside review loops. The planning step gets reviewed before implementation starts. The implementation step gets reviewed before the work is treated as complete. A final review pass replaces part of what had previously been a more CI-style agentic review.
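To make the staging concrete, here is a minimal Python sketch of a doer/reviewer pipeline. It is an illustration under stated assumptions, not Spencer's actual setup: every name in it (`Review`, `review_loop`, `run_pipeline`, the stub agents) is hypothetical, and the agents are plain functions standing in for whatever model calls a real workflow would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    approved: bool
    objections: List[str] = field(default_factory=list)


# Hypothetical signatures: a doer takes (task, objections from the last
# review) and returns an artifact; a reviewer takes (task, artifact).
Doer = Callable[[str, List[str]], str]
Reviewer = Callable[[str, str], Review]


def review_loop(task: str, doer: Doer, reviewer: Reviewer, max_rounds: int = 3) -> str:
    """Run the doer until the reviewer approves or the rounds run out."""
    objections: List[str] = []
    for _ in range(max_rounds):
        artifact = doer(task, objections)
        review = reviewer(task, artifact)
        if review.approved:
            return artifact
        objections = review.objections
    raise RuntimeError(f"no approval after {max_rounds} rounds: {objections}")


def run_pipeline(spec: str, planner: Doer, plan_reviewer: Reviewer,
                 implementer: Doer, code_reviewer: Reviewer,
                 final_reviewer: Reviewer) -> Tuple[str, str]:
    """Plan, review the plan, implement, review the diff, then final check."""
    plan = review_loop(spec, planner, plan_reviewer)
    diff = review_loop(plan, implementer, code_reviewer)
    final = final_reviewer(spec, diff)
    if not final.approved:
        raise RuntimeError(f"final review failed: {final.objections}")
    return plan, diff


# Stub agents so the sketch runs without any model behind it.
def stub_planner(spec: str, objections: List[str]) -> str:
    plan = f"plan for: {spec}"
    if objections:
        plan += " (with tests)"  # revise in response to the review
    return plan


def stub_plan_reviewer(spec: str, plan: str) -> Review:
    if "tests" in plan:
        return Review(True)
    return Review(False, ["plan does not mention tests"])


def stub_implementer(plan: str, objections: List[str]) -> str:
    return f"diff implementing [{plan}]"


def approve_all(task: str, artifact: str) -> Review:
    return Review(True)


if __name__ == "__main__":
    plan, diff = run_pipeline("add CSV export", stub_planner, stub_plan_reviewer,
                              stub_implementer, approve_all, approve_all)
    print(plan)
    print(diff)
```

With these stubs, the plan is rejected once for not mentioning tests, revised, and approved before the implementer ever runs.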

The interesting part is not the exact stack. It is the role separation.

A planner can focus on the approach. A reviewer can look for gaps in that approach. An implementer can focus on making the approved plan real. Another reviewer can inspect the actual diff. The final pass can ask whether the whole thing still coheres.

That sounds obvious because it resembles how strong human teams already work. We separate design from implementation, implementation from review, and review from release. Agentic coding becomes more legible when it borrows that shape instead of pretending the whole process should collapse into one brilliant response.

Review loops create handles

The strongest reason to use a doer/reviewer pattern is not that reviewers are always right. They are not. It is that review creates a handle on the work: a concrete artifact and a checkpoint where it can be inspected.

A plan that must pass review can be discussed before code exists. An implementation that must pass review can be judged against an agreed plan rather than a vague memory of the original request. A final check gives the system one more chance to catch integration problems, missing tests, or a mismatch between the intended feature and the shipped change.

That matters for teams because agentic work can otherwise disappear into private local loops. Someone prompts an agent, gets a result, patches the repo, and moves on. If the process is invisible, the team may only see the final diff. They lose the reasoning that led there.

A reviewed pipeline can leave better traces: the plan, the objections, the changes made in response, and the final checks. Those traces are useful even when a human reviewer ultimately makes the call.
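One way to picture those traces is as a small record that travels with the change. The Python sketch below is an assumption about what such a record might contain, based only on the four items listed above. The class and field names (`PipelineTrace`, `ReviewRound`, and so on) are hypothetical and not part of any tool mentioned in the talk.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ReviewRound:
    stage: str             # e.g. "plan" or "implementation"
    objections: List[str]  # what the reviewer raised
    response: str          # what changed in response


@dataclass
class PipelineTrace:
    feature: str
    plan: str
    rounds: List[ReviewRound] = field(default_factory=list)
    final_checks: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into the nested ReviewRound entries.
        return json.dumps(asdict(self), indent=2)


# Hypothetical example data, for illustration only.
trace = PipelineTrace(feature="add CSV export", plan="stream rows; add tests")
trace.rounds.append(ReviewRound(
    stage="plan",
    objections=["no handling for empty result sets"],
    response="plan now covers the empty case",
))
trace.final_checks.append("test suite passed")
print(trace.to_json())
```

Serialized next to the diff, a record like this gives a human reviewer the objections and responses, not just the end state.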

The hard part is still verification

Spencer's workflow also surfaced the limit of the pattern. Reviewing code is not the same as knowing whether the product is good.

He called out human-facing interfaces as a harder category for agents to evaluate. A program can run tests. A reviewer can inspect a diff. But a user interface also has to feel coherent. It has to match user intent, avoid awkward states, and make sense to someone who does not already understand the implementation.

That is where agentic coding starts to expose the next bottleneck. If agents can write more software faster, teams need stronger ways to decide whether the result is worth shipping. The review loop has to move beyond syntax, tests, and local correctness. It needs product criteria, UX judgment, and enough shared context to know what good means.

Spencer mentioned the possibility of an automated UX QA agent that runs nightly or after pushes to main, inspects the application, opens issues, and maybe eventually fixes some of them. Framed carefully, that is not a claim that UX judgment can be fully automated. It is a sign of where the pressure is moving. Once agents can generate more interface work, teams need better ways to inspect interface quality.

The atomized developer problem

The wider implication of the fireside was not just technical. If every builder has a powerful local agent loop, individual throughput can rise while shared context gets weaker.

Spencer and the group circled this as a coordination problem. Project work is not only a set of goals and tasks. It is also social context: who knows what, why a decision was made, what tradeoffs were considered, what kind of work feels meaningful, and how people stay aligned while building together.

Agentic workflows can unbundle some of that. A developer can ask a personal agent to plan, implement, and review in a private workspace. That may make the individual more capable. It may also make the team's shared understanding thinner if the work does not produce artifacts other people can inspect.

This is why the doer/reviewer pipeline is more than an efficiency pattern. It is also a coordination pattern. It gives the work stages. It gives humans places to intervene. It gives teams artifacts to preserve. It makes the agentic process easier to audit, discuss, and improve.

Process over prompts

The practical takeaway from Spencer's fireside is simple: agentic coding is becoming less about the perfect prompt and more about the process around the model.

A useful workflow defines roles. It separates planning from implementation. It reviews plans before code. It reviews code before acceptance. It keeps final checks explicit. And when the work touches human-facing product quality, it does not pretend that code review alone is enough.
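As a rough illustration, that checklist can be written down as an explicit rule instead of left to habit. The Python sketch below is hypothetical throughout: the `Change` type, the `touches_ui` flag, and the check names are assumptions made for the example, not anything described in the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Change:
    description: str
    touches_ui: bool  # assumed to be set by whoever files the change


def required_checks(change: Change) -> List[str]:
    """Return the review stages a change must pass before acceptance."""
    checks = ["plan_review", "code_review", "final_review"]
    if change.touches_ui:
        # Code review alone does not cover human-facing quality.
        checks.append("human_ux_review")
    return checks


print(required_checks(Change("refactor export module", touches_ui=False)))
print(required_checks(Change("redesign settings page", touches_ui=True)))
```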

That does not make Spencer's setup a universal template. It is one builder's working pattern, shaped by his tools and constraints. But the underlying lesson is portable: as agents become more capable, the differentiator is not only what they can produce. It is how well the surrounding process lets people inspect, trust, and coordinate that production.

For teams, the question is no longer just, "Can an agent do this?"

The better question is: "Can we review what the agent did, understand why it did it, and keep the work connected to shared context?"

That is where agentic coding starts to look less like automation magic and more like engineering practice.