
© 2026 Evan Kormos. All rights reserved.


Understanding AI Orchestration

[Illustration: three robots running a relay race on a circular track, passing a baton]
February 15, 2026
AI · agents · LangGraph · orchestration · workflow

Are More Agents Better?

When you hear "multi-agent system," the picture that forms is a team of specialists. One writes, one formats, one reviews. They pass work around, check each other, iterate. It sounds like any good team. In reality, building a set of AI agents that collaborate on deep, structured material is genuinely hard, just as it is with human teams.

I have been building PrintShop, an experimental pipeline where three AI agents produce professional print PDFs. A content editor improves prose → a LaTeX specialist typesets the document → a visual QA agent compiles the PDF, inspects rendered pages with vision, and fixes formatting issues. They run in sequence, gated by quality thresholds, with iteration loops when standards are not met.

On paper it looks like collaboration. In practice it runs like a factory line with inspectors. Here is sample content the pipeline produces; each document can take around 30 minutes:

  • IEEE Conference Paper (PDF)
  • Research Report (PDF)
  • Magazine (PDF)

Collaboration Is Asynchronous and Non-Linear

Anyone who has managed a team knows the work does not flow in clean stages. A designer reviews a draft and catches a structural problem that sends the writer back to square one. An engineer prototypes something that changes the spec. People work in parallel, interrupt each other productively, and backtrack constantly.

This is a feature, not a bug. Humans have judgment. They can say "this is headed in the wrong direction" before it gets expensive.

Agents do not have that. They have quality scores and thresholds. They do not course-correct in real time or walk over to someone's desk and say "the approach we agreed on is not going to work."

Agents Reason in Cycles, Not Pipelines

Traditional workflow automation is input-to-output. Data enters, transformations run, a result comes out. That model works when every step is deterministic and the output is predictable.

Agents are not deterministic. They "reason," and reasoning is cyclical: an agent produces output, evaluates it against a quality threshold, and decides whether to try again. In a multi-agent system, that cycle extends across agents: when the collective result falls short, the whole chain may need another pass. The orchestrator is not just sequencing tasks. It is managing a collective reasoning process where each cycle refines the result.

A traditional ETL pipeline does not loop back to step one because step four found a problem. An agent pipeline does exactly that, by design. The quality gates are not error handling. They are the reasoning mechanism. Getting orchestration right means supporting structured iteration without losing track of where the system is or why it went back.
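The cycle described above can be sketched as a minimal gated loop. This is a framework-free illustration, not PrintShop's actual code; `run_stage`, `QUALITY_THRESHOLD`, and the context fields are all hypothetical names:

```python
# Minimal sketch of a quality-gated agent cycle: produce, score,
# and loop back until the gate passes or attempts run out.
QUALITY_THRESHOLD = 90   # illustrative gate value
MAX_ITERATIONS = 3

def run_stage(produce, evaluate, context):
    """Run one agent stage as a cycle, not a one-shot transformation."""
    history = []
    for attempt in range(1, MAX_ITERATIONS + 1):
        output = produce(context, history)
        score = evaluate(output)
        history.append({"attempt": attempt, "score": score})
        if score >= QUALITY_THRESHOLD:
            return output, history            # gate passed
        # Feed the failure back in: the retry is informed, not blind.
        context = {**context, "feedback": f"score {score}, below gate"}
    return output, history                    # best effort after max attempts
```

The point of the sketch is that the quality gate is inside the loop, not bolted on as error handling: the score is what decides whether the cycle continues.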

Starting with a While-Loop Orchestrator

Before LangGraph, my pipeline was a lengthy Python class: nested conditional branches inside a loop, passing context around. It worked. The agents ran, the PDFs came out, the quality gates fired. But the problems were real:

  • Changing iteration behavior meant reading the entire loop to understand what touched what.
  • Routing decisions were invisible. You could not quickly answer "under what conditions does the pipeline skip visual QA?"
  • Testing orchestration logic was impossible. The routing was not a function you could call in isolation. It was behavior embedded in a stateful loop.

LangGraph, an Open Source Framework

LangGraph is a framework for building agent workflows as state machines, where you define nodes, edges, and routing functions instead of writing imperative control flow. The migration replaced the while-loop with a StateGraph: seven nodes, three conditional routing functions, and an immutable state dict with explicit reducers.
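LangGraph expresses this with `StateGraph`, `add_node`, and `add_conditional_edges`. The core separation can be sketched without the library; the node names and state fields below are illustrative, not PrintShop's actual graph:

```python
# Framework-free sketch of graph-style orchestration: nodes are functions
# over a state dict, and routing decisions are pure functions that return
# the next node's name. Names here are illustrative only.

def content_review(state):
    return {**state, "content_score": 92}

def latex_generation(state):
    return {**state, "latex_done": True}

def route_after_review(state):
    """Pure routing function: testable in isolation, no loop to read."""
    return "latex_generation" if state["content_score"] >= 85 else "content_review"

NODES = {"content_review": content_review, "latex_generation": latex_generation}
EDGES = {"content_review": route_after_review,
         "latex_generation": lambda state: "END"}

def run(state, entry="content_review"):
    node = entry
    while node != "END":
        state = NODES[node](state)      # do the work
        node = EDGES[node](state)       # decide where to go next
    return state
```

Notice that "under what conditions does the pipeline skip a stage?" is now answerable by reading one routing function, not the whole loop.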

The graph did not make the agents smarter, enable parallelism, or reduce the number of LLM calls. What it did was separate concerns that had no business being tangled together.

Routing became testable. Node traversal and node functionality can now be tested independently.

State became explicit. A PipelineState TypedDict with append-only reducers for agent results and quality assessments, and a custom merge for inter-agent context. No more hunting for which function made the last change.
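A minimal sketch of what append-only state can look like. LangGraph attaches reducers via `Annotated` type hints and applies them itself; the `apply_update` helper below is a hand-rolled stand-in, and the field names are illustrative rather than PrintShop's actual schema:

```python
# Sketch of explicit, append-only pipeline state. Each field's reducer
# (here, list concatenation for every field) decides how a node's partial
# update merges in; the original state is never mutated.
import operator
from typing import Annotated, TypedDict

class PipelineState(TypedDict):
    agent_results: Annotated[list, operator.add]        # append-only
    quality_evaluations: Annotated[list, operator.add]  # every gate decision kept

def apply_update(state: PipelineState, update: dict) -> PipelineState:
    """Merge a node's partial update using list concatenation as the reducer."""
    merged = dict(state)
    for key, value in update.items():
        merged[key] = operator.add(state.get(key, []), value)
    return merged
```

Because every gate decision is appended rather than overwritten, "which function made the last change" stops being a question: the history is the state.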

Simpler surface. All orchestration logic moved into the graph definition where it is declared.

Context Enrichment: Giving Agents Peripheral Vision

The original pipeline had a problem. Each agent operated in isolation. The LaTeX specialist had no idea what the content editor changed or why. The visual QA agent had no idea what the LaTeX specialist struggled with. They were passing artifacts forward but not context.

The fix was context enrichment nodes: lightweight LLM calls (Haiku) that sit between the processing stages and synthesize what happened upstream into targeted instructions for the next agent. The graph flow became: content_review → enrich_for_latex → latex_generation → enrich_for_visual_qa → visual_qa.

This is not agents "talking to each other." It is a cheap summarization step that reads the previous agent's quality scores, issues found, and changes made, then produces a focused brief. When the content editor flags awkward phrasing in a methods section, the enrichment node tells the LaTeX specialist "pay attention to sentence structure in Section III." When the LaTeX specialist uses resizebox to force-fit a table, the enrichment node tells visual QA "check tables for readability degradation."
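The shape of that handoff can be sketched in a few lines. In PrintShop the synthesis is a Haiku call; the deterministic stand-in below (with a hypothetical `enrich_for_latex` signature) only shows the structure of the brief, not the real prompt:

```python
# Sketch of a context-enrichment step: read the upstream agent's scores
# and issues, and produce a focused brief for the next agent. In practice
# this summarization is a cheap LLM call, not string formatting.

def enrich_for_latex(content_result: dict) -> str:
    """Turn the content editor's output into targeted instructions."""
    lines = [f"Upstream content score: {content_result['score']}/100."]
    for issue in content_result.get("issues", []):
        lines.append(f"Pay attention to {issue['what']} in {issue['where']}.")
    return "\n".join(lines)
```

The next agent receives this brief alongside the artifact itself, so the handoff carries awareness of upstream decisions, not just the file.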

The effect was immediate. The LaTeX quality score for IEEE conference papers jumped from 80 to 94 on first pass. The enrichment nodes cost almost nothing (a few hundred Haiku tokens each) but they eliminated the blind handoff problem.

This is the closest the pipeline gets to that "walk over to someone's desk" moment. It is not real-time and it is not bidirectional. But it is awareness of upstream decisions, which turns out to be what matters most.

Rendering Instructions: Configuration as Natural Language

One architectural decision that has paid for itself many times over: document appearance is controlled by natural language, not code.

Each content type has a type.md file containing detailed rendering instructions that the LaTeX agent reads at generation time. For IEEE conference papers, this includes everything from \author{} block formatting rules to float placement strategies to a list of packages to avoid. When I needed the LLM to stop placing \thanks{} outside the author block (which caused a blank first page), I strengthened the instructions in type.md. When tables kept colliding because consecutive floats were all [!t], I added guidance about alternating placement specifiers.

The alternative would have been hardcoding LaTeX output in Python. Every formatting fix would have been a code change, a test, a deploy. Instead, tuning type.md is like editing a style guide. The LLM reads it fresh every run.
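A hypothetical excerpt of what an IEEE type.md might contain, reconstructed from the fixes described above (the actual file's wording is not shown in this post):

```markdown
## Author block
- Place all \thanks{} commands inside the \author{} block. A \thanks{}
  outside it can produce a blank first page.

## Float placement
- Alternate placement specifiers for consecutive floats ([!t], then [b])
  so tables do not collide at the top of a column.

## Packages
- Avoid packages known to conflict with the class's column handling.
```

Because the LaTeX agent reads this file fresh on every run, a formatting fix is an edit to prose, not a code change.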

Adding an entirely new content type (IEEE conference) required zero orchestration changes: one type.md file, sample content, and the pipeline ran. That is the payoff of separating "what to produce" from "how to produce it."

Visual QA: Render, Detect, Fix

LaTeX can compile cleanly and still produce a document where two figures overlap, a table overflows its column, or a diagram is illegible. These are rendering defects that only exist in the PDF. The visual QA agent closes this gap: compile to PDF, render pages as images, submit them to a vision model, parse defect reports, patch the LaTeX, and recompile.
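The loop can be sketched as follows. `compile_pdf`, `render_pages`, `detect_defects`, and `patch_latex` are illustrative stand-ins for the real compilation, rendering, and vision-model steps, not PrintShop's actual functions:

```python
# Sketch of the render-detect-fix loop: compile, render, inspect the
# actual output, and patch only the defects that exist in it.

def visual_qa(latex_src, compile_pdf, render_pages, detect_defects,
              patch_latex, max_rounds=3):
    for _ in range(max_rounds):
        pdf = compile_pdf(latex_src)
        defects = detect_defects(render_pages(pdf))  # vision model in practice
        if not defects:
            return latex_src, pdf                    # clean render, done
        latex_src = patch_latex(latex_src, defects)  # targeted fixes only
    return latex_src, compile_pdf(latex_src)         # best effort
```

The structure enforces the insight below: no patch is ever applied to a defect that the rendered pages did not actually show.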

The key insight: do not fix problems preemptively. Only spend resources on defects that actually exist in the rendered output. The pipeline walkthroughs show before/after examples of what this looks like in practice:

  • Research Report walkthrough — TOC reflow, intro spacing, last-page balancing
  • IEEE Conference walkthrough — diagram spacing, chart layout, reference formatting
  • Magazine walkthrough — table overflow, data table fixes, column spacing

The Limits

LangGraph is a tool, not a solution. Two things it did not "fix":

The pipeline is still sequential. Visual QA needs a compiled PDF. The LaTeX agent needs edited markdown. The framework makes the dependency chain visible but does not alter the flow.

The overhead is substantial. Each run involves roughly 40 file I/O operations, 10+ LLM calls, and multiple PDF compilations. The graph makes this all visible but the work is the same.

How I Know It Was Worth It

Quality scores tell the story. The IEEE conference content type hits 94/100 on LaTeX quality (first pass, no iterations needed) and 90+ on visual QA. Before context enrichment, the same content scored 80 on LaTeX and required multiple iterations to pass the gate.

Adding content types requires no orchestration changes. IEEE conference was the third content type. I created a type.md with rendering instructions, wrote sample content, and ran the pipeline. The graph, the quality gates, the enrichment nodes, the visual QA detection patterns: all reused without modification.

Routing changes are surgical. Adding a hard gate for PDF compilation failure was a single conditional in one routing function, not a conditional buried in a while-loop.

Debugging iteration behavior is straightforward. The append-only quality_evaluations list shows every gate decision in sequence. When a run escalates unexpectedly, I read the list. Before, I added print statements.

Defect detection compounds. Every detection pattern added to visual QA (table overflow, float collision, diagram spacing) benefits every content type on every future run. The pipeline gets better at finding problems without getting more expensive.

The Actual Lesson

The benefit of LangGraph was not "graph-based orchestration" as an abstraction or a way to visually build the flow with "low code." It was forcing a separation between what agents do and how the pipeline manages state and decides what happens next.

Human teams benefit from rich, bidirectional, asynchronous communication. Agent pipelines benefit from explicit contracts, predictable routing, and visible state. These are different problems. Treating them as the same produces systems that are neither good human workflows nor good machine workflows.

The hard part of multi-agent orchestration is not getting agents to work together. It is making the coordination legible enough to debug, test, and change six months from now. That is a workflow automation problem, and workflow automation has always been about making the boring parts explicit.

Context enrichment, rendering instructions as natural language, visual feedback loops: these are not collaboration patterns borrowed from human teams. They are engineering patterns that emerged from the constraints of what agents actually are. Stateless functions that need context fed to them. Probabilistic generators that need guardrails after the fact. Reasoners that work in cycles, not conversations.

Build for that, and the orchestration takes care of itself.
