
The current state of AI document processing follows a recognizable pattern. An organization crafts a prompt — often quite sophisticated with detailed instructions and examples — passes it to a large language model along with a document and receives structured data in return. Whether built internally or embedded within a newer IDP solution, the underlying architecture is essentially the same: a static prompt that does not evolve based on what happens during processing.
This approach works. It represents a genuine improvement over template-based extraction for handling document variability. But it also leaves significant value on the table. When extraction fails or validation errors occur, the system learns nothing. When human operators correct mistakes, those corrections inform nothing beyond the immediate transaction. Why should the thousandth document be processed the same way as the first?
An architectural pattern is emerging, primarily in adjacent domains such as software development, where AI coding assistants are evolving rapidly. It suggests a more powerful approach: hierarchical agent systems in which a persistent orchestrating agent manages ephemeral worker agents, much as a manager leads a team, and can modify how those workers operate based on observed outcomes.
What If the Prompt Could Evolve Itself?
The conceptual shift is fairly straightforward. Rather than a human crafting a static prompt that an LLM executes repeatedly, an AI agent sits above that process and takes responsibility for evolving the prompt over time. The human defines the objective and constraints; the orchestrating agent figures out how to instruct the workers that actually touch documents.
This orchestrator maintains context that persists across processing, observing when workers succeed and fail and identifying patterns in those failures. Crucially, it can modify the instructions, examples or approaches provided to subsequent workers based on what it has learned.
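In code, that division of responsibility might look something like the following minimal Python sketch. The class names, the stand-in extraction check and the success criterion are all illustrative assumptions, not the design of any particular platform; the point is only the shape: a persistent orchestrator that owns the instructions and accumulates outcomes, and workers that are created per document and then discarded.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    """Ephemeral: spawned for one document, then discarded."""
    instructions: str

    def process(self, document: str) -> dict:
        # Stand-in for an LLM extraction call driven by `instructions`.
        # Here "success" is simply whether the document mentions a total.
        return {"ok": "total" in document.lower(), "document": document}

@dataclass
class Orchestrator:
    """Persistent: owns the instructions and observes every outcome."""
    instructions: str = "Extract claim fields as JSON."
    outcomes: list = field(default_factory=list)

    def handle(self, document: str) -> dict:
        worker = Worker(self.instructions)  # new ephemeral worker each time
        result = worker.process(document)
        self.outcomes.append(result["ok"])  # context persists across documents
        return result
```

Because the orchestrator, not the human, holds `instructions`, it is the natural place to rewrite them when the observed outcomes warrant it.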
Let's consider an insurance claims workflow where documents arrive from Provider A. Under the current static-prompt model, each document hits the same extraction logic. If extraction errors occur — a misread value, a field the model fails to locate, an unusual document layout — human operators intervene, fix the immediate problem and move on. The system remains unchanged.
Under an orchestrated model, the managing agent spawns a worker to process the first document. That worker encounters a validation failure when attempting to push data downstream. The orchestrator observes this, analyzes the failure, and faces a decision: Is this a one-off error requiring human review? A transient system issue warranting retry? Or a pattern suggesting the current instructions are inadequate for this provider's documents?
If the orchestrator determines an instructional change would help, it modifies the prompt or examples for the next worker and tests that hypothesis. If the modified approach succeeds, it can apply that learning going forward; if not, it escalates appropriately. The feedback loop that experienced human teams naturally develop — notice a pattern, adjust the procedure, verify the adjustment actually helped — becomes embedded in the system itself.
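The triage-and-test loop described above can be sketched as follows. The three-strikes threshold, the error codes and the prompt amendment are hypothetical policy choices made for illustration; a real system would ground each in its own validation signals.

```python
class AdaptiveLoop:
    """Toy version of the orchestrator's failure triage:
    one-off -> human review, transient -> retry, pattern -> revise prompt."""

    def __init__(self, base_prompt: str, pattern_threshold: int = 3):
        self.prompt = base_prompt
        self.failures: list[str] = []
        self.pattern_threshold = pattern_threshold

    def on_failure(self, error_code: str) -> tuple[str, str]:
        """Return (action, prompt_to_use) for the next worker."""
        self.failures.append(error_code)
        if self.failures.count(error_code) >= self.pattern_threshold:
            # Pattern detected: propose a modified prompt as a hypothesis.
            candidate = self.prompt + f"\nNote: handle recurring issue {error_code}."
            return ("test_hypothesis", candidate)
        if error_code == "timeout":
            # Transient system issue: retry with the unchanged prompt.
            return ("retry", self.prompt)
        # Otherwise treat it as a one-off needing human eyes.
        return ("human_review", self.prompt)

    def confirm(self, candidate: str) -> None:
        """Adopt the candidate prompt only after it succeeds on a real document."""
        self.prompt = candidate
        self.failures.clear()
```

The key detail is `confirm`: the modified prompt is applied going forward only once the hypothesis has actually been verified, mirroring the notice-adjust-verify loop of an experienced human team.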
Where Variability Breaks Static Systems
AI for document processing is characterized by variability that static approaches struggle to address gracefully. Healthcare providers change how they submit claims, payers update their EOB formats and edge cases accumulate faster than anyone can anticipate. The traditional response is either to accept degraded accuracy or to continuously invest human effort in prompt refinement and exception handling.
An orchestrating agent offers a different path. Rather than requiring human intervention for every adaptation, the system can handle a meaningful subset of adjustments autonomously, testing changes, verifying outcomes and incorporating successful modifications into its ongoing operation.
Humans remain essential for defining objectives, handling genuinely novel situations and providing oversight. But the boundary between "requires human judgment" and "system can handle this" shifts meaningfully.
Here's where memory architecture matters enormously. A naive implementation might maintain one continuous context, hoping the model retains relevant information as that context compresses over time. More robust approaches give the orchestrator explicit mechanisms to store and retrieve knowledge: recording, for instance, that a particular provider's claims submissions require specific handling, and surfacing that context whenever documents from that provider appear in the future.
This transforms institutional knowledge from something that lives in human heads and scattered documentation into something the system can actually use, persisting even when staff leave or processing volumes spike.
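A minimal sketch of that explicit store-and-retrieve mechanism, as opposed to one ever-growing model context, might look like this. The provider names and note strings are placeholders; real keys would likely come from document classification.

```python
from collections import defaultdict

class ProviderMemory:
    """Explicit per-provider knowledge store the orchestrator can
    query, rather than hoping facts survive context compression."""

    def __init__(self):
        self._notes: dict[str, list[str]] = defaultdict(list)

    def record(self, provider: str, note: str) -> None:
        """Persist a lesson learned about this provider's documents."""
        self._notes[provider].append(note)

    def context_for(self, provider: str) -> str:
        """Surface only this provider's notes when their documents arrive."""
        return "\n".join(self._notes.get(provider, []))
```

Because the notes live outside any single model call, they survive staff turnover and volume spikes exactly as the article describes.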
Why Most IDP Vendors Haven't Made This Leap
Most AI-based document processing solutions, whether legacy IDP platforms or newer LLM-based offerings, are still built around a document-centric architecture. They tend to optimize for extraction accuracy on individual pages rather than for adaptive processing that improves through ongoing operation.
The ROI implications are significant. Organizations using static approaches pay twice: once for the technology and continuously for the human effort required to keep it working as documents evolve. Every new provider format, every schema change, every edge case requires manual intervention to update prompts or retrain models.
Hierarchical AI orchestration shifts this equation. When the system can adapt autonomously to routine variations, human effort can focus on genuinely novel problems rather than endless maintenance. Designing for adaptability from the start means treating the orchestration layer as a first-class concern, not an afterthought.
The Trade-offs Worth Considering
Of course, this architectural pattern introduces its own considerations. When an agent autonomously modifies its own instructions, organizations need robust logging and decision tracking to maintain explainability. For regulated industries — insurance, financial services and healthcare — governance frameworks need to account for systems that adapt over time.
But this is tractable. The orchestration layer provides a natural place to capture why decisions were made and how instructions evolved. Static approaches, by contrast, offer no such mechanism: each LLM call is essentially a black box, with no architectural support for audit trails. Organizations that need to demonstrate compliance may find that hierarchical orchestration, implemented thoughtfully, actually makes explainability easier rather than harder.
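One way to picture the audit trail the orchestration layer makes possible is an append-only decision log; the field names below are assumptions chosen for illustration, not a compliance standard.

```python
import json
import time

class DecisionLog:
    """Append-only record of why instructions changed over time,
    so adaptation remains explainable after the fact."""

    def __init__(self):
        self.entries: list[dict] = []

    def log(self, trigger: str, old_prompt: str,
            new_prompt: str, rationale: str) -> None:
        self.entries.append({
            "ts": time.time(),          # when the change happened
            "trigger": trigger,         # what the orchestrator observed
            "old": old_prompt,          # instructions before the change
            "new": new_prompt,          # instructions after the change
            "rationale": rationale,     # why the change was made
        })

    def export(self) -> str:
        """Serialize the full history for auditors or regulators."""
        return json.dumps(self.entries, indent=2)
```

Every prompt modification passes through one choke point, which is precisely the architectural support for audit trails that isolated static LLM calls lack.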
Adaptive orchestration solves the learning problem, but it's one component of what regulated industries need. Production-ready automation also requires the full document workflow, traceability back to source documents, validation against business rules and human review where it matters. The orchestration layer becomes most powerful when it sits within that broader governed architecture.
The underlying insight is this: moving up one level of abstraction, letting an AI agent manage the evolution of document processing logic rather than freezing that logic in a static prompt, represents a genuine architectural evolution. It aligns AI document processing with how effective human teams actually operate: observing outcomes, adapting approaches and accumulating knowledge that makes future work easier.
For document strategy leaders, the key decision is whether to continue investing in manually refined, static logic or to design workflows that can adapt and improve as they run.
Andrew Bird is Head of AI at global IDP provider Affinda, where he is responsible for AI technologies for the automation of high-volume document workflows. He was named a finalist for AI Software Engineer of the Year at the Australian AI Awards 2025 for his work on Affinda's agentic AI platform.