Your AI Can't Know What Your Team Knows
The Document AI industry has spent years chasing extraction accuracy. But the real challenge isn't reading documents — it's capturing the institutional knowledge that determines what to do with them.
And that’s the problem nobody in Document AI wants to talk about.
The Document AI industry has spent the last three years in an accuracy arms race. Every vendor pitch deck has the same slide: “Our model achieves 99.X% extraction accuracy on invoices.” The benchmarks keep climbing. The models keep improving. And yet Gartner reports that 40% of Document AI implementations still underperform their initial ROI projections.
How is that possible?
Because reading the document was never the hard part.
The Knowledge Gap
When an experienced AP analyst picks up an invoice from a vendor they’ve worked with for years, they’re not just reading fields off a page. They know that this particular vendor always rounds shipping charges up. They know that line items labeled “consulting” from this supplier actually map to three different GL codes depending on the project. They know that anything over $50K from the EMEA region needs a second approval, even though that rule isn’t written down anywhere.
That knowledge — accumulated over years of corrections, exceptions, and judgment calls — is what makes document processing actually work inside an organization. And no foundation model, no matter how capable, ships with it.
Most Document AI vendors treat this as a training data problem. Feed the model more examples, fine-tune on your documents, and eventually it will learn. But there’s a structural issue with that approach: the knowledge isn’t in the documents. It’s in the decisions people make about the documents. It lives in the corrections, the overrides, the routing logic, the “we always do it this way for this client” rules that exist in people’s heads and nowhere else.
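To make the distinction concrete, here is a minimal sketch of what the analyst's unwritten rules from the example above might look like once someone writes them down. Everything in it is hypothetical: the vendor name, project names, and GL codes are invented for illustration, and this is not Kodexa's API. The point is only that these rules are small, explicit, and impossible to infer from the invoice alone.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Invoice:
    vendor: str
    region: str
    total: Decimal
    line_label: str
    project: str


# Hypothetical mapping: what "consulting" means for one supplier, per project.
CONSULTING_GL_CODES = {
    ("Acme Advisory", "erp-rollout"): "6100",
    ("Acme Advisory", "audit-prep"): "6200",
    ("Acme Advisory", "training"): "6300",
}

# The unwritten rule from the text: anything over $50K from EMEA needs a second approval.
SECOND_APPROVAL_THRESHOLD = Decimal("50000")


def gl_code(invoice: Invoice) -> Optional[str]:
    """Return the GL code for a consulting line, or None if no rule is known."""
    if invoice.line_label.lower() != "consulting":
        return None
    return CONSULTING_GL_CODES.get((invoice.vendor, invoice.project))


def needs_second_approval(invoice: Invoice) -> bool:
    """Apply the EMEA threshold rule."""
    return invoice.region == "EMEA" and invoice.total > SECOND_APPROVAL_THRESHOLD


invoice = Invoice("Acme Advisory", "EMEA", Decimal("62000"), "Consulting", "audit-prep")
print(gl_code(invoice), needs_second_approval(invoice))
```

Nothing on the page says "6200" or "second approval". Both answers come from the lookup table and the threshold, which is exactly the knowledge that lives with the team.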
What Actually Needs to Be Automated
The Document AI market grew to $19B in 2025 and is on pace for $32B in 2026. That growth isn’t coming from better OCR. It’s coming from organizations realizing that the real cost of document processing isn’t the reading; it’s the thinking.
A financial institution processing loan packages doesn’t struggle because the AI can’t extract a borrower’s name from a tax return. It struggles because the AI doesn’t know that this specific lender requires three years of tax returns for self-employed borrowers in certain geographies, or that financials from this particular accounting firm always need their depreciation schedules cross-referenced against the balance sheet because they format them inconsistently.
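The lender rule described above can be sketched as a completeness check. This is an illustration under stated assumptions: the lender name, the list of states, and the `LoanPackage` shape are all hypothetical, and "three years" is interpreted here as the three calendar years before the current one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoanPackage:
    lender: str
    borrower_self_employed: bool
    borrower_state: str
    tax_return_years: List[int] = field(default_factory=list)


# Hypothetical lender-specific rule: which geographies trigger the
# three-year requirement for self-employed borrowers.
LENDER_RULES = {
    "Example Lender": {"states": {"CA", "NY"}, "years_required": 3},
}


def missing_tax_years(package: LoanPackage, current_year: int) -> List[int]:
    """Return the tax years the lender requires that are not in the package."""
    rule = LENDER_RULES.get(package.lender)
    if rule is None:
        return []
    if not package.borrower_self_employed:
        return []
    if package.borrower_state not in rule["states"]:
        return []
    required = [current_year - offset for offset in range(1, rule["years_required"] + 1)]
    return [year for year in required if year not in package.tax_return_years]


package = LoanPackage("Example Lender", True, "CA", [2024, 2023])
print(missing_tax_years(package, current_year=2025))
```

Extraction can be perfect on both tax returns in this package and the package is still incomplete. Only the rule knows that.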
These rules aren’t in any training set. They’re institutional knowledge — the accumulated expertise of people who have done this work for years and built up pattern recognition that no model can replicate from scratch.
The Kodexa Approach
We built Kodexa around a different premise: AI should help your team operationalize what they already know, not try to replace that knowledge with a statistical approximation.
The platform has three components that work as a cycle. In Studio, teams define their knowledge — the data structures, validation rules, and extraction logic for their specific document types and business context. In Workflow, they maintain that knowledge over time — capturing corrections, handling exceptions, and encoding the vendor-specific or customer-specific rules that make their process actually work. In Knowledge, the platform applies that accumulated expertise autonomously, at scale, around the clock.
The AI accelerates every step of this cycle. It helps teams configure faster, suggests rules based on patterns in their corrections, and handles the routine work so humans focus on the exceptions that matter. But the knowledge itself belongs to the customer. It lives in their instance, reflects their expertise, and gets more valuable to them specifically as it grows.
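One simple way to picture "suggesting rules based on patterns in corrections" is plain counting: if reviewers keep making the same fix for the same vendor and field, that fix is a candidate rule. The sketch below assumes a hypothetical `Correction` record and an arbitrary threshold of three repeats. It is not a description of how Kodexa does this, and a real system would need far more than frequency counts.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Correction(NamedTuple):
    vendor: str
    field_name: str
    extracted: str
    corrected: str


def suggest_rules(
    corrections: List[Correction], min_occurrences: int = 3
) -> List[Tuple[Correction, int]]:
    """Return corrections repeated at least min_occurrences times, most frequent first."""
    counts = Counter(corrections)
    return [(c, n) for c, n in counts.most_common() if n >= min_occurrences]


history = [
    Correction("Acme Advisory", "gl_code", "6000", "6200"),
    Correction("Acme Advisory", "gl_code", "6000", "6200"),
    Correction("Globex", "due_date", "2025-01-01", "2025-01-31"),
    Correction("Acme Advisory", "gl_code", "6000", "6200"),
]

for correction, count in suggest_rules(history):
    print(f"{count}x {correction.vendor}: {correction.field_name} "
          f"{correction.extracted} -> {correction.corrected}")
```

A person still decides whether the suggestion becomes a rule. The counting only surfaces what the team is already doing by hand.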
Why This Matters Now
Extraction accuracy on standard invoices hit 95%+ across most vendors last year. That capability is table stakes. The competitive question has shifted to what happens after extraction — the matching, routing, validation, exception handling, and downstream workflow integration that determines whether automation actually saves money or just moves the manual work somewhere else.
This is where most implementations fail. A system that perfectly extracts every field on an invoice but can’t apply the business rules that determine how to process it is an expensive OCR engine. The 40% of implementations underperforming ROI projections aren’t failing on extraction. They’re failing on the gap between reading the document and knowing what to do with it.
The Model-Agnostic Advantage
The other reason we built the platform this way: foundation models are improving fast, and any company whose competitive position depends on having a better model is running on borrowed time. The companies building those foundation models are investing hundreds of billions in making them better at everything, including reading documents.
You don’t want to compete with that. You want to benefit from it.
Kodexa’s architecture is model-agnostic. Better foundation models make the platform more capable — faster configuration, better suggestions, higher accuracy on novel document types. The value isn’t in the model. It’s in the knowledge layer your team builds on top of it. That layer doesn’t get commoditized when a new model drops. It gets more powerful.
What We’re Building Toward
The agent maturity curve in enterprise software runs from copilots (AI assists, human does the work) through task agents and workflow agents to full autonomy. Most of the industry is stuck at copilots — chatbots bolted onto existing products. Moving from copilots to workflow agents expands the addressable market dramatically, because you start automating operational labor, not just software tasks.
That progression requires exactly the kind of knowledge capture Kodexa is built for. An agent that can orchestrate a multi-step document workflow — receiving an invoice, validating it against contract terms, routing exceptions to the right reviewer, matching to POs, and scheduling payment — needs to know how your organization does each of those steps. Not how organizations do them in general. How yours does.
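The multi-step workflow above can be sketched in a few lines, which makes it easier to see where the organization-specific knowledge sits. All names and values here are hypothetical (contract caps, open POs, reviewer queues), the contract check is reduced to a single spending cap, and the sketch is not Kodexa's implementation. The control flow is generic; the three lookup tables are the part only your organization can supply.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    vendor: str
    po_number: str
    total: Decimal


# Hypothetical organization-specific knowledge.
CONTRACT_CAPS = {"Acme Advisory": Decimal("10000")}   # max invoice per contract
OPEN_POS = {"PO-1001": Decimal("8000")}               # expected amount per PO
REVIEWERS = {"contract": "contracts-team", "po": "ap-lead"}


def process(invoice: Invoice) -> str:
    """Validate against contract terms, match to a PO, then schedule or route."""
    cap = CONTRACT_CAPS.get(invoice.vendor)
    if cap is None or invoice.total > cap:
        # No known contract, or the invoice exceeds it: route the exception.
        return f"route:{REVIEWERS['contract']}"
    if OPEN_POS.get(invoice.po_number) != invoice.total:
        # PO missing or amount mismatch: route to whoever owns PO exceptions.
        return f"route:{REVIEWERS['po']}"
    return "schedule_payment"


print(process(Invoice("Acme Advisory", "PO-1001", Decimal("8000"))))
print(process(Invoice("Acme Advisory", "PO-1001", Decimal("12000"))))
```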
That’s why we start with humans, not AI. The humans have the knowledge. The AI needs to learn it from them. And once it has, it needs to apply it consistently, transparently, and at a scale no human team can match.
That’s what we’re building.
Philip Dodds is CEO of Kodexa. Learn more at kodexa.ai or schedule a demo.
