AI saved you 10 minutes. Explaining it cost 20.

We covered the Sage/PwC headline in Monday’s roundup. Here’s the part of their research that’s already eating your week.

Sage and PwC released research at Sage Future on April 28 that put a number on something most CAS practices haven’t measured. Finance professionals now spend an average of 12.9 hours a week reconstructing, validating, and defending AI outputs — and 26% of the time savings AI is supposed to deliver get lost to that verification work. The IDC-conducted study sits underneath the headline most coverage led with: 71% of finance leaders would reject an AI system that can’t explain its outputs, even when those outputs are 99% accurate.

The 71% is a stated preference. The 12.9 hours is what’s already happening on your timesheet.

Tom Herbert at AccountingWEB framed it sharply: if AI saves 10 minutes producing an answer but costs 20 minutes explaining where it came from, the productivity gain has evaporated. The framing lands, but it’s only half the picture. The honest comparison isn’t AI plus verification versus zero — it’s AI plus verification versus the hours a junior would have spent drafting and the partner would still have spent reviewing. The AI path usually wins on total time. What it doesn’t do is remove the verification work. It moves it — onto the partner’s desk, the most expensive time on the docket.

The platform vendor solves the layer they own

Sage’s response was to build the platform answer. The Finance Intelligence Agent for Sage Intacct rolls out in phases through 2026. Sage Copilot now has a “show me how you worked that out” button that walks through its reasoning. The AI Trust Label appears on every AI completion. Sage CEO Steve Hare framed it directly in the keynote: “Finance does not run on answers alone. It runs on answers you can explain. If you cannot show how a number was produced, you cannot use it.”

That is the right standard. But Sage solves it for the layer Sage owns.

Most of what your practice produces never touches the platform’s reasoning surface. Variance commentary, advisory narratives, board memos, client emails, year-end notes, audit-prep workpapers, tax-organizer summaries — the high-judgment, client-facing work where AI compresses the most partner time is also the work that happens in the gaps between platforms. Sage isn’t shipping glass-box AI for those workflows. Neither is QuickBooks. Neither is Karbon. The practitioner has to solve the rest. The 12.9 hours is your problem, not Intuit’s or Sage’s.

The three-document workflow

The practical answer is to stop shipping AI deliverables alone. Ship them with their receipts attached.

When you use AI to produce a piece of work — a variance commentary, a client memo, an advisory narrative — ask the same AI to produce a second document alongside it: a research memo that names where the inputs came from, how the numbers were derived, what assumptions filled the gaps, which model and prompt did the work, and which claims the AI was confident about versus inferred. Some tools will produce this on request; others need a deliberate prompt.
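The exact wording matters less than what it forces the model to surface. A starting point, illustrative rather than a tested template: "Alongside the deliverable you just drafted, produce an internal research memo. List every source you drew on, show how each figure was derived, state the assumptions you used to fill gaps, record which model and prompt produced the work, and separate the claims you are confident in from the ones you inferred."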

Then feed both documents — the deliverable and the research memo — into a different LLM. If your firm runs Copilot, paste them into Claude or ChatGPT. If your firm runs ChatGPT, send them to Claude or Gemini. The cross-model AI’s job is to produce a third document: a verification report that scores the deliverable against the research memo, flags claims that don’t match the source data, identifies weak inferences, and rates overall reliability. Different model architectures catch different errors. Using two is not perfect, but it’s strictly better than one.
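A matching starting point for the second model, again illustrative rather than prescriptive: "You are reviewing, not rewriting. Here is a deliverable and the research memo behind it. Score the deliverable against the memo: flag any claim that does not match the stated sources or derivations, call out inferences that rest on thin evidence, and give an overall reliability rating with a short justification."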

The economic shift is the point. The bulk of the verification work that used to land silently on the partner’s desk now lives with the author. The preparer produces the deliverable, runs Prompt 1 for the research memo, runs Prompt 2 in a second LLM for the verification report, and reads through both before passing the package up. The partner’s job is no longer to verify — it’s to review the verification report and decide whether the verification was strong enough. That is a different task, measured in minutes, not hours, and it sits with the right hour on the docket.

None of this is client-facing. The research memo and the verification report are internal workpapers — documentation supporting the deliverable in the file, not artifacts the client ever sees. The client still receives the deliverable in whatever form they always have. The three-document standard is for your firm’s quality system.

This isn’t compliance theatre. The wringable neck is still the partner’s, and the signature now covers a deliverable that came with its own evidence package.

Run the April test on your own work this week

Pull every client deliverable that left your firm in April. For each AI-touched piece, ask a single question — did this arrive in three documents or one? If the deliverable went out alone, with no research memo behind it and no cross-model verification scoring it, then the 12.9 hours is in your timesheet too. You just haven’t named it.

Within twelve months your clients will run the 71% test on you, on the same grounds Sage’s research found finance leaders running it now. The work that survives that test is the work that arrived in three documents the first time.

Did your last AI-assisted deliverable arrive in three documents or one?
