Free Resource
Anthropic just confirmed six weeks of silent Claude Cowork degradation, caught by customers, not by internal monitoring. Three operational tests every CAS firm needs so it knows when the AI underneath its workflows quietly breaks.
The Problem
Would your firm notice if the AI underneath its workflows got quietly worse? Most CAS practices can't answer that. Most haven't asked. Anthropic published a post-mortem this week confirming three separate degradations of Claude Cowork between March 4 and April 16: six weeks of silent drift while users complained and internal monitoring stayed quiet. AMD's AI director called the product "dumber, lazier" in public before the vendor confirmed anything.
Every major AI vendor will have a week like this. Models degrade silently. Plans get rewritten. Features ship the same day quality breaks. Your firm has built workflows, deliverables, and client expectations on a category of tool that changes underneath you — and right now, you have no instrument that would tell you when it does.
You need three.
What's Inside
The prompts paste directly into Claude, ChatGPT, Gemini, Copilot, or any frontier LLM. No tooling change required.
Paste your workflow in, get a workflow-specific scoring rubric out. Specific dimensions, a scoring guide, and a target threshold tuned to that workflow. Run it once per workflow; save the rubric.
Score new outputs against the rubric — using a different AI model than the one that produced them. Self-checking is self-defeating; the cross-LLM step is what makes the score trustworthy.
One line in your workpaper, full accountability: "AI assistance: Cowork on Opus 4.7, prompt v3, run April 25, 2026." Examples and field notes for where to put it.
A complete rubric, a sample scored output, two drift scenarios, and the version-trail entry, all applied to one repeatable CAS workflow. Adapt the dimensions to your firm; a minimal sketch of how a saved rubric and its cross-scoring prompt can be stored follows below.
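For firms that want the saved rubric to live somewhere more durable than a chat thread, here is a minimal Python sketch of one way to store it and to build the prompt you paste into a second model for the cross-LLM scoring step. The workflow name, dimensions, scoring guide, and threshold below are illustrative placeholders, not the pack's actual rubric.

```python
# Illustrative only: one saved rubric for one CAS workflow (a hypothetical
# monthly close summary memo). Dimensions, guide, and threshold are placeholders.
RUBRIC = {
    "workflow": "monthly close summary memo",
    "dimensions": [
        "figures tie out to the source trial balance",
        "every material variance is explained",
        "client-ready tone and formatting",
        "no invented accounts or balances",
    ],
    "scoring_guide": "Score each dimension 1-10; 10 = partner-ready with no edits.",
    "target_threshold": 8,  # average score the output must clear
}

def cross_scoring_prompt(rubric: dict, output_text: str) -> str:
    """Build the prompt to paste into a *different* model than the one
    that produced output_text (the cross-LLM step)."""
    dims = "\n".join(f"- {d}" for d in rubric["dimensions"])
    return (
        f"You are reviewing output from the workflow: {rubric['workflow']}.\n"
        f"{rubric['scoring_guide']}\n"
        f"Dimensions:\n{dims}\n\n"
        f"Output to score:\n{output_text}\n\n"
        "Return one score per dimension and an overall average."
    )

if __name__ == "__main__":
    print(cross_scoring_prompt(RUBRIC, "[paste this week's output here]"))
```

Nothing in the pack requires this; the same rubric can live in a document. The point is only that once the rubric is written down in one place, the weekly scoring prompt builds itself.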
The Three Tests
Pick one workflow. Have AI write a scoring rubric for what good output looks like. Once a week, run a stable input through the workflow and have a different LLM score the new output. When the score moves from a steady 8 to a steady 6, something changed in the model before the vendor announced it.
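If you want the weekly score to live in a file rather than in someone's memory, a short script like the sketch below is enough: type in the cross-LLM score each week, and it flags when the recent average slips below your baseline. The file name, the four-week window, and the 1.5-point drop that counts as drift are all assumptions to tune, not part of the pack.

```python
# Minimal drift log, assuming the weekly cross-LLM score is entered by hand.
import csv
import sys
from datetime import date
from pathlib import Path
from statistics import mean

LOG = Path("drift_scores.csv")   # one row per weekly run: date, score
WINDOW = 4                       # weeks in the "recent" average
DROP = 1.5                       # a steady 8 sliding toward 6 trips this

def record(score: float) -> None:
    """Append this week's score to the log, writing a header on first use."""
    new = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new:
            writer.writerow(["date", "score"])
        writer.writerow([date.today().isoformat(), score])

def check() -> None:
    """Compare the recent average against everything before it."""
    with LOG.open() as f:
        scores = [float(row["score"]) for row in csv.DictReader(f)]
    if len(scores) <= WINDOW:
        print(f"{len(scores)} run(s) logged; not enough history yet.")
        return
    baseline = mean(scores[:-WINDOW])
    recent = mean(scores[-WINDOW:])
    print(f"baseline {baseline:.1f} vs last {WINDOW} weeks {recent:.1f}")
    if baseline - recent >= DROP:
        print("Possible model drift: investigate before the next deliverable.")

if __name__ == "__main__":
    record(float(sys.argv[1]))   # e.g. python drift_log.py 7.5
    check()
```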
The rubric you built for the drift test does double duty. New models will land monthly now. Build the rubric once and you can evaluate any new model the day it ships. That's the difference between vendor optionality as a capability and vendor optionality as a wish.
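A day-one bake-off is the same mechanics pointed at a new model: run the stable input through both the incumbent and the candidate, cross-score each output against the same rubric, and compare against the workflow's threshold. The model names, scores, and threshold in this sketch are placeholders.

```python
# Day-one comparison against the same rubric; values are illustrative.
TARGET = 8.0  # the workflow's target threshold from the saved rubric

candidate_runs = {
    "incumbent (Opus 4.7)": 8.2,   # current production model
    "new model (vendor X)": 8.6,   # shipped this morning
}

for model, score in candidate_runs.items():
    verdict = "clears threshold" if score >= TARGET else "below threshold"
    print(f"{model}: {score:.1f} ({verdict})")
```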
One line in the workpaper or workflow log: "AI assistance: Cowork on Opus 4.7, prompt v3, run April 25, 2026." When the model changes, and it will every six weeks now, your sign-off doesn't. The version trail is the difference between defensible AI use and "we used AI somewhere."
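If the workflow log is a plain text file, the entry can be generated and appended automatically. A small sketch follows; the model name, prompt version, and log path are placeholders, and the format simply mirrors the example line above.

```python
# Build and append the one-line version-trail entry; assumes a text-file log.
from datetime import date
from pathlib import Path

def trail_entry(model: str, prompt_version: str, run_date: date) -> str:
    """Format the entry, e.g. 'AI assistance: Cowork on Opus 4.7, prompt v3, run April 25, 2026'."""
    return (f"AI assistance: Cowork on {model}, prompt {prompt_version}, "
            f"run {run_date.strftime('%B')} {run_date.day}, {run_date.year}")

def append_to_log(log_path: Path, entry: str) -> None:
    """Add the entry as its own line at the end of the workflow log."""
    with log_path.open("a") as f:
        f.write(entry + "\n")

if __name__ == "__main__":
    entry = trail_entry("Opus 4.7", "v3", date(2026, 4, 25))
    print(entry)
    append_to_log(Path("workflow_log.txt"), entry)
```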
Stay Current
Subscribe to The AI Accountant newsletter and get the Vendor Test Pack delivered to your inbox, along with weekly analysis of the AI developments that matter for your practice.
Your Move
The firms that build this in April will spend the next year refining it. The firms that don't will spend the next year hoping the vendor doesn't have another bad week. Pick one workflow, build one rubric, run one weekly score, and start the trail this week.