Claude doesn’t know how it works.
Today’s lesson: the AI in your stack is a colleague with blind spots. The blind spots are predictable. Your job is to build the feedback loop.
I sat down this evening to set up Firecrawl. The whole thing should have taken ten minutes. It took thirty-five. The reason is worth a blog post, because it’s the single most common AI-in-the-workflow failure I see, and the one nobody writes about honestly.
Claude doesn’t know what its own host application looks like.
Not in any abstract sense. I mean it literally. I asked Claude where in Cowork to add a new MCP server. Confident answer: Settings → Capabilities → MCP Servers. I clicked Settings. No Capabilities. I screenshotted the panel and sent it back. Second confident answer: Settings → MCP Servers, paste the JSON. That menu didn’t exist either. The actual surface was in a completely different part of the app, under Customize → Connectors → +, which neither path had hinted at.
Two wrong assertions, in confident prose, in a row. The setup didn’t start until I made Claude work from a live screenshot instead of memory.
Why it happens
Models learn from text. That text describes Claude’s menus and settings at the moment it was written. The host application then ships an update. The menu moves. The training data doesn’t.
Confidence is a property of the prose, not the knowledge. The hallucination isn’t “Claude made up a UI from nothing.” It’s “Claude trusted a description it had no live way to verify.” That failure shape is universal: the lawyer remembering a regulation from 2019, the marketer quoting attribution data from a pre-cookie world. The information was right when it was learned. Nothing flagged the expiry date.
Models have it worse because their host environment changes faster than training updates. By the time a Cowork settings menu makes it into a dataset, Cowork has shipped four more updates.
What fixed it
The first wrong path came from a note Claude had filed three days earlier in our shared credentials.md. The note recorded what the menu used to look like on 21 May. By the 24th, the app had been restructured. The note was stale; Claude trusted it as current.
The path that worked was a screenshot. Once Claude could see the live UI, it called the right surface in eight seconds: “Click the plus at the top of the Connectors column. Same place Higgsfield was added.” Correct. One shot.
So I’ve added a rule to my own working canon: screenshot before instruction.
I’ve also stopped trusting notes in my own files that describe UI paths. If a note from last week says “Settings → X → Y → click Z”, I treat it as a hypothesis, not a fact. The UI moves between sessions. The note doesn’t.
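A minimal sketch of that habit, assuming notes carry the date they were written. The function name, the arrow-matching pattern, and the zero-day default are my own illustrative choices, not part of any tool mentioned here, and the year in the example dates is a placeholder.

```python
import re
from datetime import date

# Assumption: a "UI path" is any text with an arrow between two labels,
# e.g. "Settings → Capabilities → MCP Servers" or "Settings -> X".
ARROW_PATH = re.compile(r"\S\s*(?:→|->)\s*\S")


def needs_live_check(note: str, written: date, today: date,
                     max_age_days: int = 0) -> bool:
    """Return True when a note describes a UI path and is older than max_age_days.

    With the default of 0, any UI-path note written before today is
    treated as a hypothesis to verify against a live screenshot.
    """
    describes_ui_path = ARROW_PATH.search(note) is not None
    age_days = (today - written).days
    return describes_ui_path and age_days > max_age_days


# The stale note from this post: written on 21 May, read on 24 May.
# The year is a placeholder for illustration only.
stale = needs_live_check(
    "Settings → Capabilities → MCP Servers",
    written=date(2025, 5, 21),
    today=date(2025, 5, 24),
)
print(stale)
```

The check is deliberately blunt. It does not try to decide whether the path is still right, only whether someone should look at the screen before acting on it.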
The bit that surprised me
What surprised me wasn’t the failure. I’ve seen it dozens of times. It was how fluent Claude was about the failure once I named it.
I told Claude: “You assumed two windows where there is one.” Claude filed a failures.md entry on itself. It tagged the failure mode, traced it to the same class as two earlier fails this month, proposed a corrective, and committed to apply it. A junior staffer who could do that would be a hire I’d never let go of.
The framing changes when you treat AI as junior staff, not as an oracle. Oracles you trust. With junior staff, you give feedback, write down what you learned, and check the work against the screen.
In the Marketing Engine Pilot, every AI workflow has a checkpoint where the AI’s output meets a human eye before it touches a customer. The checkpoints are the feedback loop. Without them the system drifts. With them, it gets sharper every week because the failures get logged and the prompts get tightened.
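The logging half of that loop can be sketched in a few lines. This is an illustration under stated assumptions, not the format of my actual failures.md: the `Failure` fields, the tag name `stale-ui-path`, and the helper names are all hypothetical, and the dates use a placeholder year.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Failure:
    when: date
    tag: str         # failure mode; the tag vocabulary here is hypothetical
    what: str        # what went wrong, in one sentence
    corrective: str  # what to do differently next time


def to_markdown(f: Failure) -> str:
    """Render one entry as a single markdown bullet for a failures log."""
    return f"- {f.when.isoformat()} [{f.tag}] {f.what} Corrective: {f.corrective}"


def repeat_offenders(log: list[Failure], threshold: int = 2) -> dict[str, int]:
    """Re-read the log: return tags that occur at least `threshold` times."""
    counts = Counter(f.tag for f in log)
    return {tag: n for tag, n in counts.items() if n >= threshold}


# Illustrative entries only; the wording is invented for the example.
log = [
    Failure(date(2025, 5, 24), "stale-ui-path",
            "Gave a Settings path that no longer exists.",
            "Ask for a screenshot first."),
    Failure(date(2025, 5, 24), "stale-ui-path",
            "Gave a second Settings path from memory.",
            "Ask for a screenshot first."),
    Failure(date(2025, 5, 10), "wrong-file",
            "Edited the wrong draft.",
            "Confirm the filename before writing."),
]

for entry in log:
    print(to_markdown(entry))
print(repeat_offenders(log))
```

Counting by tag is the part that matters. A single entry is an anecdote; a tag that keeps coming back is a prompt that needs tightening.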
Screenshot before instruction. Log the fail. Re-read the log. Treat notes about UI as hypotheses, not facts. Build the loop and the AI gets sharper. Skip it and the AI drifts, confidently, somewhere you weren’t looking.
The Firecrawl install took thirty-five minutes in the end. Next time it’ll take ten. That’s the loop working.
This is what running, not using, looks like
Every AI workflow ships with a human-in-the-loop checkpoint that catches the confident-but-wrong moments before they touch a customer. The Marketing Engine Pilot installs six of them. Eight weeks. On your stack. Yours to keep.
Talk to Anthony →