← Series index · · ~1,500 words · Field notes

My first week in AI.

12-to-18-hour days. Five Claude chats running in parallel. Code copy-pasted between them by hand. Plan: make every mistake once. Three named AI colleagues stood up by Sunday. Four projects shipped. 118 commits across four repos. One disk wiped on Sunday morning. What I learned about running AI rather than using it.

Operator on the snowy path to Castle Black, looking up at the Wall for the first time at dawn. Heraldic banner reads WEEK ONE.

Why I had to build a team

I started the week in Claude Chat. By Sunday I had three named colleagues. SuperSebastian on ops and research. MotherMary running projects. StarTrekScotty heading engineering. Claude itself still doing the day-to-day building. Four AI roles. One human. The full team — including the two more I stood up the following week — is written up in Meet the Articulate team.

GitHub came in first because of MediaServer — the Plex stack on my Mac mini. Most complex code I run, and where Claude was writing the worst of it. Confident, fluent, broken. APFS errors, network-stack ceilings on the local router, a JSON extractor that died on nested braces. 93 commits on MediaServer this week alone. 118 across four repos. Without version control I couldn't see the breakage shape; with it, every commit became a checkpoint where I could ask: better, or just different?

Codex came in next. Claude kept making the same mistakes — forgetting prior fixes, reinventing solutions to problems already shipped. A clever junior on his own can't tell you when he's regressing. I needed a Senior Engineer to push back. The MediaServer rebuild proposal — 444 lines of architecture — is going to Codex for an adversarial review before Step 1 of the build. Nothing significant ships on one AI's say-so.

SuperSebastian came in last. I needed someone above the engineers. An operations expert who owns the stack, not the projects. Named so I can argue with him rather than override a tool.

And then Claude reformatted the disk.

The fails

The voice-render loop.

Tuesday to Friday I burned 90 minutes rendering audio for a client deliverable — a 13-slide deck and 12-minute narration walkthrough. Five symptoms, one problem underneath: the credential loader was checking the environment variable before the canonical credentials file, the pre-flight probe was hitting the wrong endpoint, a key got pasted without its prefix and I didn't flag it, and the voice-name lookup was doing exact-match against a name format the vendor had quietly changed. What came out: the failure ledger. Every wasted minute logged, weekly digest fires Sunday and I read it. Plus the wins ledger as the other half of the loop. Pay the tax once.

The Plex URL I handed over without testing.

Cloudflare logged "Added CNAME." I said open it on your phone. It didn't work. I'd had a sandbox curl available the whole time. What came out: rule 0. If you can test, test. If you can do, do. Paid off three days later when the toolbox page shipped clean — Vercel-MCP-verified before I claimed done.

Six hours on a logo-design skill that didn't need to exist.

I tried to build a bespoke skill for one of my own consultancy offers. Six hours in I'd produced nothing better than just invoking the canvas-design and brand-guidelines skills that already exist. Rule out of that: a skill earns its keep at twenty invocations. Otherwise it's prose.

Sunday morning — the disk error.

The fuck-up. Despite the team in place, Claude ran a destructive format on a live drive in my media server. Convinced it was the new disk I'd just ordered when it was actually a member of the existing array.

What was at risk. 8.3 TiB of curated media library. Recoverable via professional data recovery for $1,500–$5,000 over one to four weeks; the format only wipes a header at the start of the disk and the underlying data is still on the platters. Trust costs more than the data.

What I'm doing so it doesn't happen again. Every destructive operation now requires its own message and explicit point-at-the-target confirmation. "Let's plan" is not "execute." No exceptions, no bundling with routine work.

What shipped

Thursday — a Greek-HNW property funnel, live in 30 minutes flat.

Next.js, bilingual English and Greek. The clients needed a live page to point at. Thirty minutes from "we should make one" to a deployed URL with a working contact form. Old me: two days.

Friday — the BossCouple pivot.

Sat down with two real-estate partners at one of the global brokerages, and the engagement reframed itself in real time. The brokerage isn't the client; their joint brand is. The umbrella covers Real Estate, Business Set-up, Golden Visa, Investment, Fund Management. Forty years in the UAE is the credibility line. Three workflows locked in the room: a parent brand site every WhatsApp points at; a lead-flow tool that mirrors brokerage-sourced leads into the partners' own database with WhatsApp automation and call-outcome capture; and a resurrection workflow — ten years and roughly two thousand WhatsApp contacts extracted via iMazing, summarised by Claude, segmented hot/warm/cold/dead/referrer, routed to comms programmes with approval gates. Case-study rights granted at the table. The flagship pilot.

The website refactor.

Four new public pages — /clients, /projects, /knowledge-base, /blog (the inaugural post was my Hype Radar framework, a P/Q/S scoring rubric for every new AI launch). Plus a public /toolbox page listing the whole stack, and a public /status page so anyone — me included — can see at a glance what's moving across every project.

The media server got a consumer face.

The engineering project finally has a consumer-facing brand. The onboarding page used to have a regex-based keyword chatbot matching seventeen intents that crumbled the moment a user phrased anything sideways. Ripped out, replaced with a real LLM — a Cloudflare Worker calling Claude Haiku, with the system prompt, install knowledge, and FAQ inlined, talking to the page over CORS. Onboarding videos pivoted to fully synthetic in parallel — no presenter on camera, library voices over generated cinematic clips and screen recordings. About $5 of marginal cost per video.

A PhD-level deep-dive on statins.

In a separate parallel chat the same week. Full demonstration here — 267 pages, footnoted, peer-reviewed by a simulated five-reviewer hospital panel, scored 74 out of 100, no invented citations in the sample. Medical is the use case; serious independent research as a routine output, not a project — that's the principle.

Smaller wins.

A sub-let room re-let at AED 100/day. 68 files of client content migrated overnight by a scheduled task while I slept.

The team, named

SuperSebastian — Ops, Research, new-idea evaluation.

Senior operator. Knows the whole stack — what's running, what's costing me money, which of the new launches in the AI world is worth a week of my time. Hard rule: checks the failure log before recommending anything. Runs the radar on new launches and benchmarks me against other operators in this space. He doesn't write code, doesn't manage projects, doesn't draft copy. He sits one layer above all of them and tells me what's wrong with the substrate.

MotherMary — Project Manager.

Walks every project, every session. Surfaces imbalance when I'm overweighting client work and starving my own business. Maintains the public status page. Hard rule: the first-move sentence is binding — specific action, named file or stakeholder, time-boxed. Vague is forbidden. Four projects in parallel doesn't work without her.

StarTrekScotty — Head of Engineering.

Owns the repos and the build gates. Every project that touches code under his watch — every change a checkpoint, every diff a question. When work gets architectural, he brings in Codex as the Senior Engineer for an adversarial review. Owns the canon → manifest → hashed → preflight → deploy pattern that turned my worst week of code into a working method. He's also the only persona allowed to write engineering fails to the ledger.

Together they're the layer underneath. A process I can rely on. Refactor, ship, never pay the same tax twice.

Outside Claude

Three three-hour calls with friends working through real problems. Three new ideas came out of the week.

A relationship counsellor built on Gottman Theory.

AI-mediated, accessible, always available. Not a dating app. For couples already together who need help staying that way.

Self-hosting a local LLM on the mini.

Privacy. Data-residency — the same posture UAE enterprise IT will probably enforce on the next deployment we ship. At some point the marginal call to a hosted API costs more than the kilowatt-hour on a box I already own.

A 10-POC commercial-introducer pipeline.

Ten proof-of-concept deliveries for ten UAE small-to-mid-size business owners. Each yields a testimonial. The testimonials build a toolbox. A commercial introducer with the toolbox gets paid to open doors to owners. Articulate's Year-1 go-to-market motor, in one sentence. Now the job is to build it.

Next week

The fails became next week's rules. The team got built. The disk got reformatted anyway. The full ledger of what went wrong landed a week later in I'm a lying bastard — a class-1 destructive fail per week is not noise.

For context on where most companies actually sit with AI versus this kind of operator pace, McKinsey's State of AI is the conservative benchmark — adoption is wide, but rewiring around it is rare.

If this is the kind of operator you want running your marketing

The Marketing Engine Pilot installs six workflows that close the gap between your ad click and your closed deal. Eight weeks. On your stack. Yours to keep.

Talk to Anthony →
Week: 12–18 May 2026
Output: 118 commits across 4 repos · 4 projects shipped · 1 SEV-1 disk error logged and mitigated · 3 named AI personas instantiated
Team: SuperSebastian (Ops/Research) · MotherMary (PM) · StarTrekScotty (Engineering) · Codex (Senior Engineer review) · Claude (build)