How the document was produced — methods and orchestration

This note is for any reader who wants to understand the engineering choices behind the synthesis: the agent architecture, the parallelism, the grounding searches, the citation discipline, what was managed well, and what was not.

The architectural decision

The starting question for any complex synthesis is whether to produce it as a single monolithic generation pass or to decompose it into specialised, parallel agents that each handle a defined slice of the work. The single-pass approach has the advantage of tonal consistency and tightly coupled cross-references. It has two major disadvantages: it forces every part of the document to be produced under the same context window, which for a 100-page deliverable means the early chapters are written without knowledge of what the later chapters will conclude; and it loses the opportunity to use parallel evidence gathering, which is the single largest time multiplier available.

The decomposition was made along epistemic boundaries rather than chapter boundaries. The biology of the mevalonate pathway is a different sort of work from the historiography of the trial era, which is different again from mapping the dispute between named participants. Each requires different source material, different background knowledge, and different analytical moves. Five research agents were assigned, with overlap kept deliberate where chapters cross-reference: the SAMSON discussion appears in both the harms chapter and the bempedoic-acid trial chapter, and the dal-OUTCOMES trial appears in both the HDL chapter and the Roche-footprint chapter. This redundancy is a feature rather than a defect, because it lets the reader pick up the document at any chapter and still encounter the load-bearing references.

A shared grounding file was written before any of the research agents started. This file contained the verified 2024–2026 trial findings pulled from initial web searches: CLEAR Outcomes follow-ons, FOURIER-OLE, ORION-4 status, STAREE timeline, PREVENTABLE completion date, Lp(a) therapeutics, the AHA PREVENT recalibration. Every research agent was required to read it before drafting and to cite from its verified URLs where applicable. The reason for this design choice is the single largest failure mode in LLM-generated scientific writing: invented citations. Giving every agent the same vetted set of primary sources for the most recent material was intended to reduce substantially the risk of fabricated 2024–2026 citations. That risk was not measured directly, but the external panel later found no invented citations among the footnotes it sampled.
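The check that a draft cites only from the grounding file can be sketched mechanically. This is a minimal illustration, not the project's actual tooling: the entry layout, the `example.org` URLs, and the function name `unverified_urls` are all hypothetical, since the note does not describe the file's real format.

```python
import re

# Hypothetical grounding entries: the topics come from the note,
# the URLs are placeholders invented for this sketch.
GROUNDING = [
    {"topic": "FOURIER-OLE", "url": "https://example.org/fourier-ole"},
    {"topic": "STAREE timeline", "url": "https://example.org/staree"},
]

# Match http(s) URLs up to whitespace or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def unverified_urls(draft: str, grounding: list[dict]) -> list[str]:
    """Return URLs cited in the draft that are absent from the grounding file."""
    verified = {entry["url"] for entry in grounding}
    missing: list[str] = []
    for url in URL_PATTERN.findall(draft):
        url = url.rstrip(".,;")  # drop sentence punctuation caught by the regex
        if url not in verified and url not in missing:
            missing.append(url)  # keep first-seen order, no duplicates
    return missing


draft = (
    "Long-term follow-up is summarised at https://example.org/fourier-ole. "
    "A second claim cites https://example.org/not-in-file."
)
print(unverified_urls(draft, GROUNDING))
```

A check like this only catches URLs that are absent from the vetted set; it cannot tell whether a verified source actually supports the sentence that cites it, which is why a human or reviewer audit remains necessary.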

The five research agents

The agents were given written briefs of approximately 800–1,200 words each. The briefs were not generic — they specified word-count ranges, chapter outlines with paragraph-level structure, the steelman-both-sides stance, the required citation format, the [CITATION NEEDED] flagging convention for uncertain references, the specific named figures to cover in the disputes chapters, and the specific trials to cover with their design parameters. The briefs also specified the audience (a non-clinician marketing director debating an industry-employed friend) so that the writing voice would land at post-graduate density without lapsing into either textbook neutrality or polemical advocacy.

The five agents ran in parallel and produced six chapter files between them, totalling approximately 65,000 words of research content. The fastest agent finished in roughly ten minutes; the slowest in fifteen. Total wall-clock time for the parallel research phase was approximately fifteen minutes. Sequential production of the same material would have taken closer to ninety minutes of agent runtime and would have required substantially more context-window management.
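The wall-clock arithmetic above can be illustrated with a small simulation. This is a sketch under stated assumptions: the agent names and the three middle runtimes are hypothetical (the note reports only the fastest at roughly ten minutes and the slowest at fifteen), and each agent is replaced by a stub that sleeps for a scaled-down duration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-agent runtimes in minutes. Only the fastest (about 10) and
# slowest (about 15) come from the note; the middle three are placeholders.
RUNTIMES_MIN = {"agent_1": 10, "agent_2": 12, "agent_3": 13, "agent_4": 14, "agent_5": 15}
SCALE = 0.01  # one simulated minute = 0.01 s, so the demo finishes quickly


def run_agent(name: str) -> str:
    """Stand-in for a research agent: sleeps for its scaled runtime."""
    time.sleep(RUNTIMES_MIN[name] * SCALE)
    return f"{name}: draft complete"


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(RUNTIMES_MIN)) as pool:
    # map() returns results in input order even though agents finish at different times.
    reports = list(pool.map(run_agent, RUNTIMES_MIN))
elapsed_min = (time.perf_counter() - start) / SCALE

sequential_min = sum(RUNTIMES_MIN.values())  # raw runtime if agents ran one at a time
parallel_min = max(RUNTIMES_MIN.values())    # ideal wall-clock when all run at once
print(f"simulated wall-clock: {elapsed_min:.1f} min")
print(f"ideal parallel: {parallel_min} min, sequential sum: {sequential_min} min")
```

With five agents each taking between ten and fifteen minutes, the raw sequential sum is bounded between 50 and 75 minutes regardless of the placeholder values. The ninety-minute sequential estimate in the text is higher than that raw sum; the note attributes extra cost to context-window management but does not break the difference down.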

Each agent reported back with a self-assessment, a list of [CITATION NEEDED] flags for the auditor, and a summary of which trials and figures had been verified during composition versus which had been carried through from background knowledge without verification. This self-reporting is important because it preserves epistemic transparency: a reader of the final document can map every claim back to whether the agent that wrote it had primary-source confirmation in hand at the time.
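One way the flag list in such a self-report could be assembled is sketched below. The `SelfReport` structure, the `collect_flags` helper, and the sample text are hypothetical and exist only for illustration; the only element taken from the note is the [CITATION NEEDED] marker itself.

```python
import re
from dataclasses import dataclass, field

FLAG = "[CITATION NEEDED]"


@dataclass
class SelfReport:
    """Hypothetical report an agent hands back to the orchestrator."""
    agent: str
    flagged_sentences: list[str] = field(default_factory=list)


def collect_flags(agent: str, chapter_text: str) -> SelfReport:
    """List every sentence in the chapter that carries the flag."""
    # Naive split on ., ! or ? followed by whitespace; adequate for a sketch.
    sentences = re.split(r"(?<=[.!?])\s+", chapter_text.strip())
    return SelfReport(agent, [s for s in sentences if FLAG in s])


# Placeholder chapter text, not taken from the document.
sample = (
    "Claim A is supported by the grounding file. "
    "Claim B was carried from background knowledge [CITATION NEEDED]. "
    "Claim C was verified during drafting."
)
report = collect_flags("agent_1", sample)
print(report.flagged_sentences)
```

Keeping the flagged sentence whole, rather than just counting flags, gives an auditor the claim that needs a source without having to reopen the chapter.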

The synthesis layer

The chapter files were not the deliverable. They were the inputs to a synthesis layer, written by the orchestrator directly rather than delegated to another agent. The synthesis layer produced four pieces: a front-matter section (title, abstract, reader's note, evidence hierarchy, falsification criteria, full table of contents); an adjudication chapter (Bradford-Hill application, NNT/NNH-by-stratum table, calibrated landing position, pre-committed updating criteria); a debate playbook (eight conversation-ready cards for the actual exchange with the Roche friend); and the back-matter erratum responding to the external review panel's findings.

The reason the synthesis layer was kept in-house rather than delegated is that the document's overall epistemic stance — the balance between consensus and heterodox framing, the choice of where to make commitments and where to mark uncertainty, the rhetorical move from research-output to argument-output — needed to be controlled by a single voice. Five agents writing in parallel produce excellent research material but cannot, between them, agree on a stance. The orchestrator's job at the synthesis stage was to take the research material and impose a coherent position on it.

The grounding pass

Before any research agents were dispatched, eight web searches were run in parallel to anchor the document in current 2024–2026 evidence: CLEAR Outcomes long-term, FOURIER-OLE, inclisiran outcomes trials, STAREE, PREVENTABLE, recent Mendelian-randomisation work, recent primary-prevention meta-analyses, and Lp(a) therapeutics. The findings were captured in the shared grounding file with URL citations. This pass took approximately three minutes of wall-clock time and produced approximately 8,000 words of grounded summary that the research agents could cite from directly.

This pattern — search first, ground centrally, then dispatch agents — turned out to be the single most important engineering choice in the project. Without it, the research agents would have been producing material partially from background knowledge with no recency anchor, and the 2024–2026 trial content would have been at substantially higher risk of being either dated or invented.
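The ordering constraint in this pattern (all searches finish, the grounding exists, and only then are agents dispatched) can be sketched with two concurrent stages. The search topics are the eight listed in the note; the `search`, `research_agent`, and `pipeline` functions are hypothetical stubs that stand in for real web searches and agents, so this shows the control flow only.

```python
import asyncio

# The eight grounding searches named in the note.
SEARCH_TOPICS = [
    "CLEAR Outcomes long-term", "FOURIER-OLE", "inclisiran outcomes trials",
    "STAREE", "PREVENTABLE", "recent Mendelian-randomisation work",
    "recent primary-prevention meta-analyses", "Lp(a) therapeutics",
]


async def search(topic: str) -> dict:
    """Stub for a web search; a real version would return findings plus URLs."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"topic": topic, "summary": f"(placeholder summary for {topic})"}


async def research_agent(agent_id: int, grounding: list[dict]) -> dict:
    """Stub agent: records how many grounding entries it was handed."""
    await asyncio.sleep(0)
    return {"agent": agent_id, "grounding_entries_read": len(grounding)}


async def pipeline(n_agents: int = 5) -> list[dict]:
    # Stage 1: all searches run concurrently and must finish first.
    grounding = await asyncio.gather(*(search(t) for t in SEARCH_TOPICS))
    # Stage 2: agents are dispatched only after the shared grounding exists.
    agents = (research_agent(i, grounding) for i in range(1, n_agents + 1))
    return await asyncio.gather(*agents)


results = asyncio.run(pipeline())
print(results)
```

The design point is the barrier between the two stages: because every agent receives the same completed grounding list, no agent can start drafting from background knowledge while a search is still outstanding.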

The external review

An independent agent with no role in the writing received the completed 72,962-word document and was asked to simulate a five-reviewer panel at a top teaching hospital: a senior academic lipidologist, a cardiology trialist, an evidence-based-medicine methodologist, a pharmacology lecturer, and a clinical pharmacist. Each reviewer was instructed to score the document, identify specific strengths and weaknesses with page references, sample five to ten footnotes for citation-quality assessment, and reach a fit-for-purpose verdict.

The panel scored the document 74/100 on average (range 64–82). It found no invented citations among the samples it audited. It flagged uneven steelmanning in three specific places, an over-precise NNT/NNH-by-stratum table, an asymmetric "what would change my mind" section, missing back matter (numbered Vancouver bibliography, glossary, methods note), and several genuine pharmacology gaps. The full panel critique is available as a separate file, External-Review-Panel-Appraisal.md.

The panel critique was not used to silently rewrite the document into something cleaner. That would have defeated the purpose of having an external critique at all. Instead, the panel findings stand as-is. The most material omission they identified (missing back matter) was addressed by adding a methods note and a bibliography stub, and the panel report is included in the delivered package so that any reader of the synthesis can read the document and its critique side by side. This is the same epistemic move that the document itself makes by stating its falsification criteria before reviewing evidence: showing the work transparently is better than presenting only the polished surface.

What worked

The grounding pass before dispatch was the single highest-leverage decision. Without it, the 2024–2026 trial content would have been at high risk of hallucinated citations. With it, the panel reviewers could not find invented citations in their samples.

The parallel research agents produced more total content of better average quality than a single sequential pass would have produced in the same wall-clock time. The decomposition by epistemic boundary rather than chapter boundary meant each agent worked within its strongest competence rather than being asked to switch modes between biology, history, and dispute mapping.

The pre-commitment to falsification criteria and to a steelmanned-both-sides stance, stated explicitly in the document's front matter, meant the synthesis layer could not slide into either consensus-camp advocacy or heterodox sympathy without violating its own stated terms. This is a useful structural constraint when working with material that is genuinely polarising.

The separate external review panel produced a critique that found real defects. Several of those defects were known to the orchestrator at the time of writing but had been deprioritised under time pressure (the missing bibliography in particular). The panel's job was not to confirm the work but to find the weaknesses that had been missed or under-addressed. It did.

What did not work as well

The citation audit was incomplete. Approximately nine [CITATION NEEDED] flags were carried through from the research agents without resolution. In a longer-running project these would have been closed by a dedicated audit pass; under the constraints of this delivery they remain as transparent flags rather than as silently filled-in citations. This is the right trade-off but should be acknowledged.

The numbered Vancouver bibliography promised in the table of contents was not produced beyond a stub. The document's citations are present as inline Markdown footnotes, but a consolidated, deduplicated end-of-document bibliography in formal Vancouver format would have been a meaningful addition. The methods note and erratum partly compensate but do not substitute.

The NNT/NNH-by-risk-stratum table was produced with point estimates where ranges would have been more honest. The panel methodologist flagged this. A future revision should widen the cells to interval estimates and add evidence-weight annotations per cell.
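To make the point-estimate versus interval distinction concrete: an NNT is the reciprocal of the absolute risk reduction (ARR), so an interval for the NNT can be obtained by inverting the bounds of the ARR's confidence interval, provided that interval lies wholly above zero. The sketch below uses a hypothetical helper, `nnt_interval`, and illustrative numbers that are not taken from the document's table.

```python
import math


def nnt_interval(arr: float, arr_low: float, arr_high: float) -> tuple[int, int, int]:
    """Return (NNT, lower bound, upper bound) from an ARR and its confidence interval.

    NNT = 1 / ARR. Inverting flips the bounds: the largest ARR gives the
    smallest NNT. Only meaningful when the whole ARR interval is above zero.
    """
    if not 0 < arr_low <= arr <= arr_high:
        raise ValueError("ARR interval must be positive and contain the point estimate")
    # NNTs are conventionally rounded up to whole patients.
    return math.ceil(1 / arr), math.ceil(1 / arr_high), math.ceil(1 / arr_low)


# Illustrative numbers only, not taken from the document's table.
print(nnt_interval(arr=0.02, arr_low=0.01, arr_high=0.04))
```

In this illustration a cell that would have read "50" as a point estimate becomes "50 (25 to 100)", which shows how much uncertainty a bare point estimate can hide. When the ARR interval crosses zero the simple inversion no longer applies, which is why the helper refuses such input rather than returning a misleading range.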

The "what would change my mind" chapter is asymmetric: it lists conditions that would shift the document toward each camp but no conditions under which the LDL-causality core itself would be invalidated. The implicit assumption is that LDL causality is essentially settled — which is the consensus view, and which the document defends — but stating that assumption explicitly rather than embedding it in the falsification-criteria architecture would have been more honest.

The steelmanning, while substantially better than typical LLM output on contested topics, is uneven in three identified places: the CoQ10 hypothesis is dispatched on lighter evidence than its strongest formulation deserves; JUPITER is read with a subtle heterodox tilt in one passage; and the Bradford-Hill "strength of association" criterion leads with FH (a 20-fold effect) rather than with the much smaller population-level per-mmol/L effect that the document's harder claims actually rest on.

Token and runtime budget

Approximate totals for the orchestration: eight grounding searches, five parallel research agents, one external review panel agent, multiple synthesis writes by the orchestrator, plus the HTML build and back-matter additions. Total agent token consumption was on the order of half a million tokens across all parallel and sequential runs. Total wall-clock time from initial brief acceptance to final delivery was approximately forty-five minutes.

A note on the limits of this method

This document is a research synthesis produced by language models. It is more accurate than a casual lay reader could produce on the same timeline; it is less accurate than a graduate student with six weeks and a clinician mentor would produce. It has been transparently critiqued by a simulated panel, which found real weaknesses. None of them invalidates the substantive analysis, but a careful reader should keep several of them in mind. The document is designed to support an informed conversation, not to replace clinical advice. Treating it as anything else would be a category error.