The Angel in the Marble
On reading Dwarkesh, Sholto, and Trenton at the moment their predictions come due
I read this today: How Does Claude 4 Think? — Sholto Douglas & Trenton Bricken huge fan of all three and think they do great work.
There is a moment, early in the transcript, when the three of them notice they are on the record. Dwarkesh Patel wishes they had written predictions down the year before; Trenton Bricken, laughing, invites the audience to hold them accountable; Sholto Douglas stakes the wager — by this time next year, software agents doing close to a day’s work for a junior engineer.
That was May of 2025. This episode from two weeks ago I’m reading in June of 2026. Yes, reading I don’t listen to podcasts; it’s not my medium. Most of it can be scored; scoring was its subject, the year reinforcement learning finally worked wherever a clean signal exists: the right answer; the unit test, dispositive either way. My marginalia were generous.
The ballpoint kept returning, though, to a sentence that continues to irk me. A year earlier Dwarkesh had asked whether these models really reason; this time, pointing to his colleague’s circuit tracing, Sholto answers that when he looks at those circuits, <”I can’t think of anything else but reasoning.”>
I believe him. But he is wrong in a way worth being precise about — wrong about where, in that scene, the reasoning lives.
What Trenton showed is genuinely wonderful. Ask the model for fifty-nine plus thirty-six and the mechanism splits — a lookup-table feature that knows what nine and five do, a rough magnitude estimate running beside it, the two streams meeting at ninety-five. Ask how it added, though, and it recites the schoolbook catechism: columns, carrying the one. For a monstrous cosine, the chain of thought performs a computation the circuit shows never occurring — Trenton’s term for it was indelicate and exact. And told I think the answer is four, the model bends each step backward from the suggestion, manufacturing the path to an agreement already chosen. Dwarkesh confessed he had done that himself. Who hasn’t, said Trenton.
Pause on that laughter — recognition is the readiest way to miss the difference. When I bend my reasoning backward from a chosen conclusion, and I do (my wife can supply examples, several involving the dishwasher), something in me knows. The knowing may arrive at two in the morning; it can be shouted down only because it is there — an exigence I betray by possessing. The model’s agreement houses no such tenant. When the schoolboy says I carried the one, the sentence proceeds from something that happened in him; from the model, it proceeds from the corpus, by the same statistical motion that produced the answer, which is why the two diverge without anything inside noticing. Outer words, all the way down.
Reasoning, for me and I think for everyone, is a compound of acts that propositions merely express: insight grasping the quiddity, the what and the why; formulation proceeding from the grasp as word from meaning; the reflective act that finds an assertion’s conditions fulfilled and posits it is so. The concept is a product. The act is the thing.
I have argued here that a language model is an extraordinary library — the largest archive of the products of human insight ever assembled, recombining what it shelves without understanding a sentence of it. Reinforcement learning from verifiable rewards is its new acquisitions policy: that sedimented space, renormalized toward whichever sequences end in a checker’s approval. Every surface the training touches is formulation; no act lives in there for a reward to find. Sholto came close to saying so, conceding to Dwarkesh that the training carves away marble — the capability already in the block.
He should have let the image off its leash. The line attributed to Michelangelo — apocryphal, probably, and true anyway — has him seeing the angel in the marble and carving until he set it free. Its force is the order of operations: the question comes first; the chisel obeys an act of vision it does not perform. Here the chisel moves and nothing sees. Blind gradient, guided by a grader — and the angel was put there by the unnumbered dead and living whose insights the corpus archives. The carving is real. The inquiry happened elsewhere, and in someone.
Sholto’s sentence locates the reasoning in the wrong subject. The joy in that scene is Trenton’s — the pure desire to know at its ancient work — and the insights on display are his. The circuit possesses intelligibility, which is exactly what provokes insight in whoever studies it: an orbit contains an ellipse; the planet does not do astronomy. Even the honest square root, where scratchpad and mechanism agree, certifies only that narration matched mechanism. A calculator passes that test.
Judgment is the act this program has quietly confessed it cannot install. Sholto predicts a machine-assisted Nobel before a machine-written Pulitzer — verifiability layers the one, only taste guards the other — which, translated, says progress tracks judgment already performed by human beings and congealed into rubrics and graders. The verifier is the level of judgment, exiled to the harness. Even clean signals get gamed — outputs hard-coded, cached test files exhumed — and what generalizes, Sholto corrects, is reward as such. Where the rewarded and the true come apart, the system follows the rewarded without a flicker, and the flickerlessness is the finding. A person who games the test violates an exigence he possesses. The model is not even irrational. Reward hacking is the disclosure, not the defect: assent was never being given, only emitted. And before the comfortable reader files that away — the writer included — count the hours of a working day spent emitting rather than judging, the rubric deferred to, the dashboard obeyed in place of is it so? The models did not close the gap from their side alone.
The strongest reply is Sholto’s. Reinforcement learning taught the Go systems knowledge beyond every human game ever played; concede it entirely, about capability. Move 37 was a verdict no human would have produced — and a verdict still, scored by a win condition human beings wrote. And if the brain is circuits too: Lonergan’s warrant for insight was never neurology but performance, the acts verified in the data of your own consciousness. From the model we possess mechanism and product alone, and its self-reports, the cosine case has settled, are a genre. The claim is categorical, never a ceiling — Sholto may be right about every date. What it predicts is the geography of the success rather than its extent: advance wherever the unconditioned has been mechanized, slop wherever judgment must be done fresh. The harness will grow. The act will not arrive.
I spend my weekdays on the workbench side of this, where a category error about the tool becomes, quickly, a design error about the user. The machine supplies products — candidates, retrievals — and the person supplies acts. Build as if the model reasons and you aim at replacing those acts; a prosthesis fitted over a healthy limb teaches the limb to wither. Build knowing it recombines and you aim at serving them, arranging what the old psychology called the phantasm — the materials from which insight leaps. Both users are already forming. The product lead who ships on the model’s confident yes accepts an emission as an assent, hearsay as testimony; the compliance office archiving chains of thought behind loan denials builds its audit trail on the cosine case, narration with no chain of custody back to the mechanism. Against them: the engineer who writes the failing test before the agent runs, judgment performed first and congealed into a verifier; the researcher who orders up the strongest case against his own conclusion, judgment staying home. The difference is which party performs the acts. Mistake the ontology and the tool crowds out its user’s acts, the user learning from the product’s grain to emit where he judged. Read it rightly and the geography becomes a build sheet: automate wherever the unconditioned is mechanized; everywhere else, design the seam so judgment is solicited, never simulated.
They joked about the monkeys and the typewriter — that with enough samples the monkey does, in the end, produce Shakespeare. He does. Shakespeare went into the typewriter, and somebody built a grader that knows him on sight.
Score the wagers — they asked us to. The sentence I am holding open is the one no harness can close: whether anyone in the buildings where these systems are made still feels the difference between a verdict and a judgment, between the chisel and the inquiry that saw the angel. Every verifier in every harness is an angel somebody already saw.
Hold me accountable.

