“You cannot answer a question that you cannot ask, and you cannot ask a question that you have no words for.”
— Pearl

tl;dr
The difference between a productive agentic coding session and a catastrophically maladjusted one is not the model, and categorically not the phase of the moon—it is the statistical structure of what goes into the context window. Which is to say, the agent is a conditional distribution and you are part of that conditioning. You look like you know what you are doing, but the thing you are actually doing is curating a conditional distribution’s evidence stream, and that bears no structural resemblance to the thing you think you are doing, which is writing software. The hard-to-swallow pill is that, while the competence is real, it is the wrong competence, because you are competent at a thing that no longer exists, or won’t for much longer. I am trying to make that feel obvious and to disabuse you of the comfortable illusion that you are still a programmer issuing imperious commands to the silicon golem, rather than an experimentalist cherry-picking the right evidence from a bewilderingly noisy universe of possible evidence. In that regard, you are closer to a painstaking, obsessively attentive digital horse whisperer than to anything resembling a traditional software engineer.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%@*+++++++++++++++++++++++++++++++++++++++++++%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======%%@@+====================+#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@%#######+===========================================*#######%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%@@%%*===#%%@+===%%@%@@@@@@@%@%===+%@@%===+%@%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%+==+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%===+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%===+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I had recently been invited to tip my hand as a speaker, to engage in an act of “accidental” self-disclosure in front of an audience that I had no reason to expect would internalize anything from what I had to say—and who did not, it should be noted, disappoint on that front. I was given 10 minutes to show and tell about how I use Claude Code, then lead a table discussion with whoever wanted to sit down and talk about it. And what ended up happening, with a kind of inevitability that in retrospect feels almost cosmological, is that I entered the room as the person with something to teach and left it as the only person who had learned anything. The pedagogical term for this, I believe, is a complete failure.

Let me try again.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%@*+++++++++++++++++++++++++++++++++++++++++++%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======%%@@+====================+#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@%#######+===========================================*#######%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%@@%%*===#%%@+===%%@%@@@@@@@%@%===+%@@%===+%@%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%+==+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%===+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%===+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

To the surprise of no one who knows me in any depth, I have a predilection (a tropism, even) for thinking statistically. Grounding—not “anchoring,” a word I fought for and lost because Kahneman has permanently claimed it for the pathology of not updating—is the word I will use for the epistemic process of making present information and past information cohere in a way that allows an educated guess about what is actually happening.

Every present interaction becomes a past interaction, and every potential interaction becomes a present interaction. This is a temporal truism providing fertile ground for a particular kind of statistical thinking—Bayesianism.

Experience is a conveyor belt.

And, maybe not unlike you, an agent trapped in a box framed by your terminal is embedded in a sequence of interactions in which the constant conversion of future into present into past performs something structurally Bayesian (whether or not it is running Bayes’ theorem under the hood, we will see).

I will make the charitable assumption that you all can read that. It happens to be the case that the structure of natural language is massively redundant; the statistical structure of English in particular is so rich that you can recover meaning even from heavily corrupted text. Let us ask Claude Code what it makes of this:

/clear
Aoccdrnig to rscheearch at Cmabirgde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a ttoal mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Amzanig huh?
Yep, perfectly readable. The passage works because English carries far more statistical structure than needed for comprehension; your brain is doing something very Bayesian — using the first/last letter anchors plus word-shape priors to reconstruct meaning from noisy input.
but you have no brain like i do and was still able to decode it -- why?
Because the same redundancy that lets your brain recover meaning lets my learned distributions do it too. I've never seen those exact scrambled words before, but I've seen enough English that the statistical prior is overwhelming — given the first letter, last letter, word length, and surrounding context, there's really only one plausible word each token could be.

Claude has probably seen those exact scrambled words before, but that is not the point. My brain and Claude’s “brain” are pattern-matching against a lifetime (in my case) or a training corpus (in Claude’s case) of learned distributions over word sequences.
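
To make the redundancy claim concrete, here is a toy sketch of the inference Claude described: prior times evidence, nothing more. The vocabulary, the frequencies, and the scoring rule are all invented for illustration; a real decoder would be backed by an actual corpus.

from collections import Counter

# A toy "prior": word frequencies standing in for a lifetime (or a training
# corpus) of exposure to English. Both the vocabulary and the counts are made up.
PRIOR = Counter({
    "according": 5, "research": 7, "university": 6, "important": 8,
    "letter": 9, "problem": 7, "because": 12, "whole": 6, "amazing": 3,
})

def likelihood(word, scrambled):
    # Crude evidence model: first and last letter are hard constraints
    # (the "anchors"); the rest is scored by letter-multiset overlap.
    if word[0] != scrambled[0] or word[-1] != scrambled[-1]:
        return 0.0
    overlap = sum((Counter(word) & Counter(scrambled)).values())
    return overlap / max(len(word), len(scrambled))

def decode(scrambled):
    # Posterior ~ prior x likelihood; return the MAP word, or give up and
    # return the scrambled form if nothing in the vocabulary fits the anchors.
    s = scrambled.lower()
    best = max(PRIOR, key=lambda w: PRIOR[w] * likelihood(w, s))
    return best if likelihood(best, s) > 0 else scrambled

for w in ["Aoccdrnig", "rscheearch", "Uinervtisy", "iprmoetnt", "porbelm"]:
    print(w, "->", decode(w))

Given the anchors and a strong enough prior, there is usually only one word each jumble could plausibly be, which is all either of us is doing.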

What constitutes a language is not an eternal ideal form, represented by the settings of a small number of parameters, but rather is the contingent outcome of complex processes. Since they are contingent, it seems they can only be analyzed with probabilistic models. Since people have to continually understand the uncertain, ambiguous, noisy speech of others, it seems they must be using something like probabilistic reasoning.
On Chomsky and the Two Cultures of Statistical Learning

An agent is an LLM, and an LLM is a transformer, and a transformer is a statistical model. The weights of a transformer store the knobs of an obscenely large probability distribution over text. So, when Claude Code is doing its thing, nailing working code for me like a well-paid grunt, it is not because it “understands” my codebase in some deep, human-like way, but because code, like that scrambled paragraph, has enormous statistical regularity that the model has internalized.

Prompting is an attempt at tickling the right neurons with the chosen evidence you are conditioning on.

Let me say this more carefully.

At inference time, when the model is generating output, it is sampling from a distribution, more specifically, this one:

\[P(\text{next token} | \text{everything that came before})\]

Everything that came before.

Your *.md files, dubious task descriptions that probably not even you would understand upon a second reading, the files you loaded (including the ones you do not need), the model’s own previous “You’re absolutely right”s vowing to fix the same error message you have been getting for the past three hours, your frustration and despair as one-liners; all of it. Every single token in the context window is to the right of that vertical bar and the model’s output is a function of that evidence.

Said without poetry: the output is a function of the weights, which you cannot touch mid-session, and of the evidence in the context window, which you can.
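
As a schematic (the model object and its tokenize, next_token_distribution, eos, and detokenize members are hypothetical stand-ins, not any real API), an agent turn is roughly this:

import random

def build_context(md_files, task, loaded_files, transcript):
    # The evidence stream: every piece lands to the right of the
    # conditioning bar, whether or not you meant it to.
    return "\n".join([*md_files, task, *loaded_files, *transcript])

def generate(model, context, max_tokens=256):
    # Vanilla ancestral sampling from P(next token | everything so far).
    tokens = model.tokenize(context)
    for _ in range(max_tokens):
        probs = model.next_token_distribution(tokens)  # conditioned on all of it
        nxt = random.choices(list(probs), weights=list(probs.values()))[0]
        if nxt == model.eos:
            break
        tokens.append(nxt)
    return model.detokenize(tokens)

Nothing in that loop knows which parts of the context are signal and which are the residue of your last three failed attempts; it conditions on all of it with equal credulity.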

(There is obviously a capability bar I am willingly ignoring here, one that sets a hard cap on what the model can achieve with the provided evidence, because from what I come across, there is a breed of prompt engineering folks out there with course covers that look like a cornered, molested Midjourney was forced at gunpoint to vomit cyberpunk rainbows, who set the bar well below the model’s actual capabilities by being not just flagrantly fake but dangerously so.)

Incidentally, /clear is, in an expanding brain meme sort of way, a statistical intervention. When I clear the context window, I choose to discard all the evidence I have been accumulating, allowing the prior—the weights of the model—to operate on a blank slate. In most cases, you are decontaminating the model from the evidence of your previous failures. The people who are good at this clear context with the ruthlessness of a surgeon debriding a wound, and the people who are bad at this let forty turns of accumulated confusion sit in the context window like a festering abscess, wondering why the model has turned into a babbling dingleberry.
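
In the same schematic terms as the sketch above (Session, ask, and clear are my inventions, not Claude Code internals, and they reuse generate and build_context from that sketch), the intervention is nothing more exotic than this:

class Session:
    # Hypothetical agent session: the model (the prior) is fixed,
    # the accumulated turns (the evidence) are not.
    def __init__(self, model):
        self.model = model
        self.turns = []

    def ask(self, prompt):
        self.turns.append(prompt)
        reply = generate(self.model, build_context([], "", [], self.turns))
        self.turns.append(reply)
        return reply

    def clear(self):
        # Discard the evidence; the next turn conditions on the weights
        # plus whatever you deliberately reintroduce, and nothing else.
        self.turns = []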

It is (still) our duty as humans in the loop to be aware of our complicity, because every time we let a bad turn stand, we engage in co-authored hallucination. Learning to notice when you have wandered off the distribution is a skill worth cultivating before the session teaches you the hard way.

So what does it mean to use Claude Code “well”?

In my experience, it means I am less and less of a programmer, and more and more of an experimentalist, running live experiments in the context window, carefully curating the evidence collaboratively, in real time, with the model. My every session starts with a hypothesis about what the model can do provided with the evidence I am about to give it. That is more or less my experimental design. If the results are not what I expected, I redesign the experiment—load different files, restate the constraints, maybe find a worked example, and try again, every time learning something new about the statistical structure of the problem I am trying to solve.
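
The loop, stripped of narrative (every callable here is a placeholder for a human activity, not an API):

def run_experiment(hypothesis, design_evidence, run_agent, passes, revise, max_rounds=5):
    # Hypothesis -> evidence -> observation -> redesign, until something works
    # or you run out of patience.
    for _ in range(max_rounds):
        evidence = design_evidence(hypothesis)   # files to load, constraints, a worked example
        result = run_agent(evidence)             # the experiment: one run on exactly that context
        if passes(result):                       # tests, review, whatever "expected" means today
            return result
        hypothesis = revise(hypothesis, result)  # each failure says something about the problem's structure
    return None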

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%@*+++++++++++++++++++++++++++++++++++++++++++%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======%%@@+====================+#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@%#######+===========================================*#######%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%@@%%*===#%%@+===%%@%@@@@@@@%@%===+%@@%===+%@%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%+==+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%===+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%===+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Everything you read above is rhetoric. Here, I will seduce you with the cold, hard, unromantic truth of the matter. So, let me dryly state the undertow: the claim that next-token prediction is “structurally Bayesian” is not a tortured figure of speech. What follows is a condensed tour of the results that underwrite the argument, roughly in order of their logical dependencies, with enough pointers that the motivated reader can go and verify I am not making any of this up.

The foundational observation is almost disappointingly vapid: cross-entropy loss is the negative log-likelihood, so minimizing it is equivalent to maximizing the likelihood. And because maximizing the posterior over model parameters is mathematically identical to minimizing that cross-entropy loss alongside standard weight regularization, the standard training objective of an LLM directly mirrors Bayesian posterior inference—but only in expectation, not in realization. That “Bayesian in expectation, not in realization” framing is the most technically honest summary of the relationship: the expectation is taken over token orderings, any particular ordering may deviate, but on average the transformer’s predictive behavior converges to the Bayesian-optimal predictor. The architecture can represent that predictor; trained models fall short of it in practice (what is particularly worth noting in that last paper is that the KL divergence between the model’s predictive distribution and the Bayesian oracle is not just nonzero but exhibits a characteristic “bump” at short context lengths, which may reflect the model’s fixed compute-per-token bottleneck in high-ambiguity regimes).
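
Spelled out, with notation of my own choosing:

\[
\mathcal{L}(\theta) \;=\; -\sum_{t}\log p_\theta(x_t \mid x_{<t}) \;=\; \mathrm{NLL}(\theta),
\qquad
\hat{\theta}_{\mathrm{MAP}} \;=\; \arg\max_\theta\, p(\theta \mid \mathcal{D}) \;=\; \arg\min_\theta\,\big[\mathrm{NLL}(\theta) - \log p(\theta)\big],
\]

and with a Gaussian prior over the weights, the negative log-prior term is exactly the familiar L2 weight decay, which is why “maximum likelihood plus regularization” and “MAP estimation” are the same sentence said in two dialects.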

Moving on, the result that carried the most weight for me in precipitating the Bayesian reading of transformers was that in-context learning can be reasoned about as the model maintaining an implicit posterior over latent data-generating concepts. (SymbolicAI is, in a sense, the engineering consequence of taking that claim seriously—and of Chollet’s observation that prompt engineering is the process of searching through program space to find the program that empirically seems to perform best on your target task. Chollet’s insight is probably the closest to what I am arguing here, with the caveat that his “fuzzy matching lookup table of programs” framing is one I have never been able to take seriously, for reasons that are hopefully now obvious.) That reading implies the forward pass narrows this implicit posterior as the context fills up, privileging one hypothesis over another based on how well it explains the observed sequence; the output distribution then corresponds neatly to the posterior predictive distribution. I argue—and I am not alone in this—that coherent decision-making requires Bayesian principles at the level of the agentic system, not necessarily at the level of the LLM’s parameters.
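
In symbols (the notation is mine; phi stands for whichever latent concept, or data-generating program, the context is evidence for):

\[
p_\theta(x_{n+1} \mid x_{1:n}) \;=\; \int p(x_{n+1} \mid \varphi)\, p(\varphi \mid x_{1:n})\, d\varphi.
\]

The left-hand side is the only thing the transformer ever emits; the claim is that it behaves as if the right-hand side were being computed, with each additional token of context sharpening the implicit posterior over concepts.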

Findings also show that positional encodings emerge naturally as a prior distribution over token positions. But classical Bayesian inference assumes exchangeability—the joint distribution is invariant under permutation of the observations—and positional encodings are precisely what break it (strictly, exchangeability in De Finetti’s sense). Ironically, the thing that makes the model useful is the same thing that makes it theoretically impure.
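
For the record, the property in question (notation mine):

\[
p(x_1, \ldots, x_n) \;=\; p\big(x_{\sigma(1)}, \ldots, x_{\sigma(n)}\big) \quad \text{for every permutation } \sigma,
\]

which, for text, would mean assigning the same probability to a sentence and to any shuffling of its words; position information exists precisely to prevent that.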

Lastly, there is evidence that transformers can do Bayesian inference, and that RLHF—that thing that teaches the model to want what we want—is equivalent to variational inference: approximating a Bayesian posterior that specifies how to update a prior LM to conform with the evidence provided by the reward function.
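
The equivalence is tidy enough to fit on one line (pi is the fine-tuned policy, p_0 the pretrained prior, r the reward, beta the KL penalty weight; notation mine):

\[
\max_{\pi}\; \mathbb{E}_{x \sim \pi}\big[r(x)\big] \;-\; \beta\,\mathrm{KL}\big(\pi \,\|\, p_0\big)
\quad\Longleftrightarrow\quad
\pi^{*}(x) \;\propto\; p_0(x)\, e^{\,r(x)/\beta}.
\]

The reward plays the role of a likelihood, the pretrained model plays the role of the prior, and the KL penalty is what keeps the update a Bayesian one instead of a collapse onto whatever degenerate strings happen to max out the reward.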

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%@*+++++++++++++++++++++++++++++++++++++++++++%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@@@%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======#%%%+=====================#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*======%%@@+====================+#%%%*======%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@#==========================================================*%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%@%#######+===========================================*#######%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%*===========================================%@%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%@@%%*===#%%@+===%%@%@@@@@@@%@%===+%@@%===+%@%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%+==+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%===+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*==+#%%%+==+%%%%%%%%%%%%%%===+%%%%===+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
claude --dangerously-skip-permissions