The Theory of Mind We’ve Been Missing
Answering Satya Nadella’s Question About AI and Human Cognition
Satya Nadella posed the right question. In his year-end reflection, he called for “a new concept that evolves ‘bicycles for the mind’ such that we always think of AI as a scaffolding for human potential vs a substitute.” He wants us to “develop a new equilibrium in terms of our ‘theory of the mind’ that accounts for humans being equipped with these new cognitive amplifier tools as we relate to each other.”
This is precisely the question. And we have an answer—one developed decades before large language models existed, waiting for exactly this application.
The Canadian philosopher Bernard Lonergan spent his career mapping how human knowing actually works. Not knowing as philosophers imagined it should work, but knowing as it operates when you solve a problem, grasp a concept, verify a hunch, decide what to do. His cognitional theory provides the “theory of mind” Nadella seeks: a precise account of distinct mental operations that clarifies which can be amplified by tools and which cannot be replaced.
Why Bicycles for the Mind Works—And Where It Fails
Steve Jobs drew on a 1973 Scientific American study: humans on bicycles consume roughly one-fifth the calories per kilometer of unaided walking. The bicycle doesn’t replace legs. It amplifies what legs already do. Jobs saw computers the same way—tools that extend human cognitive capacity without substituting for it.
The metaphor has proven remarkably durable. It captures something true: the best tools enhance rather than replace. But it lacks the precision the AI moment demands. A bicycle amplifies locomotion. What, exactly, does AI amplify? “Cognition” is too vague. We need to know which cognitive operations are at stake.
The slop-versus-sophistication debate reveals the cost of this imprecision. Critics decry AI-generated content as low-quality filler. Defenders counter that outputs are increasingly indistinguishable from human work. Both sides argue about the product while ignoring the process. Both assume AI outputs and human knowing are commensurable—differing in quality, not kind.
They aren’t. And until we have vocabulary for the difference, we’ll keep arguing past each other.
What Knowing Actually Requires
Here’s what Lonergan mapped: knowing unfolds through four distinct operations, each irreducible to the others.
First, experience. We attend to data—sensing, perceiving, imagining, remembering. This is the intake function: gathering the raw material that knowing requires. Be attentive, the imperative runs. Notice what’s there.
Second, understanding. We inquire into the data, and sometimes insight strikes. Not more data, but the grasp of pattern, relationship, intelligibility. The shift from “these symptoms keep appearing together” to “oh, that’s why they cluster”—that click of coherence. Be intelligent: pursue the insight, don’t settle for accumulation.
Third, judgment. Understanding proposes; judgment disposes. We reflect on whether our insight is actually correct, marshal evidence, weigh considerations, and reach a verdict: yes, this is so; no, that isn’t; maybe, I need more. The question shifts from “what might this mean?” to “is it true?” Be reasonable: don’t affirm beyond what evidence warrants, but do affirm when it does.
Fourth, decision. Knowing what’s true opens onto knowing what to do. We deliberate, evaluate options against values, and choose. Be responsible: act on what you’ve come to know.
The four levels aren’t a sequence you complete once and leave behind. They spiral and recur. New decisions generate new data requiring fresh understanding and judgment. But they remain distinct operations—categorically different acts of consciousness, not points on a continuum.
Skip a level and knowing collapses. Data without insight is noise. Insight without judgment is speculation. Judgment without decision is sterile. Each operation depends on those before it; none substitutes for those after.
Where AI Lives in This Structure
Large language models operate powerfully at level one. They gather, process, and surface data at scales impossible for unaided humans. They extend memory, accelerate search, aggregate information across sources. As experience-amplifiers, they are genuinely transformative.
They also assist level two—but here precision matters. Models generate candidate understandings: possible patterns, hypotheses, framings. They surface what might be intelligible. But they do not achieve insight. The subjective click of coherence—grasping why something is so—requires a consciousness that wonders and is satisfied. Models process; they don’t wonder.
This isn’t a limitation to be overcome through scaling. It’s a structural feature. Insight is the act of a subject grasping intelligibility. Models are not subjects; they execute operations. The outputs can prompt human insight, scaffold it, even simulate its products. But the operation itself remains unavailable.
Level three is where the distinction sharpens to a bright line. Judgment requires grasping that conditions for truth are fulfilled—what Lonergan called reaching a “virtually unconditioned.” Not absolute certainty, but warranted affirmation: recognizing that the evidence suffices for this conclusion. This is a reflective act, a consciousness evaluating its own understanding against criteria for correctness.
Models generate outputs with confidence scores. This is not judgment. Confidence scores measure statistical properties of outputs relative to training distributions. Judgment evaluates whether an understanding accurately grasps reality. The first is computation; the second is reflection. No amount of scaling bridges the gap because they are different kinds of operations.
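To make the contrast concrete, consider what a “confidence score” actually is. The sketch below is illustrative only, not any particular model’s implementation: a per-token confidence is typically just a softmax, arithmetic over the model’s raw scores.

```typescript
// Illustrative sketch: a model's "confidence" in a token is a normalized
// exponential of raw scores (a softmax) -- arithmetic over outputs,
// not a reflective act about whether anything is true.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);                  // subtract max for numerical stability
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical scores for three candidate next tokens.
const logits = [2.1, 0.3, -1.2];
const confidence = softmax(logits);
// confidence[0] is roughly 0.83: a statement about relative score,
// not a judgment that the corresponding output grasps reality.
```

Nothing in that computation asks whether the conditions for affirming the output have been fulfilled. That question arises only for a subject who can ask it.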
Level four remains equally beyond reach. Decision involves evaluating options against values held by a valuing subject, and committing to action. Models have no values—they have objective functions set by designers. They make no commitments—they generate outputs. The language of AI “deciding” is metaphorical in a way that obscures the relevant difference.
The Evolved Metaphor
Nadella asked for an evolved concept. Here it is:
AI is a bicycle for experience, a workshop for understanding, and a null for judgment and decision.
The bicycle extends what the rider already does. AI extends our capacity to attend—gathering more data, processing it faster, surfacing patterns we’d miss. This is genuine amplification. The human remains the rider; the capability expands.
The workshop provides tools and materials but doesn’t build the thing. AI offers candidate understandings—hypotheses to consider, framings to try, connections to explore. The human must still achieve the insight, grasp why the pattern holds, feel the click of intelligibility. Workshop tools make building easier; they don’t substitute for the builder’s skill.
The null marks what cannot be extended because there’s nothing mechanical to extend. Judgment is not slow computation that faster computation could replace. It’s a different kind of act. Decision is not inefficient value-weighting that optimization could improve. It’s the commitment of a subject who has values to weigh. You cannot bicycle what isn’t locomotion.
This metaphor does real work. It tells designers where amplification helps and where it harms. It tells users what to expect and what remains their responsibility. It dissolves the slop-versus-sophistication debate by redirecting attention from output quality to operational integrity.
Dissolving the Debate
“Slop” names AI outputs that substitute for human operations without the human noticing. “Sophistication” names outputs good enough that substitution seems justified. Both concepts assume the same flawed premise: AI outputs and human knowing differ in quality, not kind.
Reframe with the four-level structure and the debate dissolves. The question isn’t whether outputs are good or bad. The question is which operations produced them and which operations they require.
A model-generated summary of meeting notes (level one amplification: data compression) awaiting human judgment about accuracy and human decision about action—that’s scaffolding working correctly. The same summary treated as already judged, forwarded without reflection—that’s operational collapse. Output quality doesn’t distinguish these cases. The human’s cognitional operations do.
Sophisticated outputs may actually pose greater danger than obvious slop. Slop signals its own inadequacy; it prompts human judgment almost automatically. Sophisticated outputs pass unexamined. Their very quality bypasses the verification they require.
Product design implication: interfaces should make operational status visible. Not just what the AI produced, but what the human still needs to do. Here is gathered data. Here are candidate interpretations. Your judgment required. Your decision pending. The scaffolding must reveal its own edges.
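One illustrative way to build that in, assuming a hypothetical interface whose type and field names are invented for this example rather than drawn from any shipping product, is to carry operational status alongside every AI-produced artifact instead of presenting the artifact alone:

```typescript
// Hypothetical sketch: tag every AI-produced artifact with the cognitional
// level it came from and the human operations still outstanding.
type OperationalStatus =
  | "gathered-data"            // level one: material collected, nothing claimed
  | "candidate-interpretation" // level two: a possible reading, not yet insight
  | "judgment-required"        // level three: a human must affirm or reject
  | "decision-pending";        // level four: a human must choose what to do

interface AssistedArtifact {
  content: string;             // what the model produced (e.g. a meeting summary)
  status: OperationalStatus;   // where it sits in the four-level structure
  sources: string[];           // the data it was drawn from, for human verification
  verifiedBy?: string;         // set only after a human has actually judged it
}

// A model-generated summary of meeting notes arrives awaiting judgment,
// and the interface renders it differently from verified conclusions.
const summary: AssistedArtifact = {
  content: "Team agreed to ship the beta in early March.",
  status: "judgment-required",
  sources: ["meeting-notes.txt"],
};
```

The particular fields matter less than the refusal to make “verified” the default. The scaffolding reveals its own edges in the data model itself.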
Things and Their Knowing
Lonergan pressed the question further. What do we actually know when knowing works correctly? Not isolated impressions or abstract concepts but things: concrete unities grasped through the full cognitional process.
A thing—this patient, that market, your team—isn’t merely seen. Raw experience gives us manifolds of data, not unified objects. Understanding grasps the data as belonging together, as expressions of a single intelligible unity. Judgment verifies that this unity actually exists and operates as understood. The thing known is the achievement of the complete process, not the input to its first stage.
AI systems don’t know things in this sense. They process representations—tokens, embeddings, weights. Sophisticated representations, perhaps. But representations that never unify into grasped things because the grasping subject is absent. When we treat model outputs as knowledge of things, we import an achievement the model cannot reach.
The practical stakes are significant. A model can surface everything in your CRM about a customer. It can cluster behaviors, predict churn probability, suggest interventions. But it does not know the customer as a concrete unity—a this-person with that-history making those-decisions for these-reasons. The human account manager might. The knowledge differs in kind, not merely degree.
Product design built on this insight would preserve and support thing-knowledge rather than drowning it in data. More information about the customer is not always better; sometimes it substitutes representation for knowing. The well-designed tool surfaces what helps the human achieve unified understanding, then gets out of the way.
The New Equilibrium
Nadella asked how we should relate to each other as humans “equipped with these new cognitive amplifier tools.” The four-level structure generates an answer.
Authentic human relating requires all four operations. I attend to you—your words, expressions, situation. I understand your meaning—not just decoding symbols but grasping what you’re getting at. I judge whether my understanding is accurate—testing interpretations, checking assumptions. I decide how to respond—weighing what matters, committing to action.
AI can assist levels one and two in this process. Transcription captures what you said; summary compresses it; translation bridges languages; context-retrieval surfaces relevant history. These are genuine assists. They let me attend and understand better than I otherwise could.
But if I outsource levels three and four, I’m no longer relating to you. I’m processing data about you. The model drafts a response; I send it unexamined; you receive words shaped by statistical patterns rather than by my judgment about what’s true and my decision about what matters. We’re both diminished. The interaction looks like communication but lacks its substance.
The new equilibrium isn’t primarily technological. It’s normative. We need shared expectations that AI-assisted communication still involves human judgment and decision. Cultural standards that hold people responsible for having actually understood and verified what they transmit. Social practices that treat operational collapse as a failure mode, not an efficiency gain.
This equilibrium requires vocabulary—which is why the theory of mind matters. Without concepts for the distinct operations, we can’t articulate what’s been lost when they’re skipped. With them, we can name the failure and demand better.
Design Principles for Authentic Scaffolding
Theory becomes practical through design. Here are principles the four-level structure generates:
Make operational boundaries visible. Current interfaces present AI outputs as seamless knowledge. They should instead make explicit where AI operation ends and human operation must begin. The visual grammar matters: data and candidates look different from conclusions; pending-judgment looks different from verified-and-ready.
Preserve cognitive friction where it builds capacity. Friction isn’t always inefficiency. Sometimes it’s exercise. The e-bike delivers you to the destination without the exertion that builds cycling capability. AI that removes all cognitive friction may deliver outputs while atrophying the human operations that produce real knowing. Design for scaffolding—temporary support while capacity develops—not substitution.
Support judgment rather than simulating it. Models can marshal evidence, surface considerations, flag inconsistencies. These genuinely assist reflective judgment. Models cannot determine that evidence suffices for affirmation; that’s the human’s act. Design should present what judgment requires without presenting conclusions that presume judgment complete.
Enable thing-knowledge. Data accumulation can obscure rather than reveal concrete unities. The well-designed tool helps the user achieve integrated understanding of the thing in question—this patient, that project, your situation—rather than burying insight under representations.
Maintain transparency about operations. Humans use tools better when they understand how the tools work. AI systems should be as interpretable as current technology allows—not primarily for auditing, but because understanding your instruments is part of using them well.
Design for the spiral. Knowing isn’t linear; it recurs. New decisions generate new situations requiring fresh attention and understanding. Interfaces should support this recursion rather than treating each query as isolated. Memory and context serve not just efficiency but the continuity of genuine inquiry.
What This Means for Builders
The AI industry optimizes for capability. Benchmarks measure what models can produce. Investment follows performance on defined tasks. The implicit question: how do we make AI more powerful?
The four-level structure suggests a different optimization target. Not capability but support for authentic knowing. Not what models produce but whether humans using models become more attentive, more intelligent, more reasonable, more responsible.
These measures don’t currently exist. Creating them is itself a design challenge—and a business opportunity. Companies that figure out how to measure and improve human knowing-with-AI will build something more valuable than another percentage point on benchmarks.
The theory of mind Nadella requested provides the foundation. It specifies what human knowing requires. It identifies which requirements tools can assist and which they cannot replace. It generates design principles distinguishing scaffolding from substitution. What remains is building to the theory.
Conclusion: Amplification and Its Limits
Steve Jobs saw the bicycle amplify human locomotion without replacing human legs. The computer, he proposed, could do the same for human minds. He was right—but the metaphor needed the precision it now has.
Knowing operates through experience, understanding, judgment, and decision. AI amplifies the first, assists the second, and cannot perform the third or fourth. This isn’t pessimism about AI capability; it’s clarity about AI kind. Statistical pattern-matching over representations differs structurally from a conscious subject grasping intelligible unities and verifying truth. The difference isn’t degree; it’s category.
Scaffolding respects this difference. It extends what can be extended and preserves space for what cannot be replaced. The new equilibrium Nadella seeks emerges when builders design for scaffolding and users expect it—when we share vocabulary for the operations at stake and hold each other accountable for performing them.
We’ve been arguing about slop and sophistication, about job displacement and productivity gains, about alignment and misalignment. These debates matter. But beneath them lies a simpler question: what does knowing require, and how should tools relate to it?
The answer has been waiting for decades; the application is ours to make.
Taylor Black writes about AI, human flourishing, and the Catholic intellectual tradition. He serves as head of AI & venture ecosystems in Microsoft’s Office of the CTO and is Founding Director of the Leonum Institute for AI & Emerging Technologies at Catholic University of America.

