Human Knowing and Machine Learning: Aquinas, Lonergan, and Statistical Intelligence
A deep comparative study for Thomists, Lonerganians, and AI theorists, exploring how human knowing both parallels and diverges from artificial knowing – and getting deep into the weeds along the way.
How does a human mind grasp truth? How does a machine learn to respond intelligently? These questions drive us into two worlds – the classical philosophy of knowledge and the cutting edge of artificial intelligence. This article will journey through how humans know (in the precise terms of Thomas Aquinas and Bernard Lonergan) and how machines “know” (through large language models, reinforcement learning agents, and neuro-symbolic systems). The comparison reveals striking parallels and fundamental differences. We proceed step by step, sticking to technical explanations and letting the facts speak. No speculation – just a deep look at knowing, human and artificial.
Aquinas and Lonergan on Human Knowing
Aquinas’s Inner and Outer Words. In Aquinas’s account, knowing involves an inner word – a concept or act of understanding formed in the mind – and an outer word – the spoken or written expression of that concept. The verbum mentis (word of the mind, or verbum cordis “word of the heart”) is the interior word that proceeds from our act of understanding. It is produced when the intellect grasps the essence or truth of something. This inner word is then signified externally by verbum vocis, the word of the voice. In short, for Aquinas, every outward word we utter is meant to express a prior inward word conceived by the intellect. The verbum mentis is the concept or judgment in the mind; the verbum vocis is its sensible sign. Aquinas, following Augustine, sees the inner word as a crucial “mediator” in cognition – the mind forms a concept (inner word) as an intelligible expression of what it knows, and that concept can be communicated via language. Human knowing thus inherently has this two-stage structure: first an idea is formed inside, then it may be expressed outside.
Experiencing, Understanding, Judging, Formulating. Bernard Lonergan builds on the Thomist tradition and details the structured process of human knowing. He identifies distinct levels or steps: (1) Experience – the empirical level of data, sensations, images; (2) Insight – the act of understanding, where an intelligible pattern or idea (“inner word”) emerges from the data; (3) Judgment – the rational act of verifying that the insight is true, yielding a yes/no affirmation of fact; and (4) Decision or Formulation – the level of deliberating value or articulating the insight into words or action. We will focus on the first three as the core of cognitional structure, and on the formulation of insight in language as the fruition of knowing.
At the first level, experience, the mind is presented with “data” – not only sensory impressions but also the contents of imagination and memory. Aquinas would call these the phantasms (mental images) that the intellect considers. Lonergan notes that mere experience alone is not yet knowing; it provides the materials for inquiry.
Then comes insight. Insight is the “Aha!” moment – an act of understanding that grasps a form or pattern in the experiential data. For Aquinas, this is the intellect’s illumination of the phantasm to abstract a concept. Lonergan describes it as answering a “question for intelligence” – we wonder What is this? Why does it behave so? and an insight comes as a creative leap. He emphasizes that insight is not automatic; it’s an intellectual leap that often requires curiosity and careful attention. When it comes, a direct insight can condense a jumble of facts into a coherent unity – we see the point. Aquinas calls the concept born of a direct insight a definition (if it grasps the essence of a thing). This is one kind of inner word the mind generates.
Next is judgment, which Lonergan calls a reflective act of understanding. After an insight, we naturally ask a “question for reflection”: Is this really so? Is my understanding correct? We marshal evidence, check for consistency, and only then responsibly affirm that our insight is true (or decide it was mistaken). Aquinas similarly distinguished the inner word of judgment (“composition or division” – joining subject and predicate in a proposition, affirming or denying) as proceeding from a more critical act of understanding. In Aquinas’s terms, the mind’s first act of understanding yields a concept (“what is it?”), while a second act can join concepts in a statement of truth (“X is Y” or “X is not Y”). Lonergan, in his analysis of Aquinas, put it this way: the intelligere (act of understanding) that produces a definition is a direct insight into a phantasm; the intelligere that produces a judgment is a reflective, critical act looking at all the evidence. The inner word of judgment thus expresses “possessed truth” – a truth that the knower has validated as correct.
Finally, formulation. Once a person has insight and judges it to be true, they can formulate it in explicit terms – in concepts, statements, theories. This corresponds to articulating the inner word as an outer word (writing it down, speaking it, or otherwise encoding it in a communicable form). Formulation is closely intertwined with insight and judgment: we often formulate a hypothesis while having an insight, and refine that formulation as we move to judgment. Lonergan notes that the very formulation of a hypothesis is the expression of an insight, an initial concept that seeks to account for the data. If judgment then affirms the hypothesis, the formulation becomes established knowledge. In Aquinas’s framework, this is when the verbum mentis fully proceeds to the verbum vocis: one speaks the concept or assertion that the mind has validated. The goal, as Aquinas held, is adaequatio rei et intellectus – the alignment of intellect with reality, achieved when the inner word (our concept or proposition) truly corresponds to the thing known. Human knowing is successful when experience has been distilled by insight, checked by judgment, and formulated in a true verbum.
Primary and Secondary Insight. Not all insights are of the same kind. Lonergan differentiates direct insights from reflective insights, which we have essentially covered: the initial act of understanding versus the subsequent act of understanding one’s own knowing. Sometimes these are termed primary and secondary insights. A primary insight is the insight into an object or problem – for example, grasping how to solve a math puzzle or suddenly understanding why the sky is blue. A secondary insight is an insight about the act of understanding itself or about the sufficiency of evidence – essentially, the insight that underpins a judgment. In Lonergan’s analysis of Aquinas, he notes that “both definition and judgment proceed from acts of understanding, but the former from direct, the latter from reflective understanding.” The reflective insight is what tells you that you have enough evidence to assert “Yes, this is true.” It often integrates many pieces: experiences, prior insights, and the newly grasped idea, to evaluate truth. In other words, the primary insight might answer What is it? and the secondary (reflective) insight answers Is it indeed so?. Together, they secure genuine knowledge. Aquinas anticipated this layering by distinguishing the inner word of simple understanding (concept) and the inner word of judgment (assertion of truth). Lonergan built upon that, making explicit that knowing isn’t just having an idea, but also critically affirming that idea against experience.
Summing up the human side: Our knowing has depth and structure. We begin with experience, rise to an insight (an inner word) that captures an intelligible form, and then we scrutinize and possibly endorse that insight through judgment, finally formulating what we know in communicable form. The verbum mentis—the inner word—is central: it is meaning grasped by understanding. The verbum vocis—outer expression—communicates that meaning. This entire flow is driven by an intentional thrust: we desire to know, we ask questions, we seek to understand why and whether. This will stand in stark contrast to what goes on in a machine.
Lonergan’s Theory of Statistical Knowing
Human inquiry naturally searches for order and intelligibility. Classical science (and classical philosophy) sought determinate laws and reasons for things – what Lonergan calls classical intelligibility. But Lonergan, writing in the mid-20th century, recognized that modern science had uncovered another form of order: statistical intelligibility. Where classical laws answer “What must happen, given these conditions?”, statistical laws answer “How often, on average, do things happen this way?”. Lonergan insisted that this is a distinct mode of intelligibility, not reducible to classical law, and it underpins how we know realities governed by probability.
In Lonergan’s view, classical laws are deterministic or formulaic relationships – e.g. Newton’s laws of motion, which say precisely what will happen if certain conditions are met. Statistical laws, by contrast, deal in frequencies and probabilities. They do not predict an individual event with certainty; they predict patterns over many events. For example, a radioactive atom’s decay is (in quantum theory) unpredictable in exact timing (no classical law can say when this atom will decay), but we have statistical laws that say half of a large sample will decay in 1 hour (a half-life). The intelligibility here lies in aggregate regularities.
Lonergan articulated this by comparing the “questions” each type of scientist asks. Classical inquiry seeks the “nature of” things – the universal law or essence that necessitates outcomes. Statistical inquiry seeks the “state of” things – the dispositions and probabilities governing how often outcomes occur. He wrote, “classical laws tell what would happen if conditions were fulfilled; statistical laws tell how often conditions are fulfilled.” In other words, classical laws yield conditional certainties (“if X, then Y will always occur”), whereas statistical laws yield probabilistic expectations (“if X, then Y occurs 60% of the time”). Crucially, Lonergan saw them as complementary. Natural science uses both: for instance, gas laws (like pressure-volume relationships) are classical formulas, but they work in tandem with statistical laws of thermodynamics that describe the average behavior of random molecular motions.
Why did Lonergan emphasize the legitimacy of statistical knowing? Because earlier thinkers often regarded probabilities as just ignorance or as incomplete laws. Lonergan argued that in a universe where many factors combine in complex ways, some patterns are intelligible only at the statistical level. He introduced the notion of the “empirical residue” – the aspect of data that escapes systematic (classical) understanding, appearing as random deviations. Those deviations aren’t simply mistakes; they follow statistical patterns. For example, the exact distribution of genetic traits in a population can’t be captured by one simple law (too many chance combinations), yet it follows intelligible frequency distributions (like bell curves, etc.). The intelligibility of statistical laws resides in sequences or averages, not in any single event by itself. An individual coin toss is unpredictable, but the average of many tosses (around 50% heads) is intelligible as a probability.
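To make the contrast tangible, here is a minimal Python simulation – a toy illustration rather than anything from Lonergan’s text – showing how individually unpredictable events still yield an intelligible aggregate frequency:

```python
import random

# No classical law predicts a single toss, but the aggregate frequency
# is a stable, intelligible pattern - statistical intelligibility.
random.seed(0)

for n in (10, 100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>9} tosses: {heads / n:.4f} heads")

# The observed frequency drifts toward 0.5 as n grows, even though
# each individual toss remains an unpredictable 50/50 event.
```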
Lonergan went so far as to propose that the very structure of the universe is an interplay of classical and statistical laws, which he encapsulated in the idea of emergent probability. Emergent probability means the world is an ongoing mixture of regularities and chance: stable schemes of events emerge, conditioned by underlying probabilities. This allows novelty and development in the world without breaking deterministic laws – because determinism alone is incomplete. In Lonergan’s cosmology, “classical and statistical laws are not opposed but complementary”, providing a unified account of order.
For our purposes, the key takeaway is that Lonergan validated statistical understanding as genuine understanding. Knowing frequencies and probabilities is a bona fide mode of knowledge, different from but equal in dignity to knowing a deterministic mechanism. This insight will become very relevant when we look at artificial intelligence – which operates overwhelmingly on statistical principles. Lonergan’s distinction suggests that a machine might “know” in a way more analogous to how we know the statistical patterns of many events, rather than how we grasp an essential idea in one insight. A large part of AI’s power is indeed harnessing massive statistical correlations. We turn now to exactly how that works.
How Machines Learn and “Know”
Human knowing engages experience, insight, judgment, and is driven by meaning. But machines – especially modern AI systems – operate on a very different basis. They do not have intellects or inner words. They have data, algorithms, and representations. Yet, intriguingly, what they achieve can resemble knowledge: they recognize patterns, generate coherent language, and make decisions. At root, AI models exhibit a refined form of statistical knowing – high-dimensional and data-driven. Let’s unpack the technical mechanics of this, focusing on three paradigms: large language models, reinforcement learning agents, and neuro-symbolic systems. Along the way, we’ll use some formalism (equations and algorithms) to pin down how these systems learn, store, and use what we might call “knowledge.”
Large Language Models: Learning by Prediction
Large Language Models (LLMs) like GPT-4, BERT, or other transformer-based AIs have made headlines for their uncanny ability to generate human-like text. At their core, these models learn statistically from vast textual data. How? By a process known as self-supervised learning, essentially next-word prediction. During training, the model sees billions of example sentences and learns to predict the probability of the next token (word or sub-word) given the prior context. Formally, an LLM is trained to approximate the conditional distribution $P(\text{next word} \mid \text{previous context})$. The training objective is usually to minimize the prediction error across the training corpus – which is a form of empirical risk minimization.
In more concrete terms, suppose the training data is a huge set of sequences (sentences or documents) denoted $D = \{x^{(i)}\}$, where each $x^{(i)}$ is a sequence of tokens $[t_1, t_2, ..., t_n]$. The model defines a function $f_\theta$ (with parameters $\theta$ being the millions or billions of weights) that gives a probability distribution over possible next tokens given a context. The learning algorithm chooses $\theta$ to minimize the average loss (often cross-entropy loss) between the model’s predicted distribution and the actual next token that occurred in the data. In equation form, if $\ell$ is the loss for one prediction, the training seeks:
$$\hat{\theta} \;=\; \underset{\theta}{\arg\min}\; \frac{1}{N}\sum_{i=1}^{N} \sum_{t=1}^{n^{(i)}-1} \ell\Big(f_\theta(t_1 \dots t_{t-1}),\; t_{t}\Big),$$
where we sum over each position $t$ in each training sequence $i$, comparing the model’s predicted distribution $f_\theta(\text{context})$ to the actual next token $t_t$. This is a specific case of the Empirical Risk Minimization principle, which in general would be written as choosing $\hat{f} = \arg\min_f \frac{1}{N}\sum_{i=1}^N \ell(f(x_i), y_i)$ for some dataset of inputs $x_i$ and targets $y_i$. Here the “input” is the context and the “target” is the next word. By minimizing the average loss (which is the negative log-likelihood of the true next token), the model gradually improves its predictions.
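To make the objective concrete, here is a minimal NumPy sketch of the per-token cross-entropy loss that empirical risk minimization averages over the corpus. The toy vocabulary and logits are illustrative stand-ins, not any particular model’s internals:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def next_token_loss(logits, target_id):
    """Cross-entropy (negative log-likelihood) of the actual next token."""
    return -np.log(softmax(logits)[target_id])

# Toy setup: a 5-token vocabulary and a fake model output for one context,
# say the context "Paris is the capital of ...".
vocab = ["Paris", "is", "the", "capital", "France"]
logits = np.array([0.1, 0.2, 0.3, 0.5, 3.0])   # the model strongly favors "France"
target = vocab.index("France")                 # the token that actually occurred

print(f"loss = {next_token_loss(logits, target):.3f}")   # low loss: prediction fits the data

# Training = averaging this loss over every position of every training
# sequence (the empirical risk) and adjusting the weights to reduce it.
```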
Training a large language model means adjusting millions of internal weights by gradient descent so as to reduce this loss. The famous transformer architecture that underlies LLMs consists of layers of neurons that implement self-attention and feed-forward computations. The self-attention mechanism enables the model to weight different parts of the context when predicting the next word. It computes attention scores comparing the current token representation with every other token in the context, effectively learning which previous words are most relevant. These attention weights are dynamic and context-dependent, allowing the model to capture long-range dependencies and associations. For example, in the sentence “The cat that the dog chased was black,” a transformer can learn to strongly attend from the word “was” back to “cat” (not just the immediate neighbor “chased”) to predict that “was” will likely be followed by an adjective describing the cat. The result is that the model learns subtle statistical regularities of language: syntax, semantics, and even factual associations.
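A bare-bones NumPy sketch of that mechanism may help. This is single-head scaled dot-product attention with random illustrative weights; it omits the causal mask, multiple heads, and the training that a real transformer would have:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # project tokens into queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax: each row becomes an attention distribution
    return weights @ V, weights                    # context-mixed representations + attention map

rng = np.random.default_rng(0)
seq_len, d_model = 8, 16                           # e.g. 8 tokens with 16-dimensional embeddings
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, attn = self_attention(X, Wq, Wk, Wv)
print(attn.shape)   # (8, 8): one row of attention weights per token in the context
```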
Inside an LLM, knowledge is encoded in the parameters – notably in the huge matrices of weights in each layer. During training, if the model sees many sentences about “Paris is the capital of France,” it will adjust certain weights so that the prompt “Paris is the capital of ...” leads to a high probability for “France.” There is no single “Paris -> France” fact stored at one location; rather, the knowledge is distributed across many connections. Researchers have found evidence that certain neurons or directions in the network’s representation space correspond to specific facts or concepts (for instance, one can sometimes locate a subset of weights that, when edited, change a specific factual association the model has learned). But generally, the model’s knowledge is encoded as patterns in a high-dimensional vector space.
When the model is put to use (inference time), it takes a prompt (some input text) and processes it through its layers to produce an output distribution. The final layer outputs a vector of logits (scores for each possible next token). These logits are converted to probabilities via a softmax function: $P(\text{token}=j) = \frac{\exp(z_j)}{\sum_k \exp(z_k)}$, where $z_j$ is the logit for token $j$. The softmax ensures the outputs form a probability distribution over the vocabulary. The model then typically samples or picks the highest-probability token as the next word, and the process repeats. In effect, the LLM “knows” statistically which words tend to follow which – not by understanding concepts as a human does, but by having adjusted its internal parameters to reflect the correlations present in its training data.
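As a small sketch of that last step (with made-up logits for a tiny vocabulary), the conversion from scores to a distribution and then to a chosen token looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["France", "Paris", "Europe", "banana"]      # toy vocabulary
logits = np.array([2.0, 0.5, 1.2, -1.0])             # final-layer scores for the next token

probs = np.exp(logits - logits.max())
probs /= probs.sum()                                 # softmax: scores -> probabilities

greedy = vocab[int(np.argmax(probs))]                # deterministic: pick the most likely token
sampled = vocab[rng.choice(len(vocab), p=probs)]     # stochastic: sample for more varied text

print(dict(zip(vocab, probs.round(3))))
print("greedy:", greedy, "| sampled:", sampled)
```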
To illustrate the scale: a model like GPT-3 has on the order of $10^{11}$ parameters (weights). Each parameter learned some tiny aspect of the data’s structure. Together, they encode an astonishing amount of statistical “knowledge” about language and the world described by that language. For instance, an LLM can continue a story in the style of Shakespeare because it has absorbed patterns of Elizabethan English. It can answer factual questions (up to a point) because it has seen many facts in context and knows the statistical cues for those facts. But it’s critical to note: this “knowledge” is not organized by logical facts or meanings – it’s embedded in a vast web of probabilities. It’s statistical knowing par excellence: the model excels at predicting likely continuations.
Reinforcement Learning: Learning by Reward
Another major paradigm in AI is reinforcement learning (RL). While LLMs passively absorb text, RL agents actively learn by interacting with an environment. The setup involves an agent (the AI) and an environment (which provides situations and feedback). At each time step, the agent observes the state of the environment and takes an action; the environment then returns a reward (a numerical score) and a new state. The agent’s goal is to maximize the cumulative reward it receives over time – what RL literature calls the return.
Formally, one often models this as a Markov Decision Process. The agent has a policy $\pi(a|s)$ giving the probability of taking action $a$ in state $s$. The return from time $t$ is $R_t = \sum_{k=0}^{T} \gamma^k \, r_{t+k}$, where $r_{t+k}$ are future rewards and $\gamma\in[0,1)$ is a discount factor that makes future rewards somewhat less valuable than immediate ones. The objective is to find a policy $\pi^*$ that maximizes the expected return from the start state: $\pi^* = \arg\max_{\pi} \mathbb{E}[R_0 \mid \pi]$. In plain English, the agent must learn to choose actions that yield the best long-term outcomes.
How does learning happen? Through trial and error guided by reward feedback. Early in training, the agent might behave randomly. It explores different actions and gradually “discovers” which ones tend to give higher rewards. Algorithms like Q-learning or policy gradients formalize this. For example, in Q-learning the agent tries to learn a value function $Q(s,a)$ that estimates the total future reward for taking action $a$ in state $s$. The famous Bellman optimality equation is:
$$Q^*(s,a) = \mathbb{E}_{s'}\big[\, r(s,a) + \gamma \max_{a'} Q^*(s',a') \,\big],$$
meaning if you take the best possible next action $a'$ from the next state $s'$, the $Q$ of the current state-action is the immediate reward plus the discounted $Q$ of that best future move. The agent doesn’t know $Q^*$ initially, but it can iteratively update its estimates to satisfy this equation using experience samples. Over time, $Q(s,a)$ converges towards the true values, and the policy $\pi(s) = \arg\max_a Q(s,a)$ becomes the optimal one.
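The following sketch shows tabular Q-learning on a deliberately tiny, made-up “corridor” environment (five states in a row, reward for reaching the right end); the environment and hyperparameters are illustrative, but the update rule is the standard one just described:

```python
import numpy as np

class Corridor:
    """Toy environment: five states in a row; reaching the right end yields reward +1."""
    n_states, n_actions = 5, 2              # actions: 0 = move left, 1 = move right
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else min(4, self.s + 1)
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly exploit current estimates, occasionally explore at random
            a = int(rng.integers(env.n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # nudge Q(s,a) toward the Bellman target r + gamma * max_a' Q(s',a')
            target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

Q = q_learning(Corridor())
print(Q.argmax(axis=1))   # states 0-3 learn "move right"; the terminal state is never updated
```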
In policy gradient methods, instead of learning a value table, the agent directly adjusts its policy (often parameterized by a neural network with weights $\theta$) to increase reward. A typical objective is $J(\theta) = \mathbb{E}_{\pi_\theta}[R_0]$. The gradient of $J$ can be estimated from sampled episodes of experience (using formulas derived from the policy gradient theorem), and $\theta$ is nudged in the direction that improves expected return.
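A stripped-down instance of this idea is a REINFORCE-style update on a two-armed bandit, sketched below with made-up reward probabilities: the policy is a softmax over two logits, and each update moves the parameters in the direction of the gradient of log pi(a), scaled by the reward received:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                            # policy parameters: one logit per action
true_reward = np.array([0.2, 0.8])             # hypothetical chance of reward for each action
lr = 0.1

for _ in range(2000):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()   # softmax policy pi_theta
    a = rng.choice(2, p=pi)                            # sample an action from the policy
    r = float(rng.random() < true_reward[a])           # stochastic 0/1 reward from the environment
    grad_log_pi = np.eye(2)[a] - pi                    # gradient of log pi_theta(a)
    theta += lr * r * grad_log_pi                      # REINFORCE: reinforce rewarded actions

pi = np.exp(theta - theta.max()); pi /= pi.sum()
print(pi.round(3))   # most probability mass has shifted to the better (second) action
```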
Regardless of the algorithm, the result is that the agent’s policy improves with experience, in the direction of actions that yield higher frequencies of reward. It’s another flavor of statistical learning: the agent is essentially estimating the expected value of actions from empirical frequency of reward. There is no explicit “insight” or concept formation; the agent doesn’t suddenly realize a truth. Instead, it stochastically approaches an optimal behavior pattern by accumulating reward statistics.
Consider a concrete example: AlphaGo, the system that mastered the game of Go. It used reinforcement learning (combined with deep neural networks) to play millions of games against itself. Initially its moves were random; gradually, through many games, it identified patterns of moves that led to winning more often. The final policy network in AlphaGo effectively encodes knowledge of Go strategy – but that knowledge is stored as neural network weights optimized for high win probability. If we could inspect those weights, we wouldn’t see a neat list of Go tactics; we would just find numbers. The knowledge is implicit, embedded in the network’s ability to map a board state to a strong move.
In summary, an RL agent “knows” what to do in the sense that it behaves successfully. But how it knows is completely empirical. The agent has no semantic understanding; it has just statistically learned a mapping from states to actions that yields good results. This is often called behavioral knowledge or know-how (akin to procedural knowledge), as opposed to declarative facts. It’s reminiscent of learning by habit or association – except optimized by powerful algorithms. The grounding in probability is clear: reinforcement learning algorithms rely on averaging over many trials, estimating expected rewards (a fundamentally statistical quantity). The knowledge gained is not a universal law (“in all cases do X”), but a policy effective in aggregate. It might even exploit quirks of the training environment’s distribution. This will be important when we compare to human cognition: RL is closer to learning by experience and feedback (like training an animal) than to insightful understanding.
Neuro-Symbolic AI: Integrating Patterns and Concepts
A third approach worth noting is neuro-symbolic AI, which attempts to combine the strengths of neural networks (pattern recognition, statistical learning) with the strengths of symbolic systems (explicit logic, rules, and knowledge representation). Pure deep learning (as in LLMs or deep RL) is powerful but can be “opaque” and brittle – it lacks transparency and sometimes fails to capture logical relationships. Pure symbolic AI (expert systems, knowledge graphs) is easily interpretable and can handle clear rules and relations, but it struggles with raw perceptual data and requires manual knowledge engineering. Neuro-symbolic AI aims for the best of both: efficient pattern learning plus transparent reasoning.
In a neuro-symbolic system, knowledge may be stored in two forms: distributed weights (in neural components) and explicit symbols (in a knowledge base or logic rules). For example, consider a vision system that needs to identify objects and also reason about them. A neural network might handle the visual recognition (pixel patterns to “cat” or “dog” labels), while a symbolic reasoner might handle a rule like “if the pet is a cat, it doesn’t bark.” In a purely neural system, such a rule would have to be inferred from statistical correlations. In a purely symbolic system, recognizing the pet from an image would be extremely hard. In a neuro-symbolic system, the neural part can feed the symbolic part with recognized entities, and the symbolic part can apply logical constraints or consistency checks on the neural outputs.
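A toy sketch of that division of labor, with a stand-in for the neural classifier (hard-coded probabilities) and a hand-written rule base, might look like this; all names here are illustrative, not any real library’s API:

```python
def fake_neural_classifier(image):
    """Stand-in for a CNN: returns label probabilities for the pet in the image."""
    return {"cat": 0.55, "dog": 0.45}

# Symbolic background knowledge: which labels are consistent with which behaviors.
RULES = {
    ("cat", "barks"): False,      # cats do not bark
    ("dog", "barks"): True,
}

def interpret(image, observed_behavior):
    probs = fake_neural_classifier(image)
    # keep only labels consistent with the rules, then renormalize the probabilities
    consistent = {label: p for label, p in probs.items()
                  if RULES.get((label, observed_behavior), True)}
    total = sum(consistent.values())
    return {label: p / total for label, p in consistent.items()}

print(interpret("pet.jpg", "barks"))   # {'dog': 1.0}: the symbolic rule overrides the raw scores
```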
One concrete example: the Neuro-Symbolic Concept Learner (developed by researchers at the MIT-IBM Watson AI Lab) integrated a CNN (convolutional neural net) for image perception with a symbolic reasoning module. The CNN outputs probabilities for image features or objects, and the reasoning module uses a knowledge base to interpret combinations of features – improving accuracy and ensuring the results adhere to known rules (like spatial relations or hierarchical relations among objects). Another example is using a knowledge graph together with a language model: the language model generates a response, but a symbolic component checks a database or performs a reasoning step to verify facts before finalizing the answer.
From a learning perspective, neuro-symbolic systems often use neural networks for the statistical part and maintain a set of symbolic facts or rules that are either pre-built or learned in a more discrete way. Some research attempts to learn the symbolic rules themselves from data (which is challenging – it requires discovering interpretable structure in what a neural net has learned). The promise, however, is explainability and generalization: the symbolic part can be inspected, and it may generalize based on logical inference rather than just pattern similarity.
Technically, integrating the two is hard because neural networks operate with continuous numbers and gradient-based learning, while symbolic reasoning is discrete and non-differentiable. Approaches to bridge this include differentiable logical operators (fuzzy logic or probabilistic logic networks), or alternating phases of neural learning and symbolic optimization.
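As a small illustration of the first approach, here is one common family of differentiable (“fuzzy”) logical operators, the product t-norm, where truth values live in [0, 1] and every operator is smooth enough for gradients to pass through:

```python
# Product t-norm fuzzy logic: truth values in [0, 1], smooth everywhere,
# so a logical constraint can be used as a differentiable training penalty.
def AND(a, b):      return a * b
def OR(a, b):       return a + b - a * b
def NOT(a):         return 1.0 - a
def IMPLIES(a, b):  return OR(NOT(a), b)

# Example: the constraint "if the pet is a cat, it does not bark", applied
# to soft (probabilistic) beliefs produced by a neural component.
p_cat, p_barks = 0.9, 0.7
truth = IMPLIES(p_cat, NOT(p_barks))
print(round(truth, 3))    # 0.37: a low truth value, i.e. the beliefs violate the rule
penalty = 1.0 - truth     # could be added to a loss so training discourages the violation
```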
For our purposes, the neuro-symbolic trend highlights that not all knowledge in AI has to remain buried in opaque statistical form. There is an effort to have AI systems possess explicit knowledge (e.g., a fact database or a rule set) while still leveraging neural nets to process raw inputs. You might say a neuro-symbolic AI has something loosely analogous to a verbum mentis: an explicit internal expression it can hold and manipulate. For instance, it might internally represent “Cat(X) AND On(X,Mat)” as a logical formula describing a scene – an internal description which it can then use to answer questions (an external response). In a purely neural system, by contrast, there is no easily readable assertion; there are only activations and weights.
In summary, neuro-symbolic AI seeks to address the weaknesses of purely statistical learning by reintroducing explicit structures. It’s still an emerging area, but it underscores a key point: even machines can potentially benefit from a classical law framework. However, most mainstream AI today (like LLMs or deep RL) is still overwhelmingly statistical and subsymbolic. We now have a picture of how machines learn: through lots of data, adjusting numerical weights to minimize error or maximize reward, capturing knowledge in distributed form, unless explicitly given symbolic frameworks as resources.
Next, we will discuss how we peek inside these machine “minds” – because unlike a human mind, a neural network doesn’t tell us what it “thinks” in words. We need special techniques to interpret and verify the knowledge it has absorbed.
Interpreting and Explaining AI “Knowledge”
AI models, especially deep neural networks, are often criticized as “black boxes.” We might observe what output they give, but not why. In human knowing, if you ask someone why they think something, they can attempt to explain (thanks to the inner word and conscious insight). A machine learning model doesn’t have conscious access to its “reasons” – it just computes. Nonetheless, researchers have developed explainability methods to shed light on the inner workings of AI systems. These methods don’t give the model a true inner voice, but they allow us (human observers) to understand aspects of what the model has learned or how it is making decisions. Here we highlight a few prominent techniques: attention visualization, concept activation vectors, knowledge attribution, and calibration. Each addresses a different facet of interpretability and reliability in statistical learning systems.
Attention Visualization. In models that use an attention mechanism (like transformers or certain image models), we can visualize the attention weights to see what the model is focusing on. For example, in a translation model translating a sentence from English to French, an attention map can show which source words the model found most relevant when generating each target word. In large language models, attention visualizations (using tools such as BertViz) can illustrate how the model’s heads attend to different parts of the input. One might find that, in processing a long paragraph, certain attention heads specifically track long-range dependencies (like a pronoun “it” attending back to the noun it refers to). While attention is not a perfect explanation of a model’s reasoning, it is a direct peek into one mechanism the model uses to combine information. For an input sentence, a heatmap of attention scores tells us “which words were most influential in the model’s intermediate computation for generating the next token.” This is analogous to highlighting which parts of the evidence a human might have considered – though in the model’s case it’s purely based on learned weights, not conscious choice.
Visualizing attention helps demystify model behavior. For instance, if an LLM outputs a surprising or seemingly irrelevant continuation, examining attention might reveal that the model latched onto an earlier phrase that a human reader ignored. In safety-critical applications, attention maps can sometimes help verify that the model is attending to the right inputs (e.g., focusing on relevant sections of a document for a question, rather than being distracted by irrelevant details). It’s not a full proof of correctness, but it’s one tool to interpret the statistical heuristics the model is using.
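For readers who want to try this, the sketch below pulls raw attention weights out of a small pretrained model using the Hugging Face transformers library (assuming it and the bert-base-uncased checkpoint are available); which token wins the argmax depends on the layer and head inspected:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

text = "The cat that the dog chased was black."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
attn = outputs.attentions[-1][0]       # last layer, first batch item: (heads, seq_len, seq_len)

head = attn[0]                         # inspect one attention head
i = tokens.index("was")
j = int(head[i].argmax())              # which token "was" attends to most in this head
print(f"'{tokens[i]}' attends most strongly to '{tokens[j]}'")
```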
Concept Activation Vectors (CAV). Another interpretability approach is to probe the model’s internal representation space for human-understandable concepts. A Concept Activation Vector (CAV) is essentially a direction in the network’s latent space that corresponds to a concept we care about. For example, in an image classification network, we might take a bunch of images that humans classify as “striped” and another set of “non-striped” images. We feed them through the network to a certain layer and collect the neuron activations. Using these, we can derive a vector in that layer’s activation space that points toward “stripe-ness.” Once we have this “stripe” vector, we can test new images to see how strongly that concept is present by projecting their activations onto the concept vector. TCAV (Testing with Concept Activation Vectors) uses this to answer questions like “Is the concept of stripes important for the network’s classification of zebras versus horses?”. If the zebra images strongly align with the “stripe” vector and horses don’t, and the model predicts zebra vs horse accordingly, we deduce that the model has internally represented the concept of stripes and uses it for its decision.
In language models, analogous techniques exist (though language concepts are trickier to isolate than visual ones). Researchers try to find directions in the embedding space corresponding to attributes like sentiment, gender, or tense. For instance, one might find a direction in GPT’s hidden state space that, when moved along, shifts a sentence from present tense to past tense usage, indicating the model implicitly structured that grammatical concept.
Concept vectors are valuable because they translate the model’s distributed knowledge into interpretable units. They provide a glimpse that, yes, the model learned something akin to the concept a human would also use. It makes the otherwise opaque high-dimensional weights a bit more transparent by linking them to real-world notions. It’s as if we can ask the model, “Did you notice the stripes?”, not in words but through this vector projection, and it can respond “yes, strongly” or “no” in terms of magnitude. This is still an emerging area of interpretability research, grappling with issues like concepts being entangled (a single neuron or vector often isn’t purely one concept). But it shows promise in auditing what knowledge a network has internalized.
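The essential recipe is easy to sketch: fit a linear probe that separates the activations of concept examples from non-concept examples, and take its weight vector as the concept direction. In the toy version below the “activations” are random stand-ins rather than a real network layer:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64                                             # width of the probed layer
concept_dir = rng.normal(size=d)                   # hidden "stripe-ness" direction (toy)

striped = rng.normal(size=(200, d)) + concept_dir  # activations of striped images
plain   = rng.normal(size=(200, d)) - concept_dir  # activations of non-striped images

X = np.vstack([striped, plain])
y = np.array([1] * 200 + [0] * 200)

probe = LogisticRegression(max_iter=1000).fit(X, y)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])   # the concept activation vector

new_activation = rng.normal(size=d) + concept_dir       # activation of a new "zebra" image
print(float(new_activation @ cav))                      # large positive projection -> concept present
```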
Knowledge Attribution. A different angle is to trace which part of the network or which training examples are responsible for a given piece of output – especially a factual statement. Suppose a language model outputs, “The capital of France is Paris.” We might wonder: Does the model “know” this as a fact stored in its weights, or did it just parrot a specific sentence it saw? Knowledge attribution methods try to answer such questions. One approach is Neuron-level attribution, where we identify which neurons (or which layers) contribute most to producing a specific factual association. For instance, researchers have found certain neurons in GPT that, when activated, specifically steer towards country-capital pairs. By intervening on those neurons, you can sometimes make the model forget or change a particular fact without affecting other outputs – implying those neurons held that piece of knowledge.
Another approach is training data attribution. For a given output, algorithms (using influence functions or gradient-based trace techniques) can find which training samples had the most influence on that output. If the model says “Paris is the capital of France,” knowledge attribution might reveal a set of sentences in its training data about Paris and capitals that most affected the model’s parameters towards producing that answer. This can increase our trust: if we see it was influenced by many correct statements, we’re more confident in the model’s answer than if it was influenced by a single obscure source.
Knowledge attribution recognizes that a large model’s “knowledge base” is implicitly spread across its billions of learned weights. By analyzing those, we aim to accurately identify the factual knowledge stored in the neural network. This helps in several ways: (1) debugging model errors (e.g., why did the model state a fact incorrectly? Perhaps a certain neuron mis-associates two facts); (2) updating models (there’s research on editing model knowledge by locating the specific weights tied to a fact and adjusting them); and (3) verifying sources (an important direction is training models to not only output an answer but also cite the training sources – effectively making the training attribution explicit at inference time). In sum, knowledge attribution techniques are an attempt to map the model’s vast statistical knowledge back to human-understandable provenance or components.
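The intervention idea can be illustrated on a toy network: ablate (zero out) each hidden unit in turn and measure how much a chosen output logit drops. The random weights below stand in for a trained model, so the specific unit found is meaningless, but the procedure is the one described:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(10, 32))          # toy weights standing in for a trained network
W2 = rng.normal(size=(32, 5))

def forward(x, ablate=None):
    h = np.maximum(0, x @ W1)           # hidden layer (ReLU activations)
    if ablate is not None:
        h = h.copy()
        h[ablate] = 0.0                 # intervention: silence one hidden unit
    return h @ W2                       # output logits

x = rng.normal(size=10)                 # a fixed input (stand-in for a prompt)
target = 3                              # the output whose "knowledge" we want to attribute
base = forward(x)[target]

drops = [base - forward(x, ablate=i)[target] for i in range(32)]
top = int(np.argmax(drops))
print(f"unit {top} matters most here: ablating it lowers the target logit by {max(drops):.3f}")
```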
Calibration. Finally, calibration addresses how well a model’s confidence in its predictions matches reality. A model is well-calibrated if, among all predictions it assigns 80% confidence to, about 80% of those turn out to be correct. Calibration doesn’t explain internal mechanisms per se, but it’s crucial for understanding if we can trust the model’s outputs probabilistically. An LLM might say “I am 99% sure that...” – but if it’s wrong half the time it says that, it’s miscalibrated. Many deep models, it turns out, are over-confident: they might give very high probability to an answer that is actually incorrect. This is dangerous in applications because users and systems may over-trust the AI’s outputs.
Methods like temperature scaling are used to improve calibration. Temperature scaling, for example, learns a single parameter $T$ such that when we divide the model’s logits by $T$ before the softmax, the resulting probabilities better reflect the model’s actual accuracy. Essentially, if a model’s predictions are too confident across the board, increasing the “temperature” will soften the probabilities (bringing that 99% down to maybe 90%, etc.) without changing the predicted class itself. More complex calibration techniques involve modeling prediction uncertainty or using ensembles of models.
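Here is a minimal sketch of temperature scaling on synthetic validation data: the logits are generated to be usually right but far too confident, and a simple grid search picks the $T$ that minimizes the negative log-likelihood of the held-out labels:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 10
labels = rng.integers(k, size=n)                   # synthetic "validation" labels
logits = rng.normal(size=(n, k))
logits[np.arange(n), labels] += 2.0                # the model is usually right...
logits *= 5.0                                      # ...but its scores are far too confident

def nll(logits, labels, T):
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

temps = np.linspace(0.5, 10.0, 96)
best_T = temps[int(np.argmin([nll(logits, labels, T) for T in temps]))]
print(f"best temperature ~ {best_T:.2f}")          # T > 1: soften the over-confident probabilities
```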
Why include calibration in a discussion of knowing? Because in human knowing, the closest analogue is certainty vs doubt. A rational human knower not only tries to be correct, but also to have their degree of confidence be appropriate to the evidence. We consider someone well-calibrated if they “know what they know (and how well they know it).” Current AI models do not know that they don’t know – they often blithely give an answer with high confidence even when extrapolating outside their knowledge. Calibration techniques are a way to imbue the system with a more honest representation of its statistical certainty. This is a part of making AI’s statistical knowledge more usable and interpretable: if a model says it’s only 60% sure, a human can decide to double-check that answer.
Together, these interpretability and reliability methods are improving our understanding of AI systems. We can watch where a model “looks” (attention), see if it has learned human-like ideas (concept vectors), trace what it knows and where that knowledge came from (knowledge attribution), and adjust how it expresses uncertainty (calibration). These tools emphasize the point: AI’s knowing is mechanistic and statistical, but we can partially translate it into our human frame of reference. Still, there’s no ghost in the machine contemplating truth. There are matrices and calculations, which we interpret after the fact.
Human Knowing vs. Machine Knowing: A Comparative Synthesis
After this deep dive, let’s step back and compare side-by-side the key features of Thomist-Lonerganian human knowing and modern machine learning-based “knowing.” Both involve layers and processes, but their nature is fundamentally different. The main parallels and contrasts can be summarized briefly:
Experience: for the human knower, sensory data, images, and memories (phantasms); for the machine, raw training data – text corpora, game states, reward signals.
Insight: the human grasps an intelligible form in a single act of understanding; the machine gradually extracts statistical patterns by adjusting weights, with no “aha” moment.
Judgment: the human reflectively weighs the evidence and affirms that an insight is true; the machine performs no reflective act, yielding at best calibrated output probabilities.
Formulation: the human expresses the inner word (verbum mentis) in an outer word (verbum vocis); the machine emits an outer word with no inner word behind it.
Storage of knowledge: the human holds concepts and affirmed judgments; the machine holds distributed numerical weights that must be interpreted after the fact.
This comparison underscores the point: human knowing is an intentional, conceptual, and self-aware process, whereas machine learning is a non-intentional, statistical, and opaque process. Both involve building up answers from data (no magic involved), but a human knows why they know (at least at the reflective level), while a machine simply behaves as if it knows, by correlating inputs to outputs.
Yet, we should not ignore the analogy: Lonergan’s point about statistical knowing being a bona fide form of intelligibility is almost vindicated by AI. A large language model is essentially statistical knowledge made functional. It has no insight in the Thomist sense, but it doesn’t need it to be useful to us. It leverages sheer volume of experience (training data) and sophisticated pattern extraction (deep networks) to achieve results that we can use in our own human understanding. We ask the model a question, it gives a coherent answer – not because it truly understands the question as we do, but because it has seen many linguistically similar situations and statistically inferred a fitting response.
To avoid concluding on a mystic note, let’s be clear: today’s AI does not possess intellect or true understanding. It cannot form a verbum mentis – it has no concept of “concept.” It cannot perform the self-reflective judgment that something is true; it only outputs what its learned probabilities suggest. In Aquinas’s terms, an AI does not have an agent intellect to illuminate phantasms; it is more like a very complex automated phantasm manipulator, finding patterns in verba vocis with no insight. But what it does exemplify is the power of statistical knowing: given enough data and a powerful enough statistical model, you can approximate certain outputs of rational knowing. An LLM can answer many factual questions correctly – effectively, it has statistically absorbed those facts. It has a kind of inert, assembled verbum vocis (in that its output sentence is like an outer word devoid of an inner word), since it is working off a training dataset of vast reams of outer words. Reinforcement learners can achieve goals in games or robotics much as someone who knows how to do a task would – though the machine doesn’t “know it knows”; it just performs on the basis of its refined statistical model.
In conclusion, Aquinas and Lonergan give us a rich picture of human knowing as a layered, intentional act that goes from experience to idea to verified truth and expression. Lonergan also opens the door to understanding less direct forms of knowing, like the statistical laws of phenomena. Modern AI resides firmly in the realm of statistical intelligibility. It lacks the higher insights and reflective judgment, but it capitalizes on pattern frequencies at a scale no human could ever process. In a way, AI is harnessed emergent probability: massive data and statistical algorithms producing intelligent-like behavior. As interdisciplinary experts, we can appreciate that comparing human and machine cognition isn’t about declaring one superior, but about understanding two very different realizations of “knowing.” One is rooted in meaning and being, the other in correlation and function. Both are fascinating – and together, they force us to refine what we mean by knowledge itself.
Sources:
Aquinas, Thomas. Summa Theologiae, I q.27 a.1-2 (on the Word in God and analogy of the inner word).
Lonergan, Bernard. Insight: A Study of Human Understanding. Toronto: University of Toronto Press, 1992 (Collected Works edition).
Lonergan, Bernard. Verbum: Word and Idea in Aquinas. Notre Dame: University of Notre Dame Press, 1967.
Internet Encyclopedia of Philosophy – “Bernard Lonergan”.
Henriques, Mendo Castro. “The Lonerganian Revolution in the Understanding of Scientific Research.” International Journal of Communication Research 6, no.3 (2016): 225-233.
Kendall, Geoff. “Ethics, Emergent Probability, and Freedom.” (Paper on Lonergan’s cosmology and intelligibility).
Spinning Up (OpenAI). “Key Concepts in RL.” (Online documentation).
mlweb.loria.fr. “Empirical Risk Minimization.” (Online textbook excerpt).
Guo, Chuan, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. “On Calibration of Modern Neural Networks.” Proceedings of ICML 2017.
Kim, Been et al. “Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV).” ICML 2018.
Meng, Z. et al. “Neuron-Level Knowledge Attribution.” EMNLP 2024 (arXiv:2312.12141).
Vig, Jesse. “A Multiscale Visualization of Attention in the Transformer Model” (BertViz tool).
Turing Institute. “Neuro-symbolic AI: Integrating deep learning and symbolic structures” (Interest Group page).