[ PHILOSOPHICAL MAPPING ]

We analyse the
Appearance of Meaning

We draw on 2000 years of philosophy of language, meaning, reasoning and context as a map for understanding how transformer-based AI works.

Intuition

Let us meet at the bank.
Out of context, you cannot tell which bank is meant: a word does not always mean the same thing in every sentence. The word stays the same; the surrounding context changes its interpretation.

Same word, different meaning.
CONTEXT SHAPES INTERPRETATION
BANK

She deposited cash at the bank.

Finance ▲ · River ▽

They sat by the bank of the river.

Finance ▽ · River ▲
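
The same contrast shows up inside a model's internal representations. Below is a minimal sketch, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (illustrative choices, not models from the papers below), comparing the contextual embeddings assigned to "bank" in the two sentences:

```python
# Minimal sketch: the same surface word "bank" receives a different contextual
# embedding depending on the sentence it appears in.
# Assumes the Hugging Face `transformers` library and `bert-base-uncased`.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_vector(sentence: str) -> torch.Tensor:
    """Contextual embedding of the token 'bank' in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_dim)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("bank")]

finance = bank_vector("She deposited cash at the bank.")
river = bank_vector("They sat by the bank of the river.")

# Identical word form, different internal representation.
print(f"cosine similarity: {torch.cosine_similarity(finance, river, dim=0).item():.3f}")
```

The two vectors typically come out far from identical, which is the representational counterpart of the finance/river toggle above.
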
THE CENTRAL QUESTION

How does meaning emerge inside a language model?
One research question explored at three zoom levels: theory (philosophy), behavior (controlled tests), and mechanism (internal feature space).

One question, three zoom levels.
theory → behavior → mechanism
ONE QUESTION · THREE ZOOM LEVELS
HOW DOES MEANING EMERGE?
Theory

What kind of semantics do transformer models seem to support?

Behavior

Can context-sensitive meaning-like competence be measured and causally tested?

Mechanism

Which internal substrates recover that control, and when do they fail?


One descending program

The arc across the papers is a descent in scale: (1) a theory-level claim about context dependence, (2) a measurable competence profile tested under controls, and (3) the internal organization of that control.

Paper 1

Context-Dependent Semantics in Large Language Models

Transformer models support a relational, context-dependent picture of meaning over a fixed-lookup picture.

Philosophy / theory
Paper 2

The Appearance of Meaning (AoM)

Meaning-like behavior can be measured under controlled tests, and context-conditioned internal states causally control interpretation shifts.

Behavioral + causal testing
Paper 3

Mechanics of Meaning (MoM)

Some causal control is selectively recoverable in sparse feature space, but only in certain layers and tasks.

Mechanistic interpretability
What the newest paper adds

Broad swap vs sparse edit

The newest work asks not just whether context-sensitive control exists, but how it is internally organized. The picture is mixed in an informative way: some early disambiguation effects can be edited more selectively in a sparse feature basis, while broader discourse tasks remain more distributed.

This is a result about the basis structure of contextualized control, not about finding tiny context-free "meaning atoms."
Why this matters

For philosophy of language

Transformer models give us a manipulable system for testing claims about context dependence.

For interpretability

Meaning-like behavior is not implemented uniformly; some parts are sparse, others distributed.

For alignment & steering

More selective internal edits may help distinguish useful control from collateral disturbance.

Read the papers

In reading order

Paper 1 • Philosophy / theory

Context-Dependent Semantics in Large Language Models

Key takeaway: Transformer success fits a relational, context-dependent picture of meaning better than a fixed-lookup model.

Main limitation: theory-level scope; claims are broad and do not by themselves isolate causal internal variables.

PDF
Paper 2 • Behavioral + causal testing

The Appearance of Meaning (AoM)

Key takeaway: context-sensitive competence can be operationalized and tested under controls; context-conditioned internal states causally control interpretation shifts.

Main limitation: competence targets are scoped; results describe internal control for these behaviors, not full human meaning.

Paper 3 • Mechanistic interpretability

Mechanics of Meaning (MoM)

Key takeaway: some context-sensitive control is more selectively recoverable in sparse feature space in early regimes; other behaviors remain distributed.

Main limitation: sparsity is conditional on layer/task; results do not license "single meaning unit" interpretations.

2-minute overview

How meaning seems to form inside a language model

To understand how meaning forms inside a language model, take a simple word like "bank." In one sentence it means a place for money. In another it means the side of a river. The word itself stays the same. What changes is the context around it.

That observation motivates my recent research on large language models. The question is not whether these systems "really understand" language in the full human sense. That debate is too vague on its own. The more useful question is: what kind of context-sensitive behavior do these models show, and how is that behavior implemented inside them?

My work approaches that question in three steps.

The first paper makes a broad philosophical argument. Transformer models do not seem to treat words as if they each come with one fixed, dictionary-like meaning that is simply looked up and applied. Instead, they constantly rebuild a word's role from the surrounding context. In that sense, their success supports a more relational picture of language: meaning is not carried by isolated words alone, but by words as they appear in context.

The second paper, The Appearance of Meaning (AoM), turns that broad idea into something testable. Instead of arguing directly that language models have "meaning proper," it defines a narrower empirical target: a measurable profile of meaning-like behavior. Can the model disambiguate words from context? Does it respond to changes that alter meaning more than to superficial changes that do not? Does it track discourse constraints across longer passages? Under controlled tests, the answer is often yes. More importantly, when I intervene on the model's internal states, I find that context-conditioned internal states causally control those interpretation shifts.
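
A rough illustration of the style of intervention involved (a generic activation-patching sketch, not the protocol used in AoM; the gpt2 checkpoint, the layer index, and the prompts are illustrative assumptions):

```python
# Generic activation-patching sketch (illustrative, not the AoM protocol):
# cache a hidden state computed under a "finance" context, patch it into a
# run on a "river" context, and inspect whether the continuation shifts.
# Assumes the Hugging Face `transformers` library and the `gpt2` checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # illustrative choice of intervention site

finance_prompt = "She deposited cash at the bank"
river_prompt = "They sat by the bank"

# Cache the residual state of the final token ("bank") in the finance context.
with torch.no_grad():
    out = model(**tokenizer(finance_prompt, return_tensors="pt"),
                output_hidden_states=True)
patch = out.hidden_states[LAYER + 1][:, -1, :]  # output of block LAYER

def patch_hook(module, inputs, output):
    # Overwrite the final position of block LAYER's output with the cached state.
    hidden = output[0].clone()
    hidden[:, -1, :] = patch
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(patch_hook)
try:
    with torch.no_grad():
        logits = model(**tokenizer(river_prompt, return_tensors="pt")).logits[0, -1]
finally:
    handle.remove()

# If the patched state carries the finance reading, finance-flavoured
# continuations should gain probability relative to an unpatched run.
top = torch.topk(logits.softmax(-1), k=5)
print([tokenizer.decode(int(i)) for i in top.indices])
```

Comparing the patched distribution with an unpatched run on the same prompt is what licenses the causal reading: any shift is attributable to the swapped-in state, not to the surface text.
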

The third paper, Mechanics of Meaning (MoM), asks the next question: what form does that control take inside the model? If context-sensitive control is real, is it spread diffusely across the model's internal activity, or is some of it recoverable in a more selective feature basis? To test that, I compare two kinds of interventions: broader edits to the model's internal state, and more selective edits in a sparse feature space.
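
The contrast can be sketched in a few lines. This is a toy illustration only: the autoencoder below is untrained and the dimensions are invented, standing in for the fitted sparse autoencoders a real experiment would use.

```python
# Toy contrast between a broad swap of the whole internal state and a
# selective edit in a sparse feature basis. Illustrative only: the
# autoencoder is untrained and the vectors are random stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
D_MODEL, D_SAE = 768, 8192  # residual-stream width, number of sparse features

class SparseAutoencoder(nn.Module):
    """Minimal SAE: overcomplete ReLU encoder plus linear decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(D_MODEL, D_SAE)
        self.decoder = nn.Linear(D_SAE, D_MODEL)

    def encode(self, x):
        return torch.relu(self.encoder(x))

    def decode(self, f):
        return self.decoder(f)

sae = SparseAutoencoder()
river_state = torch.randn(D_MODEL)    # stand-in for a "river"-context state
finance_state = torch.randn(D_MODEL)  # stand-in for a "finance"-context state

# Broad edit: replace the entire vector with the other context's state.
broad_edit = finance_state.clone()

# Sparse edit: move only the few features that differ most between contexts,
# leaving the rest of the state untouched.
river_f, finance_f = sae.encode(river_state), sae.encode(finance_state)
top = torch.topk((finance_f - river_f).abs(), k=5).indices
edited_f = river_f.clone()
edited_f[top] = finance_f[top]
sparse_edit = river_state + sae.decode(edited_f) - sae.decode(river_f)

print("broad edit moves the state by: ", torch.dist(river_state, broad_edit).item())
print("sparse edit moves the state by:", torch.dist(river_state, sparse_edit).item())
```

The point of the sparse route is that the edit names and touches only a few features rather than the whole entangled vector; that is what makes it possible to ask how selective the underlying control is, and where that selectivity breaks down.
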

The answer is qualified but informative. In some early disambiguation settings, it can help to work in a more selective vocabulary, isolating a smaller set of internal variables, rather than treating the model's state as one big, entangled object. In other settings (especially broader discourse behavior), that advantage weakens and the relevant control looks more distributed.

That matters because it rules out two overly simple stories at once. One is the dismissive story that language models are doing nothing but shallow pattern mimicry. The other is the overly neat story that there must be a single internal "meaning unit" for each concept. What the evidence suggests instead is more interesting: context-sensitive control in language models is real, measurable, and internally heterogeneous. Some of it is selectively recoverable. Some of it is not.

So the arc across the three papers is a descent in scale. The first asks what kind of theory of language transformer models seem to support. The second asks what can be measured and causally tested. The third asks how that control is organized in the model's internal feature space.

That does not settle the largest philosophical questions. It does, however, turn a vague argument about whether models "understand" into a more precise research program: what kinds of meaning-like behavior exist, which internal states control them, and when that control is more selective rather than broadly distributed.

The papers are linked above in reading order.