[ PHILOSOPHICAL MAPPING ]

We analyse the
Appearance of Meaning

We draw on 2000 years of philosophy of language, meaning, reasoning and context as a map for understanding how transformer-based AI works.

Intuition

Let us meet at the bank.
Out of context, you cannot tell which bank is meant: a word does not always mean the same thing in every sentence. The word stays the same; the surrounding context changes its interpretation.

Same word, different meaning.
CONTEXT SHAPES INTERPRETATION
BANK

She deposited cash at the bank.

Finance ▲ · River ▽

They sat by the bank of the river.

Finance ▽ · River ▲
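
The same contrast shows up inside a model's internal representations. Below is a minimal sketch, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (illustrative choices, not models from the papers below), comparing the contextual embeddings assigned to "bank" in the two sentences:

```python
# Minimal sketch: the same surface word "bank" receives a different contextual
# embedding depending on the sentence it appears in.
# Assumes the Hugging Face `transformers` library and `bert-base-uncased`.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_vector(sentence: str) -> torch.Tensor:
    """Contextual embedding of the token 'bank' in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_dim)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("bank")]

finance = bank_vector("She deposited cash at the bank.")
river = bank_vector("They sat by the bank of the river.")

# Identical word form, different internal representation.
print(f"cosine similarity: {torch.cosine_similarity(finance, river, dim=0).item():.3f}")
```

The two vectors typically come out far from identical, which is the representational counterpart of the finance/river toggle above.
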
THE CENTRAL QUESTION

How does meaning emerge inside a language model?
One research question explored at three zoom levels: theory (philosophy), behavior (controlled tests), and mechanism (internal feature space).

One question, three zoom levels.
theory → behavior → mechanism
ONE QUESTION · THREE ZOOM LEVELS
HOW DOES MEANING EMERGE?
Theory

What kind of semantics do transformer models seem to support?

Behavior

Can context-sensitive meaning-like competence be measured and causally tested?

Mechanism

Which internal substrates recover that control, and when do they fail?


One descending program

The arc across the papers is a descent in scale: (1) a theory-level claim about context dependence, (2) a measurable competence profile tested under controls, and (3) the internal organization of that control.

Paper 1

Context-Dependent Semantics in Large Language Models

Transformer models support a relational, context-dependent picture of meaning over a fixed-lookup picture.

Philosophy / theory
Paper 2

The Appearance of Meaning (AoM)

Meaning-like behavior can be measured under controlled tests, and context-conditioned internal states causally control interpretation shifts.

Behavioral + causal testing
Paper 3

Mechanics of Meaning (MoM)

Some causal control is selectively recoverable in sparse feature space, but only in certain layers and tasks.

Mechanistic interpretability
What the newest paper adds

Broad swap vs sparse edit

The newest work asks not just whether context-sensitive control exists, but how it is internally organized. The picture is mixed in an informative way: some early disambiguation effects can be edited more selectively in a sparse feature basis, while broader discourse tasks remain more distributed.

This is a result about the basis structure of contextualized control, not about finding tiny context-free "meaning atoms."
Why this matters

For philosophy of language

Transformer models give us a manipulable system for testing claims about context dependence.

For interpretability

Meaning-like behavior is not implemented uniformly; some parts are sparse, others distributed.

For alignment & steering

More selective internal edits may help distinguish useful control from collateral disturbance.

Read the papers

In reading order

Paper 1 • Philosophy / theory

Context-Dependent Semantics in Large Language Models

Key takeaway: Transformer success fits a relational, context-dependent picture of meaning better than a fixed-lookup model.

Main limitation: theory-level scope; claims are broad and do not by themselves isolate causal internal variables.

PDF
Paper 2 • Behavioral + causal testing

The Appearance of Meaning (AoM)

Key takeaway: context-sensitive competence can be operationalized and tested under controls; context-conditioned internal states causally control interpretation shifts.

Main limitation: competence targets are scoped; results describe internal control for these behaviors, not full human meaning.

Paper 3 • Mechanistic interpretability

Mechanics of Meaning (MoM)

Key takeaway: some context-sensitive control is more selectively recoverable in sparse feature space in early regimes; other behaviors remain distributed.

Main limitation: sparsity is conditional on layer/task; results do not license "single meaning unit" interpretations.

2-minute overview

How meaning seems to form inside a language model

To understand how meaning forms inside a language model, take a simple word like "bank." In one sentence it means a place for money. In another it means the side of a river. The word itself stays the same. What changes is the context around it.

That observation motivates my recent research on large language models. The question is not whether these systems "really understand" language in the full human sense. That debate is too vague on its own. The more useful question is: what kind of context-sensitive behavior do these models show, and how is that behavior implemented inside them?

My work approaches that question in three steps.

The first paper makes a broad philosophical argument. Transformer models do not seem to treat words as if they each come with one fixed, dictionary-like meaning that is simply looked up and applied. Instead, they constantly rebuild a word's role from the surrounding context. In that sense, their success supports a more relational picture of language: meaning is not carried by isolated words alone, but by words as they appear in context.

The second paper, The Appearance of Meaning (AoM), turns that broad idea into something testable. Instead of arguing directly that language models have "meaning proper," it defines a narrower empirical target: a measurable profile of meaning-like behavior. Can the model disambiguate words from context? Does it respond to changes that alter meaning more than to superficial changes that do not? Does it track discourse constraints across longer passages? Under controlled tests, the answer is often yes. More importantly, when I intervene on the model's internal states, I find that context-conditioned internal states causally control those interpretation shifts.
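
A rough illustration of the style of intervention involved (a generic activation-patching sketch, not the protocol used in AoM; the gpt2 checkpoint, the layer index, and the prompts are illustrative assumptions):

```python
# Generic activation-patching sketch (illustrative, not the AoM protocol):
# cache a hidden state computed under a "finance" context, patch it into a
# run on a "river" context, and inspect whether the continuation shifts.
# Assumes the Hugging Face `transformers` library and the `gpt2` checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # illustrative choice of intervention site

finance_prompt = "She deposited cash at the bank"
river_prompt = "They sat by the bank"

# Cache the residual state of the final token ("bank") in the finance context.
with torch.no_grad():
    out = model(**tokenizer(finance_prompt, return_tensors="pt"),
                output_hidden_states=True)
patch = out.hidden_states[LAYER + 1][:, -1, :]  # output of block LAYER

def patch_hook(module, inputs, output):
    # Overwrite the final position of block LAYER's output with the cached state.
    hidden = output[0].clone()
    hidden[:, -1, :] = patch
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(patch_hook)
try:
    with torch.no_grad():
        logits = model(**tokenizer(river_prompt, return_tensors="pt")).logits[0, -1]
finally:
    handle.remove()

# If the patched state carries the finance reading, finance-flavoured
# continuations should gain probability relative to an unpatched run.
top = torch.topk(logits.softmax(-1), k=5)
print([tokenizer.decode(int(i)) for i in top.indices])
```

Comparing the patched distribution with an unpatched run on the same prompt is what licenses the causal reading: any shift is attributable to the swapped-in state, not to the surface text.
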

The third paper, Mechanics of Meaning (MoM), asks the next question: what form does that control take inside the model? If context-sensitive control is real, is it spread diffusely across the model's internal activity, or is some of it recoverable in a more selective feature basis? To test that, I compare two kinds of interventions: broader edits to the model's internal state, and more selective edits in a sparse feature space.
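
The contrast can be sketched in a few lines. This is a toy illustration only: the autoencoder below is untrained and the dimensions are invented, standing in for the fitted sparse autoencoders a real experiment would use.

```python
# Toy contrast between a broad swap of the whole internal state and a
# selective edit in a sparse feature basis. Illustrative only: the
# autoencoder is untrained and the vectors are random stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
D_MODEL, D_SAE = 768, 8192  # residual-stream width, number of sparse features

class SparseAutoencoder(nn.Module):
    """Minimal SAE: overcomplete ReLU encoder plus linear decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(D_MODEL, D_SAE)
        self.decoder = nn.Linear(D_SAE, D_MODEL)

    def encode(self, x):
        return torch.relu(self.encoder(x))

    def decode(self, f):
        return self.decoder(f)

sae = SparseAutoencoder()
river_state = torch.randn(D_MODEL)    # stand-in for a "river"-context state
finance_state = torch.randn(D_MODEL)  # stand-in for a "finance"-context state

# Broad edit: replace the entire vector with the other context's state.
broad_edit = finance_state.clone()

# Sparse edit: move only the few features that differ most between contexts,
# leaving the rest of the state untouched.
river_f, finance_f = sae.encode(river_state), sae.encode(finance_state)
top = torch.topk((finance_f - river_f).abs(), k=5).indices
edited_f = river_f.clone()
edited_f[top] = finance_f[top]
sparse_edit = river_state + sae.decode(edited_f) - sae.decode(river_f)

print("broad edit moves the state by: ", torch.dist(river_state, broad_edit).item())
print("sparse edit moves the state by:", torch.dist(river_state, sparse_edit).item())
```

The point of the sparse route is that the edit names and touches only a few features rather than the whole entangled vector; that is what makes it possible to ask how selective the underlying control is, and where that selectivity breaks down.
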

The answer is qualified but informative. In some early disambiguation settings, it can help to work in a more selective vocabulary, isolating a smaller set of internal variables, rather than treating the model's state as one big, entangled object. In other settings (especially broader discourse behavior), that advantage weakens and the relevant control looks more distributed.

That matters because it rules out two overly simple stories at once. One is the dismissive story that language models are doing nothing but shallow pattern mimicry. The other is the overly neat story that there must be a single internal "meaning unit" for each concept. What the evidence suggests instead is more interesting: context-sensitive control in language models is real, measurable, and internally heterogeneous. Some of it is selectively recoverable. Some of it is not.

So the arc across the three papers is a descent in scale. The first asks what kind of theory of language transformer models seem to support. The second asks what can be measured and causally tested. The third asks how that control is organized in the model's internal feature space.

That does not settle the largest philosophical questions. It does, however, turn a vague argument about whether models "understand" into a more precise research program: what kinds of meaning-like behavior exist, which internal states control them, and when that control is more selective rather than broadly distributed.

The papers are linked above in reading order.