Linear World Models in LLMs

A framework from prior work that studies whether LLMs encode world models as linear structures in their representations.

Neighborhood — ranked by edge-count

paper

framework

Mass-Mean Probing · extends
Introduced in this paper: an optimization-free probing technique that uses the difference-in-means direction, with an optional covariance correction.
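
The description above can be made concrete with a small sketch. It assumes binary-labeled activation vectors, takes the probe direction as the difference between the two class means, and, when the optional correction is enabled, multiplies that direction by the (pseudo-)inverse of the pooled within-class covariance. The function names `mass_mean_probe` and `probe_scores`, the pooling choice, and the absence of a bias term are assumptions made for illustration, not details confirmed by this entry.

```python
import numpy as np


def mass_mean_probe(X, y, covariance_correction=False):
    """Return a probe direction without any gradient-based optimization.

    X: (n, d) array of activations (hypothetical input format).
    y: (n,) array of binary labels, 1 for the positive class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)

    mu_pos = X[y].mean(axis=0)
    mu_neg = X[~y].mean(axis=0)
    theta = mu_pos - mu_neg  # difference-in-means direction

    if covariance_correction:
        # Assumption: center each class on its own mean, then pool.
        centered = np.where(y[:, None], X - mu_pos, X - mu_neg)
        sigma = centered.T @ centered / len(X)
        # pinv keeps the sketch usable when sigma is rank-deficient.
        theta = np.linalg.pinv(sigma) @ theta

    return theta


def probe_scores(X, theta):
    """Sigmoid of the projection onto the probe direction (no bias term)."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ theta)))
```

Because the sketch has no bias term, scores are only meaningful as class probabilities when the activations are roughly centered between the two classes; centering the data beforehand is one way to arrange that.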

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Linear mixed-effects models (LMMs) · method · 0.827
Primary statistical model with random intercept by conversation, REML estimation, for pooled conversation-turn observations
Linear Representation of Concepts in LLMs · concept · 0.825
The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams
Large Language Models (LLMs) · concept · 0.819
Transformer-based models like GPT-4, LaMDA, PaLM; assessed for GWT indicators.
Non-Linear Representations in LLMs · concept · 0.806
Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
The better an LLM is at language modeling, the more it aligns with vision models, and vice versa: a linear relationship between language modeling score and vision-language alignment · finding · 0.780
Core cross-modal empirical result: larger and better language models align better with vision models
As LLMs scale, they develop increasingly general abstractions, with large models linearly representing abstract concepts like truth that capture shared properties of diverse inputs · claim · 0.778
Interpretive claim connecting scale to abstraction level in LLM representations
Reflection in LLMs · concept · 0.774
The core phenomenon studied: the ability of LLMs to evaluate and revise their own reasoning.
Auditory models are roughly aligned with LLMs up to a linear transformation · finding · 0.772
Ngo & Kim result extending cross-modal convergence to the auditory domain
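
The listing above pairs a cosine-similarity threshold with the absence of a typed edge. A minimal sketch of such a filter follows, assuming each entity has an embedding vector and that typed edges are stored as unordered pairs of entity names. The function `untyped_neighbors`, the data layout, and the toy vectors are hypothetical and are not taken from the system that produced this page.

```python
import numpy as np


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """List entities similar to `target` that share no typed edge with it.

    embeddings: dict mapping entity name -> 1-D numpy vector (assumed layout).
    typed_edges: set of frozenset({name_a, name_b}) pairs (assumed layout).
    Returns (name, cosine) pairs sorted by descending cosine similarity.
    """
    t = embeddings[target]
    t = t / np.linalg.norm(t)

    candidates = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed relation
        cosine = float(t @ vec / np.linalg.norm(vec))
        if cosine >= threshold:
            candidates.append((name, cosine))

    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data: "c" is similar enough but already has a typed edge to the target.
toy_embeddings = {
    "target": np.array([1.0, 0.0]),
    "a": np.array([1.0, 0.1]),
    "b": np.array([0.0, 1.0]),
    "c": np.array([1.0, 1.0]),
    "d": np.array([1.0, 0.5]),
}
toy_edges = {frozenset(("target", "c"))}
toy_result = untyped_neighbors("target", toy_embeddings, toy_edges)
```

In the toy data, only `a` and `d` clear the threshold without an existing typed edge, so they are the ones surfaced as candidates for a new edge or a duplicate check.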