representational drift

Accumulation of representational mismatch in later layers, causing degradation of S.

Neighborhood — ranked by edge-count

concept

representational mismatch dr
associated_with
Distance between prior and target representations.

finding

Systematic degradation of S(ℓ) across layers 20-28, reaching S ≈ −2.40 by layer 27 on LLaMA
associated_with · supports
Validates representational drift theory: later layers specialize for next-token prediction, increasing dr

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Representational Divergence · concept · 0.809
Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
Representational dynamics · concept · 0.808
The evolution of an agent's latent representations over the course of training, shown to align with reward improvement when causal emergence is high.
Representational Convergence · concept · 0.778
The central empirical phenomenon: different neural networks trained on different data/objectives develop increasingly similar representations
Representational Failure · concept · 0.769
A failure mode exposed by the SAE framework where model representations are entangled or collapse under intervention
Persona drift · concept · 0.768
Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following
Representational Transparency · concept · 0.766
Property of conscious representations: they do not contain information about the fact that they are representations at the level of the representation itself
Representation Steering · concept · 0.765
Parent concept; the practice of controlling neural network outputs by manipulating internal representations.
Representational Honesty · concept · 0.762
The proposed domain-general property indexed by deception features that governs both factual accuracy and experiential self-report
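
The "cosine ≥ 0.65 · no typed edge" filter used for this list can be sketched roughly as follows. This is a minimal illustration, not the graph's actual implementation: the function names (`cosine`, `untyped_neighbors`), the toy two-dimensional embeddings, and the edge representation as name pairs are all assumptions made for the example. Only the 0.65 threshold and the "no typed edge" condition come from the page itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs that are similar to `target` but share no typed edge with it."""
    # Entities already linked to the target by any typed edge are excluded.
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}
    scored = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical toy data: real embeddings would be high-dimensional.
embeddings = {
    "representational drift": [1.0, 0.0],
    "representational mismatch": [0.9, 0.1],
    "Representational Divergence": [0.8, 0.6],
    "Persona drift": [0.6, 0.8],
}
typed_edges = [("representational drift", "representational mismatch")]

candidates = untyped_neighbors("representational drift", embeddings, typed_edges)
print(candidates)
```

In this toy setup, "representational mismatch" is dropped because it already has a typed edge to the target, and "Persona drift" is dropped because its similarity falls below the threshold. What remains is the kind of candidate the list describes: a possible new edge or an unrecognized duplicate.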