concept
active
concept:representational-drift
representational drift
Accumulation of representational mismatch across later layers, which degrades S(ℓ).
Neighborhood — ranked by edge-count
Concepts (1)
concept
- representational mismatch dr · associated_with · Distance between prior and target representations.
Findings (1)
finding
- Systematic layer 20-28 degradation in S(ℓ) to S ≈ −2.40 by layer 27 on LLaMA · associated_with, supports · Validates representational drift theory: later layers specialize for next-token prediction, increasing dr
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
- The evolution of an agent's latent representations over the course of training, shown to align with reward improvement when causal emergence is high.
- The central empirical phenomenon: different neural networks trained on different data/objectives develop increasingly similar representations
- A failure mode exposed by the SAE framework where model representations are entangled or collapse under intervention
- Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following
- Property of conscious representations: they do not contain information about the fact that they are representations at the level of the representation itself
- Parent concept; the practice of controlling neural network outputs by manipulating internal representations.
- The proposed domain-general property indexed by deception features that governs both factual accuracy and experiential self-report
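The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration, not the page's actual implementation: the page does not say how entity embeddings are produced, so the vectors, the function names `cosine` and `related_by_similarity`, and the `typed_neighbors` set are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target, candidates, typed_neighbors, threshold=0.65):
    """Return (name, score) pairs at or above the threshold, most similar first.

    target          -- embedding of the entity being viewed (assumed input)
    candidates      -- dict mapping entity name to embedding (assumed input)
    typed_neighbors -- names that already share a typed edge with the target
    threshold       -- the 0.65 cutoff shown on the page
    """
    hits = []
    for name, vec in candidates.items():
        if name in typed_neighbors:
            continue  # already linked by a typed relation, so not listed here
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Entities that pass the filter are only candidates: a high cosine score suggests a missing edge or a duplicate, but it does not say which relation type applies.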