concept
active
concept:representational-drift

representational drift

Accumulation of mismatch in later layers causing S degradation.

Neighborhood: ranked by edge count

Concepts (1)

concept

Findings (1)

finding
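The page says only that neighbors are ranked by edge count. The sketch below shows one plausible reading of that rule, assuming the graph is available as a flat list of `(source_id, target_id, edge_type)` triples. The function name `rank_neighbors_by_edge_count` and the edge layout are hypothetical illustrations, not the tool's actual API.

```python
from collections import Counter

# Hypothetical edge layout: (source_id, target_id, edge_type).
# The real graph schema is not given on this page; this is an assumption.
Edge = tuple[str, str, str]


def rank_neighbors_by_edge_count(focus: str, edges: list[Edge]) -> list[tuple[str, int]]:
    """Return neighbors of `focus`, most-connected first.

    Edges are counted in both directions, and every typed edge counts once,
    so two different relation types to the same neighbor give it a count of 2.
    """
    counts: Counter = Counter()
    for src, dst, _edge_type in edges:
        if src == focus and dst != focus:
            counts[dst] += 1
        elif dst == focus and src != focus:
            counts[src] += 1
    # Sort by descending edge count; break ties alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


edges = [
    ("concept:representational-drift", "concept:x", "related_to"),
    ("concept:representational-drift", "finding:y", "supports"),
    ("finding:y", "concept:representational-drift", "cites"),
]
print(rank_neighbors_by_edge_count("concept:representational-drift", edges))
# [('finding:y', 2), ('concept:x', 1)]
```

Counting each typed edge separately (rather than each distinct neighbor once) is what makes the ranking meaningful; a neighbor linked by several relation types floats to the top.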

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or unrecognized duplicates.

  • Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
  • The evolution of an agent's latent representations over the course of training, shown to align with reward improvement when causal emergence is high.
  • The central empirical phenomenon: different neural networks trained on different data/objectives develop increasingly similar representations
  • A failure mode exposed by the SAE framework where model representations are entangled or collapse under intervention
  • Persona drift (concept, cosine 0.768)
    Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following
  • Property of conscious representations: they do not contain information about the fact that they are representations at the level of the representation itself
  • Parent concept; the practice of controlling neural network outputs by manipulating internal representations.
  • The proposed domain-general property indexed by deception features that governs both factual accuracy and experiential self-report
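The panel's selection rule is "cosine ≥ 0.65 · no typed edge". A minimal sketch of that filter follows, assuming each entity has a dense embedding vector and that the IDs already linked to the focus entity by a typed edge are known as a set. The names `similar_without_edge` and `typed_neighbors` are hypothetical; the 0.65 threshold is the only parameter taken from the page.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_without_edge(
    focus: str,
    embeddings: dict[str, list[float]],
    typed_neighbors: set[str],
    threshold: float = 0.65,
) -> list[tuple[str, float]]:
    """Entities at or above `threshold` cosine to `focus` that have no typed edge to it."""
    focus_vec = embeddings[focus]
    hits = []
    for entity_id, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if entity_id == focus or entity_id in typed_neighbors:
            continue
        score = cosine(focus_vec, vec)
        if score >= threshold:
            hits.append((entity_id, score))
    # Highest similarity first, as in the panel listing.
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits


embeddings = {
    "focus": [1.0, 0.0],
    "a": [1.0, 0.0],   # identical direction, cosine 1.0
    "b": [0.0, 1.0],   # orthogonal, cosine 0.0
    "c": [1.0, 1.0],   # cosine ~0.707
    "d": [1.0, 0.1],   # very similar, but already has a typed edge
}
print(similar_without_edge("focus", embeddings, typed_neighbors={"d"}))
```

Excluding typed neighbors before thresholding is what turns a plain nearest-neighbor list into a list of candidate edges or possible duplicates.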