concept
active
concept:representational-failure

Representational Failure

A failure mode, exposed by the SAE framework, in which model representations become entangled or collapse under intervention.

Neighborhood — ranked by edge-count

Claims (2)

claim

Concepts (3)

concept
  • Entanglement
    associated_with
    Less hierarchical than embedment; multiple texts work into and out of each other, creating associations across levels and connecting any single text to the matrix of all others.
  • Type of concept steering intervention that catastrophically collapses global model performance.
  • Entanglement phenomenon where age and pathology concepts cannot be independently steered without corrupting each other.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The central empirical phenomenon: different neural networks trained on different data or objectives develop increasingly similar representations.
  • The evolution of an agent's latent representations over the course of training, shown to align with reward improvement when causal emergence is high.
  • Accumulation of mismatch in later layers causing S degradation.
  • Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution.
  • The proposed domain-general property indexed by deception features that governs both factual accuracy and experiential self-report.
  • Measure of similarity between the similarity structures (kernels) induced by two different representations.
  • Dominant interpretation of generative models as neural structures with representational content; main target of critique.
  • How familiar a model is with a numeral system, manipulated via bases in Experiment 2.