Representational Failure

A failure mode exposed by the SAE framework in which model representations are entangled or collapse under intervention.

Neighborhood — ranked by edge-count

Entanglement
associated_with
Less hierarchical than embedment; multiple texts work into and out of each other, creating associations across levels and connecting any single text to the matrix of all others.
wrecking-ball intervention
extends
Type of concept steering intervention that catastrophically collapses global model performance.
age-pathology confounding
extends
Entanglement phenomenon where age and pathology concepts cannot be independently steered without corrupting each other.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
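
The selection rule above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as undirected pairs; the function names, the toy vectors, and the entity labels below are hypothetical and are not taken from the system that produced this page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that share no typed edge with target."""
    # Entities already connected to the target by any typed edge (treated as undirected).
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first, matching the ordering of the list on this page.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): 2-D vectors stand in for real embeddings.
embeddings = {
    "Representational Failure": [1.0, 0.0],
    "Entanglement": [1.0, 0.1],       # similar, but already has a typed edge
    "Unrelated": [0.0, 1.0],          # below the threshold
    "Candidate": [1.0, 1.0],          # similar, no typed edge
}
typed_edges = [("Representational Failure", "Entanglement")]

candidates = untyped_neighbors("Representational Failure", embeddings, typed_edges)
```

With this toy data only `Candidate` is returned: `Entanglement` is excluded because it already has a typed edge, and `Unrelated` falls below the threshold.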

Representational Convergence · concept · 0.798
The central empirical phenomenon: different neural networks trained on different data/objectives develop increasingly similar representations
Representational dynamics · concept · 0.777
The evolution of an agent's latent representations over the course of training, shown to align with reward improvement when causal emergence is high.
representational drift · concept · 0.769
Accumulation of mismatch in later layers causing S degradation.
Representational Divergence · concept · 0.768
Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
Representational Honesty · concept · 0.765
The proposed domain-general property indexed by deception features that governs both factual accuracy and experiential self-report
Representational Alignment · concept · 0.759
Measure of similarity between the similarity structures (kernels) induced by two different representations
Structural Representationalism · framework · 0.754
Dominant interpretation of generative models as neural structures with representational content; main target of critique
Representational familiarity · concept · 0.752
How familiar a model is with a numeral system, manipulated via bases in Experiment 2.