concept
active
concept:representational-failure
Representational Failure
A failure mode exposed by the SAE framework in which model representations are entangled or collapse under intervention.
Neighborhood — ranked by edge-count
Claims (2)
claim
- A critical failure mode identified in the paper, demonstrating the risk of naïve concept steering
- A specific representational failure with direct clinical safety implications
Concepts (3)
concept
- Entanglement (associated_with): Less hierarchical than embedment; multiple texts work into and out of each other, creating associations across levels and connecting any single text to the matrix of all others.
- wrecking-ball intervention (extends): Type of concept steering intervention that catastrophically collapses global model performance.
- age-pathology confounding (extends): Entanglement phenomenon where age and pathology concepts cannot be independently steered without corrupting each other.
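The age-pathology confounding entry above can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the paper's method: `age_dir` and `pathology_dir` are hypothetical unit-norm concept directions in a 4-dimensional activation space, steering is modeled as plain vector addition, and a concept's strength is read out as a dot product. Because the two directions overlap, steering along one also moves the readout of the other.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical concept directions (illustrative values, not taken from the source).
age_dir = unit([1.0, 0.0, 0.0, 0.0])
pathology_dir = unit([0.6, 0.8, 0.0, 0.0])  # deliberately overlaps with age_dir

def steer(activation, direction, alpha):
    """Naive linear steering: add alpha * direction to the activation."""
    return [x + alpha * d for x, d in zip(activation, direction)]

x = [0.0, 0.0, 0.0, 0.0]
x_steered = steer(x, age_dir, alpha=2.0)

# Shift in each concept's readout caused by steering along age only.
age_shift = dot(x_steered, age_dir) - dot(x, age_dir)
pathology_shift = dot(x_steered, pathology_dir) - dot(x, pathology_dir)
overlap = dot(age_dir, pathology_dir)  # cosine similarity of the two unit directions
```

In this toy setup the unintended pathology shift equals `alpha * overlap`, so it vanishes only when the two directions are orthogonal.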
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The central empirical phenomenon: different neural networks trained on different data/objectives develop increasingly similar representations
- The evolution of an agent's latent representations over the course of training, shown to align with reward improvement when causal emergence is high.
- Accumulation of mismatch in later layers causing S degradation.
- Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
- The proposed domain-general property indexed by deception features that governs both factual accuracy and experiential self-report
- Measure of similarity between the similarity structures (kernels) induced by two different representations
- Dominant interpretation of generative models as neural structures with representational content; main target of critique
- How familiar a model is with a numeral system, manipulated via bases in Experiment 2.
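The "related by similarity" filter described above (cosine ≥ 0.65, no typed edge) could be sketched as follows. This is an illustrative assumption about how such a list might be computed, not the tool's actual implementation; the function name `similar_untyped`, the entity names, and the 2-dimensional embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have
    no typed edge to the target, highest score first."""
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked neighbors
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical embeddings and typed edges, for illustration only.
embeddings = {
    "representational-failure": [1.0, 0.0],
    "entity-a": [0.9, 0.1],       # similar, no typed edge: should be listed
    "entity-b": [0.0, 1.0],       # dissimilar: filtered out by the threshold
    "entanglement": [1.0, 0.05],  # similar, but already has a typed edge
}
typed_edges = {"entanglement"}

related = similar_untyped("representational-failure", embeddings, typed_edges)
names = [name for name, _ in related]
```

Entries returned this way are only candidates: a high score may indicate either a missing typed edge or an unrecognized duplicate, so each still needs manual review.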