concept
active
concept:the-hydra-effect-emergent-self-repair-in-language-model-computations-mcgrath-et-al-2023
The Hydra Effect: Emergent Self-Repair in Language Model Computations (McGrath et al., 2023)
Related work on model self-repair, contrasted with ESR, which involves explicit active correction.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Claim about model phenomenology; models talk about luminousness and can be terrified or love it.
- Related work demonstrating LLM introspective capabilities with scale-dependent pattern paralleling ESR
- Foundational claim of the paper, defining self-evidencing.
- language models recapitulate cyclic structure of human concepts from pretraining data · hypothesis · 0.761 · Explanation for why manifold geometry emerges: implicit structure in training data (co-occurrence patterns) shapes internal representations.
- Showed transformer representations predict brain representations in language areas; motivates Discussion about cortex as transformer.
- Analogy between LLM incoherence and schizophrenia symptoms
- Neural networks and physical systems with emergent collective computational abilities (Hopfield, 1982) · concept · 0.746 · Original Hopfield network paper; the attractor dynamics in TEM memory retrieval are a continuous version of this.
- Cited as enabling precise behavioral control through SAE features, extending the same methodological line