concept
active
concept:incoherence
incoherence
Nonsensical or unphysical model outputs that result when interventions cross voids in activation space.
Neighborhood — ranked by edge-count
Claims (1)
claim
- General principle derived from the Mountain Car experiment: following the curved manifold yields coherent manipulation, whereas linear shortcuts fail.
Findings (1)
finding
- Empirical demonstration that a semantically meaningful variable is encoded as a curved manifold, and that respecting its geometry is critical for effective intervention.
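As a rough illustration of the claim and finding above, the sketch below compares a straight-line path with a manifold-following path between two points. It is a toy, not the Mountain Car setup: the unit circle stands in for the curved manifold, its interior stands in for a void in activation space, and all function names (`on_circle`, `linear_path`, `arc_path`, `max_off_manifold`) are hypothetical.

```python
import math

# Assumption: a unit circle in 2D stands in for a curved manifold that
# encodes one variable (the angle). Nothing here comes from the source.

def on_circle(theta):
    """Map the encoded variable (an angle) to a point on the toy manifold."""
    return (math.cos(theta), math.sin(theta))

def linear_path(a, b, steps):
    """Straight-line interpolation between two points (the 'linear shortcut')."""
    return [
        tuple((1 - t) * x + t * y for x, y in zip(a, b))
        for t in (i / steps for i in range(steps + 1))
    ]

def arc_path(theta_a, theta_b, steps):
    """Interpolate the encoded variable itself, so every step stays on the circle."""
    return [
        on_circle(theta_a + (theta_b - theta_a) * i / steps)
        for i in range(steps + 1)
    ]

def max_off_manifold(path):
    """Largest distance from the circle (radius 1) reached along a path."""
    return max(abs(math.hypot(x, y) - 1.0) for x, y in path)

theta_a, theta_b = 0.0, 0.75 * math.pi
linear = linear_path(on_circle(theta_a), on_circle(theta_b), 10)
arc = arc_path(theta_a, theta_b, 10)

linear_gap = max_off_manifold(linear)  # chord cuts through the interior
arc_gap = max_off_manifold(arc)        # stays on the circle up to rounding
print(f"linear shortcut leaves the manifold by up to {linear_gap:.3f}")
print(f"arc-following leaves the manifold by up to {arc_gap:.3g}")
```

In this toy the chord's midpoint sits well inside the circle, which is the analogue of an intervention crossing a void, while the arc path never leaves the manifold.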
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Kuhn's concept: the inability of ideas from one paradigm to be translated into the terms of another, causing communication breakdowns.
- A property that makes a segment of space stand out as a center; determined by symmetry, connectedness, convexity, etc.
- Multiple possible meanings for words like Alice, disambiguated by context; harder when grammar and meaning intertwine.
- Non-uniform placement, like cars parked irregularly, that increases relationship and life.
- Ian Goodfellow quote used to illustrate the pre-paradigmatic state of interpretability research.
- Inherent in Linda because an in statement chooses one matching tuple arbitrarily; essential for many parallel patterns.
- The property of an AI being safe to shut down or modify; discussed in context of GPT.
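The selection rule behind this list (cosine ≥ 0.65, no typed edge) might be sketched as follows. This is only a guess at the filter's logic, not the system's actual implementation: the embeddings are made-up 2D vectors, and the names `cosine`, `similar_untyped`, `embeddings`, and `typed_edges` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Entities at or above the cosine threshold that have no typed edge to target."""
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy data (assumption): real embeddings would be high-dimensional.
embeddings = {
    "target": (1.0, 0.0),
    "near_untyped": (0.8, 0.6),   # cosine 0.8, no typed edge: listed
    "near_typed": (0.96, 0.28),   # cosine 0.96, but already has a typed edge
    "far": (0.0, 1.0),            # cosine 0.0, below threshold
}
typed_edges = {("target", "near_typed")}

related = similar_untyped("target", embeddings, typed_edges)
print(related)
```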