concept
active
concept:linear-intervention
linear intervention
Manipulation of activations along a straight line in activation space. Shown to fail when the line crosses voids, in contrast to interventions that follow the manifold.
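The contrast can be illustrated with a toy sketch. It assumes a unit circle in 2D as a stand-in for a curved manifold and reads "voids" as regions away from that curve; the function names, the circle, and the chosen angles are all hypothetical and are not taken from the Mountain Car experiment.

```python
import numpy as np

def linear_path(a, b, n):
    """Points on the straight segment from a to b (a linear intervention)."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1.0 - t) * a + t * b

def arc_path(theta_a, theta_b, n):
    """Points along the unit circle between two angles (manifold-following)."""
    theta = np.linspace(theta_a, theta_b, n)
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

def distance_from_circle(points):
    """How far each point lies from the unit circle (the toy manifold)."""
    return np.abs(np.linalg.norm(points, axis=1) - 1.0)

# Two states on the toy manifold (angles are arbitrary illustrative choices).
theta_a, theta_b = 0.0, 0.75 * np.pi
a = np.array([np.cos(theta_a), np.sin(theta_a)])
b = np.array([np.cos(theta_b), np.sin(theta_b)])

# Largest departure from the manifold along each path.
linear_gap = distance_from_circle(linear_path(a, b, 11)).max()
arc_gap = distance_from_circle(arc_path(theta_a, theta_b, 11)).max()

print(f"linear path max distance from manifold: {linear_gap:.3f}")
print(f"arc path max distance from manifold:    {arc_gap:.3f}")
```

In this toy setting the straight path starts and ends on the curve but cuts through its interior, while the arc stays on the curve throughout. Real activation manifolds are higher-dimensional and must be estimated from data, so this is only a geometric intuition.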
Neighborhood — ranked by edge-count
Claims (1)
claim
- General principle derived from the Mountain Car experiment: following the curved manifold yields coherent manipulation, whereas linear shortcuts fail.
Findings (1)
finding
- Empirical demonstration that a semantically meaningful variable is encoded as a curved manifold, and that respecting its geometry is critical for effective intervention.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Intervention mode where multiple interventions are applied simultaneously to the same base computation graph.
- A straight vector in activation space, traditionally used for concept manipulation; claimed to be insufficient when true concept geometry is curved.
- Typical approach that adds a scaled steering vector to representations; the paper argues this is mismatched with actual representation geometry.
- The sequential, continuous order of text, often challenged by diagrammatic branching.
- Intervention mode where interventions are applied sequentially, each building on the previous one.
- The fundamental operation of making in-place changes to model activations, placing the model in a counterfactual state.
- pyvene's approach of storing interventions as shareable serialized objects rather than runtime code.
- Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control.
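The additive steering-vector approach mentioned in the list above (adding a scaled vector to representations) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: `add_steering_vector`, the array shapes, and the values are hypothetical, and the sketch does not use pyvene's API or any model hook mechanism.

```python
import numpy as np

def add_steering_vector(hidden, direction, alpha):
    """Return hidden + alpha * unit(direction), broadcast over tokens.

    hidden:    array of shape (tokens, hidden_dim), a stand-in for activations
    direction: array of shape (hidden_dim,), the steering direction
    alpha:     scalar steering strength
    """
    unit = direction / np.linalg.norm(direction)
    # Returns a new array; a real intervention would typically overwrite
    # the activation inside the model's forward pass instead.
    return hidden + alpha * unit

# Hypothetical activations for 2 tokens with hidden size 4.
hidden = np.zeros((2, 4))
direction = np.array([3.0, 0.0, 4.0, 0.0])

steered = add_steering_vector(hidden, direction, alpha=2.0)
print(steered)
```

Every token is shifted by the same fixed offset regardless of where it sits in activation space, which is the property the linear-intervention entry contrasts with manifold-following interventions.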