method
active
method:behavior-optimized-activation-path-recovery
Behavior-Optimized Activation Path Recovery
Method that optimizes activation-space interventions so that the model's outputs trace a path along the behavior manifold M_y, then measures whether the resulting activation trajectories follow the curvature of the representation manifold M_h.
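The two stages of the method can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and none of it comes from the source: M_h is taken to be the unit circle in a 2-D activation space, the behavior readout is a fixed linear map W, the base activation H0 is arbitrary, and the names `readout`, `optimize_intervention`, and `mean_deviation_from_M_h` are hypothetical. Stage one optimizes a steering offset per target behavior along M_y; stage two checks whether the recovered activation path stays on M_h, compared against a straight chord between its endpoints.

```python
import math

# Hypothetical toy setup: these names and values are illustrative assumptions.
W = [[2.0, 0.5], [0.3, 1.0]]  # assumed linear readout: activation h -> behavior y
H0 = [0.2, -0.1]              # assumed unsteered base activation
N = 9                         # number of points sampled along the path


def readout(h):
    """Behavior y = W h produced by activation h."""
    return [W[0][0] * h[0] + W[0][1] * h[1],
            W[1][0] * h[0] + W[1][1] * h[1]]


def on_M_h(theta):
    """Point on the toy representation manifold (unit circle)."""
    return [math.cos(theta), math.sin(theta)]


def optimize_intervention(y_target, steps=500, lr=0.1):
    """Gradient descent on ||readout(H0 + delta) - y_target||^2 over delta."""
    delta = [0.0, 0.0]
    for _ in range(steps):
        y = readout([H0[0] + delta[0], H0[1] + delta[1]])
        err = [y[0] - y_target[0], y[1] - y_target[1]]
        # Gradient of the squared error with respect to delta is 2 * W^T err.
        grad = [2 * (W[0][0] * err[0] + W[1][0] * err[1]),
                2 * (W[0][1] * err[0] + W[1][1] * err[1])]
        delta = [delta[0] - lr * grad[0], delta[1] - lr * grad[1]]
    return delta


def mean_deviation_from_M_h(path):
    """Mean distance of path points from the unit circle."""
    return sum(abs(math.hypot(h[0], h[1]) - 1.0) for h in path) / len(path)


# Stage 1: target behaviors along M_y (here, the image of a quarter arc of M_h).
thetas = [i * (math.pi / 2) / (N - 1) for i in range(N)]
behavior_path = [readout(on_M_h(t)) for t in thetas]

# Optimize one steering offset per target and record the steered activation.
recovered = []
for y in behavior_path:
    d = optimize_intervention(y)
    recovered.append([H0[0] + d[0], H0[1] + d[1]])

# Stage 2: does the recovered path follow M_h, or cut straight across it?
start, end = recovered[0], recovered[-1]
chord = [[start[0] + (end[0] - start[0]) * i / (N - 1),
          start[1] + (end[1] - start[1]) * i / (N - 1)] for i in range(N)]

recovered_dev = mean_deviation_from_M_h(recovered)
chord_dev = mean_deviation_from_M_h(chord)
print(f"recovered path deviation from M_h: {recovered_dev:.2e}")
print(f"straight chord deviation from M_h: {chord_dev:.3f}")
```

In this toy the readout is linear and invertible, so the optimized path is guaranteed to land back on M_h; the sketch only shows the shape of the pipeline. In a real model the readout is neither linear nor invertible, and whether the optimized path follows M_h is the empirical question the method is meant to answer.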
Neighborhood — ranked by edge-count
Findings (1)
finding
- Demonstrates a bidirectional causal link: the geometry of the behavior manifold can be recovered by optimizing in representation space.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The general experimental approach of intervening along geometrically-defined paths rather than single-point or linear-direction interventions
- The path in activation space derived by optimizing steering interventions to produce outputs along the behavior manifold, independent of representation geometry.
- Organism's belief-guided action selection that instantiates generative model and maintains phenotypic states
- The conventional approach (e.g., SAEs, transcoders) of decomposing activations into interpretable features.
- The behavior a model would exhibit during real-world deployment, as opposed to evaluation behavior; the target of steering.
- Behavior that minimizes expected free energy under the generative model, balancing exploration and exploitation in a principled manner.
- Method that optimizes activation interventions so that resulting behaviors trace M_y, recovering activation paths that follow M_h curvature.