concept
active
concept:behavior-based-path

Behavior-based Path

The path in activation space obtained by optimizing steering interventions so that the model's outputs move along the behavior manifold, without reference to the geometry of the representation manifold.

Neighborhood — ranked by edge-count

Methods (1)

method
  • The method of optimizing steering interventions in activation space to produce outputs that follow the behavior manifold, independent of the representation manifold.
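
As a rough illustration of the method above, the sketch below optimizes an additive steering vector so that a toy model's output distribution reaches each of a sequence of target distributions, and collects the steered activations as a path. Everything in it is an assumption made for illustration and is not taken from the source: the linear readout `W`, the cross-entropy loss, the gradient-descent settings, the function names `steer_to_target` and `behavior_based_path`, and the straight mixture of two distributions that stands in for the behavior manifold.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def steer_to_target(h, W, target, steps=500):
    """Find delta so that softmax(W @ (h + delta)) approaches `target`.

    Minimizes cross-entropy(target, output) by plain gradient descent.
    """
    # Step size 1/L, where L = 0.5 * ||W||_2^2 bounds the loss curvature.
    lr = 1.0 / (0.5 * np.linalg.norm(W, 2) ** 2)
    delta = np.zeros_like(h)
    for _ in range(steps):
        p = softmax(W @ (h + delta))
        # Gradient of the cross-entropy w.r.t. the logits is (p - target).
        delta -= lr * (W.T @ (p - target))
    return delta

def behavior_based_path(h, W, targets, steps=500):
    """Return one steered activation per target output distribution."""
    return np.array([h + steer_to_target(h, W, t, steps) for t in targets])

# Toy setup (all values hypothetical): 3 output tokens, 3 activation dims.
W = np.eye(3)
h = np.array([0.5, -0.2, 0.1])
p_start = np.array([0.7, 0.2, 0.1])
p_end = np.array([0.1, 0.2, 0.7])
# Stand-in for the behavior manifold: a straight mixture of two distributions.
targets = [(1 - a) * p_start + a * p_end for a in np.linspace(0.0, 1.0, 5)]
path = behavior_based_path(h, W, targets)
```

The path here is defined only by what the outputs should be. Nothing in the optimization looks at the shape of the activation space, which is the sense in which such a path is independent of representation geometry.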

Concepts (2)

concept
  • behavior manifold
    associated_with
    One-dimensional curve in output probability space; the paper shows that it mirrors the structure of the representation manifold.
  • The path in activation space derived by fitting the representation manifold, used to steer along the geometric structure of internal representations.

Findings (1)

finding

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The path traced through output probability distribution space as interventions are applied to activations
  • Observable behavioral pattern used to infer cognition; shared by plants and animals and proposed as evidence for sentience.
  • Method of optimizing activation-space interventions to produce behavioral paths along M_y, then measuring whether the resulting activation trajectories trace M_h curvature
  • Adaptive Behavior · concept · 0.772
    Organism's belief-guided action selection that instantiates generative model and maintains phenotypic states
  • Behavior Space · concept · 0.766
    A geometric space of all output token probability distributions, equipped with Hellinger distance, used to visualize model behavior.
  • Path Patching · method · 0.765
    Method by Goldowsky-Dill et al. 2023 for localizing model behavior via targeted activation interventions
  • The general experimental approach of intervening along geometrically-defined paths rather than single-point or linear-direction interventions
  • Epistemic Behavior · concept · 0.762
    World-disclosing behavior that resolves uncertainty; driven by epistemic value and novelty components of expected free energy
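
The Behavior Space entry above refers to Hellinger distance between output token probability distributions. A minimal sketch of that metric follows, using the standard definition H(p, q) = sqrt(0.5 * sum((sqrt(p) - sqrt(q))^2)). The helper `path_length` is a hypothetical addition for measuring a sequence of output distributions and does not come from the source.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def path_length(distributions):
    """Sum of Hellinger distances between consecutive points on a path."""
    return sum(hellinger(a, b) for a, b in zip(distributions, distributions[1:]))
```

The distance ranges from 0 for identical distributions to 1 for distributions with disjoint support.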