concept
active
concept:neural-network-intervention

Neural Network Intervention

The fundamental operation of making in-place changes to model activations, placing the model in a counterfactual state
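
A minimal sketch of this operation, assuming a toy two-layer network written in plain Python rather than any particular library's API. The weights and the names `hidden_layer`, `forward`, and `intervene` are hypothetical and chosen only for illustration. The hidden activations are overwritten partway through the computation, so the output reflects a counterfactual state rather than the factual one:

```python
# Toy two-layer network; the weights are arbitrary illustrative values.
W1 = [[1.0, -1.0], [0.5, 2.0]]   # hidden = relu(W1 @ x)
W2 = [1.0, -2.0]                 # output = W2 . hidden


def relu(v):
    return [max(0.0, a) for a in v]


def hidden_layer(x):
    return relu([sum(w * xi for w, xi in zip(row, x)) for row in W1])


def forward(x, intervene=None):
    """Run the network; `intervene`, if given, rewrites the hidden activations."""
    h = hidden_layer(x)
    if intervene is not None:
        h = intervene(h)  # from here on the model is in a counterfactual state
    return sum(w * a for w, a in zip(W2, h))


x = [2.0, 1.0]
factual = forward(x)                                          # hidden = [1.0, 3.0]
counterfactual = forward(x, intervene=lambda h: [0.0, h[1]])  # hidden unit 0 zeroed
print(factual, counterfactual)
```

Comparing `factual` with `counterfactual` shows how much the output depends on the activation that was changed.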

Neighborhood — ranked by edge-count

Concepts (5)

concept
  • Counterfactual State
    associated_with
    The state a neural network is placed in when its activations are modified via intervention
  • Intervention mode where interventions are applied sequentially, each building on the previous one
  • Dict-based configuration format in pyvene that outlines which model components will be intervened upon
  • Intervention mode where multiple interventions are applied simultaneously to the same base computation graph
  • Intervention targeting specific dimensional subsets of activation vectors rather than full representations
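
The sequential mode, the simultaneous mode, and the subspace-targeted intervention described above can be contrasted in a small sketch. This is one reading of those descriptions on a single activation vector, not pyvene's implementation or configuration format. All names (`set_subspace`, `sum_into`, `run_serial`, `run_parallel`) are hypothetical:

```python
def set_subspace(dims, values):
    """Hypothetical intervention: overwrite only the listed dimensions."""
    def apply(h):
        out = list(h)
        for d, v in zip(dims, values):
            out[d] = v
        return out
    return apply


def sum_into(dim, read_dims):
    """Hypothetical intervention: write the sum of `read_dims` into `dim`."""
    def apply(h):
        out = list(h)
        out[dim] = sum(h[d] for d in read_dims)
        return out
    return apply


def run_serial(h, interventions):
    # Each intervention sees the result of the previous one.
    for iv in interventions:
        h = iv(h)
    return h


def run_parallel(h, interventions):
    # Every intervention reads the same base activations; only the
    # dimensions each one actually changed are merged into the result.
    out = list(h)
    for iv in interventions:
        edited = iv(h)
        for d, (old, new) in enumerate(zip(h, edited)):
            if new != old:
                out[d] = new
    return out


base = [1.0, 2.0, 5.0, 4.0]
interventions = [set_subspace([0, 1], [10.0, 20.0]), sum_into(2, [0, 1])]
serial = run_serial(base, interventions)
parallel = run_parallel(base, interventions)
print(serial, parallel)
```

In the serial run the second intervention reads the values written by the first, so dimension 2 becomes 30.0. In the parallel run it reads the untouched base, so dimension 2 becomes 3.0. Dimension 3 is never targeted and keeps its base value in both runs.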

Artifacts (1)

artifact

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The artificial agents trained with RL in this study, whose latent dynamics are analyzed for causal emergence.
  • Neural Networks · concept · 0.802
  • neural cognition · concept · 0.768
    Cognition in nervous systems, used as a modelling target
  • Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
  • The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
  • Manipulation of activations along a straight line; shown to fail when it crosses voids, in contrast to manifold-following interventions.
  • Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
  • Scalar function of the input corresponding to a direction in the vector space of neuron activations; claimed to be the fundamental unit of neural networks