concept
active
concept:principled-control-via-intervention-on-internals

Principled Control via Intervention on Internals

The goal of mechanistically grounded, reliable control of neural network behavior via activation interventions.

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • Claim that geometry enables accurate intervention; steering moves from direction-finding to geometry-finding.
  • Central framework: steering neural networks by intervening along the curved manifold where a concept lives, rather than in straight lines through activation space.
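The contrast between straight-line and along-the-manifold steering can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "concept manifold" is taken to be the unit circle, and the function names `linear_steer` and `geodesic_steer` are hypothetical, not drawn from any framework named on this page. Real concept manifolds would have to be estimated from activations and need not be spherical.

```python
import numpy as np

def linear_steer(h, direction, alpha):
    """Straight-line steering: add a scaled direction to the activation."""
    return h + alpha * direction

def geodesic_steer(h, target, t):
    """Move a fraction t along the great-circle arc from h to target.

    Assumes the concept lives on the unit sphere (toy assumption) and that
    h and target are not antipodal, where the arc is not unique.
    """
    h_unit = h / np.linalg.norm(h)
    target_unit = target / np.linalg.norm(target)
    cos_omega = np.clip(np.dot(h_unit, target_unit), -1.0, 1.0)
    omega = np.arccos(cos_omega)  # angle between the two points
    if omega < 1e-8:
        return h_unit  # already at the target; nothing to interpolate
    # Spherical linear interpolation (slerp) stays on the sphere for all t.
    return (np.sin((1 - t) * omega) * h_unit
            + np.sin(t * omega) * target_unit) / np.sin(omega)

h = np.array([1.0, 0.0])       # current activation (toy, 2-D)
target = np.array([0.0, 1.0])  # activation for the desired concept value

linear_mid = linear_steer(h, target - h, 0.5)
geodesic_mid = geodesic_steer(h, target, 0.5)

print("linear midpoint norm:  ", np.linalg.norm(linear_mid))
print("geodesic midpoint norm:", np.linalg.norm(geodesic_mid))
```

In this toy setting the straight-line midpoint falls inside the circle (its norm drops below 1), while the geodesic midpoint stays on it. That is the sense in which a linear intervention can leave the region where the concept is represented.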

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.