principled control via internals using geometry

Claims that knowing the geometry of a representation enables accurate intervention, so steering shifts from direction-finding to geometry-finding.

Neighborhood — ranked by edge-count

claim

concept

Principled Control via Intervention on Internals
related_to
The goal of mechanistically grounded, reliable control of neural network behavior via activation interventions.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Geometry · concept · 0.750
The actual shapes and spatial relationships of buildings, essential to living structure.
What is the right geometry for enabling principled steering of neural network behavior? · question · 0.745
The reframed steering problem the paper introduces.
What Principles Determine Which Scale And Mechanism Is · question · 0.742
What Principled Criteria Can Be Used To Assess · question · 0.740
Networks compute on geometric manifolds and control should respect that geometry. · claim · 0.739
Strong interpretive assertion linking discovery and control: neural computation is fundamentally manifold-structured.
steering (intervention on internals) · concept · 0.735
General technique of modifying activations to control model behavior.
Intentional Control of Internal States · finding · 0.735
Models can modulate their internal representations when instructed or incentivized to 'think about' a concept; effect replicates across all tested models regardless of capability.
The real character of the world, its flesh, is governed by the centers in the geometry. · claim · 0.732
Strong statement that all qualitative aspects of places and situations are produced by the spatial system of centers.