framework
active
framework:assistant-axis

Assistant Axis

Contrast vector between the mean default Assistant activation and the mean of all fully role-playing role vectors; the main contribution of the paper.
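
A minimal sketch of how such a contrast vector could be computed, assuming activations are available as plain lists of floats. The sign convention (Assistant mean minus role mean) and the names `mean_vector`, `contrast_vector`, `assistant_acts`, and `role_vectors` are illustrative assumptions, not taken from the paper; the toy numbers are invented for the example.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def contrast_vector(assistant_acts, role_vectors):
    """Mean default-Assistant activation minus the mean role vector.

    The sign (Assistant minus roles) is an assumption for this sketch.
    """
    assistant_mean = mean_vector(assistant_acts)
    role_mean = mean_vector(role_vectors)
    return [a - r for a, r in zip(assistant_mean, role_mean)]


# Toy data: two default-Assistant activations and two role vectors, 3 dims each.
assistant_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
role_vectors = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

axis = contrast_vector(assistant_acts, role_vectors)
print(axis)
```

With this convention, a larger projection onto `axis` means an activation sits closer to the default Assistant end than to the average role.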

Neighborhood — ranked by edge-count

Methods (3)

method
  • Causal intervention technique: edit the NLA explanation, reconstruct it via AR, and use the difference as a steering vector to manipulate model behavior.
  • Clamping of activations along the Assistant Axis so that they remain above a minimum threshold (the 25th percentile); introduced as a stabilization method.
  • Method for extracting linear directions by subtracting the mean activations of contrastive groups; used to define the Assistant Axis.
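
The clamping method listed above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the threshold is taken here as the 25th percentile of projections from a hypothetical set of reference activations, and the helper names (`unit`, `project`, `clamp_along_axis`) and toy values are invented for the example.

```python
import math
from statistics import quantiles


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(activation, unit_axis):
    """Scalar projection of an activation onto a unit-length axis."""
    return sum(a * u for a, u in zip(activation, unit_axis))


def clamp_along_axis(activation, axis, floor):
    """Raise the component along `axis` to `floor` if it falls below it.

    Components orthogonal to the axis are left unchanged.
    """
    u = unit(axis)
    shortfall = floor - project(activation, u)
    if shortfall <= 0:
        return list(activation)  # already at or above the floor
    return [a + shortfall * ui for a, ui in zip(activation, u)]


# Toy axis and reference activations (assumed data, 2 dims).
axis = [2.0, 0.0]
reference = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]

# Assumption: the floor is the 25th percentile of reference projections.
u = unit(axis)
floor = quantiles([project(a, u) for a in reference], n=4, method="inclusive")[0]

clamped = clamp_along_axis([-1.0, 7.0], axis, floor)
print(floor, clamped)
```

Only the component along the axis is moved, so an activation already above the floor passes through unchanged.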

Concepts (1)

concept
  • Persona Space
    implements
    Low-dimensional space of activation directions corresponding to diverse character archetypes in LLMs

Frameworks (1)

framework
  • Prior framework for monitoring and controlling character traits in LLMs via activation directions; this paper extends it to 275 roles

Artifacts (1)

artifact

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.