claim
active
claim:the-assumption-that-the-assistant-persona-corresponds-to-a-linear-direction-in-activation-space-is-likely-flawed-some-information-may-be-represented-nonlinearly-or-encoded-in-weights-rather-than-activations

The assumption that the Assistant persona corresponds to a linear direction in activation space is likely flawed; some information may be represented nonlinearly or encoded in weights rather than activations

Acknowledged limitation concerning the adequacy of the linear representation assumption
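To make the claim above concrete, here is a minimal sketch on synthetic data of why a single linear direction can miss nonlinearly represented information. It is not the source paper's method: the Gaussian "activations", the two toy labels, and the helper names `mean_difference_direction` and `projection_accuracy` are all assumptions introduced for illustration. The second half of the claim (information encoded in weights rather than activations) is not covered by this sketch.

```python
import numpy as np


def mean_difference_direction(pos, neg):
    """Unit vector pointing from the mean of `neg` to the mean of `pos`."""
    d = pos.mean(axis=0) - neg.mean(axis=0)
    return d / np.linalg.norm(d)


def projection_accuracy(x, labels, direction):
    """Accuracy of thresholding the 1-D projection at the midpoint of the class means."""
    proj = x @ direction
    thresh = 0.5 * (proj[labels].mean() + proj[~labels].mean())
    return float(((proj > thresh) == labels).mean())


# Synthetic stand-in for activations (assumption: isotropic Gaussian, 16 dims).
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 16))

# Linearly encoded toy feature: the sign of one coordinate.
linear_labels = x[:, 0] > 0

# Nonlinearly encoded toy feature: inside vs. outside a circle in the first
# two coordinates. Both classes have (nearly) the same mean, so a
# difference-of-means direction carries almost no signal about it.
radius = np.linalg.norm(x[:, :2], axis=1)
nonlinear_labels = radius > np.median(radius)

linear_dir = mean_difference_direction(x[linear_labels], x[~linear_labels])
nonlinear_dir = mean_difference_direction(x[nonlinear_labels], x[~nonlinear_labels])

linear_acc = projection_accuracy(x, linear_labels, linear_dir)
nonlinear_acc = projection_accuracy(x, nonlinear_labels, nonlinear_dir)

print(f"linear feature, 1-D projection accuracy:    {linear_acc:.2f}")
print(f"nonlinear feature, 1-D projection accuracy: {nonlinear_acc:.2f}")
```

In this toy setting the projection recovers the linearly encoded label well but stays close to chance on the radial label, even though that label is fully determined by the same vectors. This illustrates the general concern only; it says nothing about how much of the Assistant persona is actually represented nonlinearly.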

Source paper

extracted_from
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.