method
active
method:role-vector-extraction
Role Vector Extraction
Pipeline that averages post-MLP residual stream activations over model responses generated under persona-specific system prompts, producing role vectors
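As a rough illustration of the averaging step, the sketch below turns per-response activations into role vectors. It assumes the activations have already been captured at one layer as arrays of shape (num_tokens, d_model). The function names, the input layout, the toy role names, and the choice to average within each response before averaging across responses are all assumptions made for this example, not details confirmed by the entry.

```python
import numpy as np


def role_vector(response_activations):
    """Average activations over tokens in each response, then over responses.

    response_activations: list of arrays, each (num_tokens, d_model), holding
    post-MLP residual stream activations at one layer (hypothetical format).
    """
    per_response = [
        np.asarray(acts, dtype=float).mean(axis=0) for acts in response_activations
    ]
    return np.mean(per_response, axis=0)


def extract_role_vectors(activations_by_role):
    """Map each role name to its mean activation vector."""
    return {role: role_vector(acts) for role, acts in activations_by_role.items()}


# Toy stand-in for captured activations; real inputs would come from a model.
rng = np.random.default_rng(0)
d_model = 8
activations_by_role = {
    "pirate": [rng.normal(size=(5, d_model)), rng.normal(size=(7, d_model))],
    "librarian": [rng.normal(size=(4, d_model))],
}
role_vectors = extract_role_vectors(activations_by_role)
```

Averaging per response first keeps long responses from dominating a role's vector; pooling all tokens into one mean would be an equally plausible reading of "mean activations".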
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Residual Stream · uses · Proposed pathway flowing through layers at each position; calculates K/V values that feed horizontal information flow.
Methods (1)
method
- Standardized PCA run on role vectors to find main axes of persona variation
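A minimal sketch of what a standardized PCA over role vectors could look like is shown here. It assumes "standardized" means z-scoring each activation dimension across roles before the decomposition, and it uses a random toy matrix in place of real role vectors. The function name and return layout are illustrative choices, not taken from the entry.

```python
import numpy as np


def standardized_pca(role_matrix, n_components=2):
    """PCA on z-scored features.

    role_matrix: array of shape (num_roles, d_model), one role vector per row.
    Returns (components, explained_variance_ratio, scores).
    """
    X = np.asarray(role_matrix, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant dimensions unscaled to avoid dividing by zero
    Z = (X - mean) / std

    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    explained = (S**2) / np.sum(S**2)
    scores = Z @ components.T  # coordinates of each role along the axes
    return components, explained[:n_components], scores


# Toy stand-in: 20 roles with 6-dimensional vectors.
rng = np.random.default_rng(0)
role_matrix = rng.normal(size=(20, 6))
components, explained, scores = standardized_pca(role_matrix, n_components=2)
```

Each row of `components` is a candidate axis of persona variation, and `scores` places every role along those axes.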
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Method for obtaining concept vectors by subtracting activations from two contrasting prompts.
- Limitation question motivating future work on persona elicitation strategies
- The initial stage of uncertainty metabolization, pulling usable value from sensations.
- Method using activations from the prompt 'Tell me about {word}' minus mean over other random words to obtain concept vectors.
- Type of steering vector enabling zero-shot task execution, cited from Todd et al. 2024
- Steering vector extracted in Experiment 2 capturing latent representation of desired role behavior and honesty semantics
- Computes reflection direction as mean difference between MLP and attention output representations of first tokens in reflection vs. non-reflection steps
- Prior framework for monitoring and controlling character traits in LLMs via activation directions; this paper extends it to 275 roles