method
active
method:pm-hybrid-method
PM Hybrid Method
Hybrid method combining Personality Prompting (P2) with MDS injections; best overall steering method
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (1)
framework
- Established baseline for OCEAN steering via personality-descriptive system prompts; compared against injection methods throughout
Methods (1)
method
- MDS Injection · uses · Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation
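The mean-difference construction named in the MDS Injection entry can be illustrated with a small sketch. It is not the paper's implementation. The function names, the unit normalization, and the `alpha` scale are assumptions made for illustration, and the toy lists stand in for self-statement activations (h_s) from two contrasting sets. In the hybrid, a personality-descriptive system prompt would be supplied alongside the injection; that part is plain text and is not shown.

```python
from math import sqrt

def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def mean_difference_vector(pos_acts, neg_acts):
    """Mean of the first activation set minus mean of the second (hypothetical helper)."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]

def inject(hidden, direction, alpha=1.0):
    """Add the unit-normalized direction, scaled by alpha, to one hidden state.

    Normalization and alpha are assumptions; the source does not state the scaling.
    """
    norm = sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        return list(hidden)  # nothing to inject
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]

# Toy activations standing in for h_s from two contrasting statement sets.
high_trait = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
low_trait = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

v = mean_difference_vector(high_trait, low_trait)   # [2.0, 0.0, 0.0]
steered = inject([0.0, 0.0, 0.0], v, alpha=0.5)     # [0.5, 0.0, 0.0]
```

In a real model the addition would be applied to a chosen layer's activations during generation; which layer and what strength to use are not specified here.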
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Using language model log probabilities of answer choices (A)/(B) to produce preference labels.
- Computational algorithm mentioned as an example of diverse problem-solving strategies.
- Method for assessing consciousness in nonhuman animals by identifying behavioral/anatomical markers from humans and extrapolating; proposed adaptation for AI.
- Optimization technique that computes weight changes by following the gradient of an error function; contrasted with evolutionary stochastic search.
- Top-down interpretability approach studying linguistic properties at various residual stream stages; contrasted with the paper's bottom-up mechanistic approach
- Methods for bottom-up model space construction; contrasted with top-down BMR approach of this paper
- Alexander's proposed approach using high technology to provide processes (not components) that create sophisticated elements cheaply while fitting local circumstance.
- Methods that use latent reasoning; lack task generalization and are difficult to train with autoregressive parallelization.