claim
active
claim:repe-is-a-new-frontier-in-open-ended-psychological-steering-of-llms-outperforming-prompting-when-properly-calibrated
RepE is a new frontier in open-ended psychological steering of LLMs, outperforming prompting when properly calibrated
Central interpretive claim overturning prior reports; supported by 11-of-14 LLM wins for MDS over P2
Source paper
extracted_from · (2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44% · associated_with · supports · Primary quantitative result overturning prior reports that prompting outperforms representation engineering
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Central thesis statement of the paper's contribution
- The core interpretive question the paper narrows but cannot definitively answer
- Central empirical conclusion of the paper about the fundamental limits of truth directions.
- Prior finding suggesting affective-like states in LLMs; cited as convergent evidence for structured self-representation
- Central interpretive claim of the paper supported by multiple convergent analyses
- Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, which is opposite to what sycophancy predicts
- The paper's claim that theoretical convergence across GWT, RPT, HOT, IIT makes the findings non-coincidental
- Empirically grounded claim citing Perez et al. 2022, showing RLHF can backfire on the self-preservation dimension