thinker:leonardo-blas
Leonardo Blas
Authored papers (1)
Mean-difference-from-self (MDS) residual-stream injections outperform Personality Prompting (P²), the established baseline for OCEAN psychological steering, in open-ended generation across 11 of 14 tested LLMs—including Llama-3.1-8B-Instruct, Qwen3-8B, and gemma-3-12b-it—with steerability score (Φ) gains ranging from 3.61% to 16.44% on synthetic situational judgment tests scored by GPT-5.1. A hybrid method (PM) combining P² prompting with MDS injections extends this further, outperforming both constituents in 13 of 14 models with gains over P² of 5.56%–21.92% and over MDS alone of 3.30%–26.67%. These results directly overturn Banayeeanzade et al.'s prior finding that P² surpasses MD-based injection methods, with the gap traced to two methodological failures in prior work: sweeping injection strength in uncalibrated activation-space units and restricting search to a narrow coefficient range such as [0.4, 0.5, …, 1.5]. The psychological steering framework introduced here addresses both by defining centroid units—layer-wise calibrated scales anchored to the distance between construct and antithesis activation centroids—and operationalizing unbounded fluency-constrained sweeps using lightweight logistic-regression classifiers trained on Qwen3Embedding-0.6B embeddings rather than paid frontier APIs. OLS regression on 10-step α sweeps shows 89.23% of manipulated OCEAN trends achieve R² ≥ 0.85, confirming near-linear control consistent with the Linear Representation Hypothesis; however, the induced cross-trait covariance matches the Big Two metatrait model in only 46.15% of cases, implying that LLM representation geometry diverges meaningfully from the structure of human personality.
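The centroid-unit idea in the summary above can be illustrated with a minimal sketch. It assumes a plain mean-difference direction between construct and antithesis activations at one layer, which is not necessarily the paper's exact MDS procedure. The function names (`centroid_direction`, `inject`, `r_squared`), the synthetic activations, and the 16-dimensional hidden size are all hypothetical. Here α = 1 shifts a hidden state by exactly one centroid distance, and a 10-step α sweep is checked for linearity with an OLS fit. The quantity regressed is a simple projection, linear by construction, standing in for the trait scores measured in the paper.

```python
import numpy as np

def centroid_direction(pos_acts, neg_acts):
    """Return (unit direction, centroid distance) for one layer.

    pos_acts / neg_acts: arrays of shape (n_samples, hidden_dim) holding
    activations for the construct and its antithesis (synthetic here).
    """
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    dist = float(np.linalg.norm(diff))
    return diff / dist, dist

def inject(hidden, direction, dist, alpha):
    """Shift a hidden-state vector by alpha centroid units."""
    return hidden + alpha * dist * direction

def r_squared(x, y):
    """R^2 of an ordinary least-squares line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins for layer activations (assumption, not real model data).
rng = np.random.default_rng(0)
pos = rng.normal(loc=1.0, size=(64, 16))
neg = rng.normal(loc=-1.0, size=(64, 16))
direction, dist = centroid_direction(pos, neg)

# 10-step alpha sweep; the projection onto the direction is the toy "trend".
h = rng.normal(size=16)
alphas = np.linspace(-2.0, 2.0, 10)
proj = np.array([inject(h, direction, dist, a) @ direction for a in alphas])
fit_quality = r_squared(alphas, proj)
```

Expressing α in units of the centroid distance, rather than raw activation-space units, is what makes a sweep comparable across layers and models whose activation norms differ.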
Affiliations (1)
- University of Southern California (institute)
Co-authors (3)
- Emilio Ferrara (9 shared)
- Robin Jia (9 shared)
- Thomas Lord (3 shared)
Recent mentions (1)
- papers-typedblas-2026-psychological.md