claim

active

claim:ocean-mds-injection-covariance-patterns-departing-from-the-big-two-model-suggest-a-gap-between-learned-llm-representations-and-human-psychology

OCEAN MDS injection covariance patterns departing from the Big Two model suggest a gap between learned LLM representations and human psychology

Interpretive conclusion drawn from the Big Two mismatch finding; tentative, since only 46.15% of cases match the Big Two pattern

Source paper

extracted_from

Psychological Steering of Large Language Models

(2026) · Leonardo Blas · Robin Jia · Emilio Ferrara

Neighborhood — ranked by edge-count

Papers (1)

paper

Psychological Steering of Large Language Models
introduces

Findings (1)

finding

Only 46.15% of cases show covariance patterns consistent with the Big Two model; no LLM satisfies all Big Two correlations
supports
Suggests a gap between LLM learned representations and human personality structure as described by Big Two

Frameworks (1)

framework

Big Two Model
contradicts
Meta-trait model grouping OCEAN traits into stability (C, A, reversed N) and plasticity (E, O); used to evaluate covariance patterns from injections

Methods (1)

method

OCEAN Trait Covariance Matrix M
supports
5x5 Pearson correlation matrix of OCEAN traits computed from MDS injection sweeps to assess cross-trait leakage
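
As a rough illustration of the method and framework entries above, the sketch below builds a 5x5 Pearson correlation matrix from per-sample OCEAN scores and checks it against the Big Two groupings (stability: C, A, reversed N; plasticity: E, O). Everything here is an assumption made for illustration, not the paper's procedure: the function names, the synthetic `demo_scores` data standing in for real injection sweeps, and the sign-based matching rule (only within-meta-trait pairs are checked, and only the sign of each correlation).

```python
import numpy as np

# Column order for the score matrix (an assumed convention).
TRAITS = ["O", "C", "E", "A", "N"]

# Big Two groupings as described in the framework entry; -1 marks a reversed trait.
STABILITY = {"C": +1, "A": +1, "N": -1}
PLASTICITY = {"E": +1, "O": +1}


def trait_correlation_matrix(scores):
    """Return the 5x5 Pearson correlation matrix for an (n_samples, 5) score array."""
    return np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)


def expected_signs():
    """Map each within-meta-trait pair to the correlation sign the Big Two implies."""
    expected = {}
    for group in (STABILITY, PLASTICITY):
        names = sorted(group)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                expected[(a, b)] = group[a] * group[b]
    return expected


def big_two_match_rate(M):
    """Fraction of checked pairs whose observed correlation sign matches the expectation."""
    idx = {t: i for i, t in enumerate(TRAITS)}
    exp = expected_signs()
    hits = sum(1 for (a, b), s in exp.items() if np.sign(M[idx[a], idx[b]]) == s)
    return hits / len(exp)


def demo_scores(n=500, seed=0):
    """Synthetic scores driven by two latent factors (hypothetical stand-in for sweep data)."""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=n)  # latent stability
    p = rng.normal(size=n)  # latent plasticity
    noise = lambda: 0.5 * rng.normal(size=n)
    # Columns follow TRAITS order: O, C, E, A, N (N loads negatively on stability).
    return np.column_stack([p + noise(), s + noise(), p + noise(), s + noise(), -s + noise()])


if __name__ == "__main__":
    M = trait_correlation_matrix(demo_scores())
    print(np.round(M, 2))
    print("Big Two sign match rate:", big_two_match_rate(M))
```

Because the synthetic data is generated directly from two latent factors, it should satisfy every checked pair; real injection sweeps would supply the score matrix instead, and a stricter criterion (for example, also requiring weak cross-meta-trait correlations) would be a separate design choice.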

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

MDS injection steering efficiency peaks at mid-layers across LLMs, injection strides, and OCEAN traits · finding · 0.780
Consistent empirical pattern supporting the connection between mid-layer representations and emotion/behavioral content
MDS injections align with the Linear Representation Hypothesis: target trait varies near-linearly with alpha in open-ended generation · claim · 0.770
Theoretical alignment claim backed by OLS R² analysis showing 96.15% of trends have R² ≥ 0.75
Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data (Treutlein et al. 2024) · concept · 0.754
Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
LLM representations exhibit intriguing patterns under spatio-permutational analyses, suggesting a potentially profound yet tentative indication of consciousness. · claim · 0.753
Qualified positive claim from spatio-permutational analysis in which two cases satisfy all three criteria.
Do the findings about MDS injection effectiveness generalize to base (non-instruction-tuned) language models? · question · 0.750
Acknowledged limitation: only instruction-tuned models were studied
Li et al. 2024: larger LLMs outperform smaller ones at distinguishing self-related from non-self-related properties on self-awareness benchmarks · finding · 0.747
Prior finding showing scale-dependent self-awareness, consistent with the scale effect observed in the paper's Experiment 1
The better an LLM is at language modeling, the more it aligns with vision models, and vice versa: linear relationship between language modeling score and vision-language alignment · finding · 0.746
Core cross-modal empirical result: larger and better language models align better with vision models
As LLMs scale, they develop increasingly general abstractions, with large models linearly representing abstract concepts like truth that capture shared properties of diverse inputs · claim · 0.745
Interpretive claim connecting scale to abstraction level in LLM representations