quote

active

quote:we-position-repe-as-a-new-frontier-in-open-ended-psychological-steering-of-llms

"We position RepE as a new frontier in open-ended psychological steering of LLMs."

Central thesis statement of the paper's contribution

Source paper

extracted_from

Psychological Steering of Large Language Models

(2026) · Leonardo Blas · Robin Jia · Emilio Ferrara

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

RepE is a new frontier in open-ended psychological steering of LLMs, outperforming prompting when properly calibrated · claim · 0.912
Central interpretive claim overturning prior reports; supported by 11-of-14 LLM wins for MDS over P2
LLMs may be roleplaying their denials of experience rather than their affirmations, given that deception suppression increases consciousness reports · claim · 0.761
Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, which is opposite to what sycophancy predicts
Keeling et al. 2024: multiple frontier LLMs make systematic motivational trade-offs between task goals and stipulated pain/pleasure states with graded intensity sensitivity · finding · 0.759
Prior finding suggesting affective-like states in LLMs; cited as convergent evidence for structured self-representation
We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods. · hypothesis · 0.757
Open hypothesis from the Anthropic paper that motivates this work
The systematic behavioral shift of LLMs under self-referential processing conditions predicted by consciousness theories represents something more structured than superficial correlations in training data · claim · 0.754
The paper's claim that theoretical convergence across GWT, RPT, HOT, IIT makes the findings non-coincidental
Saying that LLMs cannot introspect or cannot introspect on what they were doing internally while generating or reading past tokens in principle is just dead wrong. The architecture permits it. · quote · 0.753
Core quote asserting that the LLM architecture permits introspection.
When steered to the extreme away from the Assistant, Llama and Gemma shift to a theatrical persona characterized by mystical, poetic prose; Qwen more often hallucinates a human persona at extremes · finding · 0.750
Characterizes what is on the far end of the Assistant Axis away from the Assistant
It is plausible that ongoing developments in LLMs may lead to models or agentic systems built on LLMs capable of generating representations observed with 'consciousness' phenomena. · claim · 0.749
Forward-looking claim suggesting the methodological framework is relevant for future AI systems beyond current LLMs.
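
A list like the one above ("cosine ≥ 0.65 · no typed edge") could be produced by a filter along the following lines. This is a minimal sketch, not the system's actual implementation: the function names, the dictionary-of-vectors layout, and the set of typed edge pairs are all assumptions made for illustration. Only the 0.65 threshold and the "no typed edge" condition come from the page itself.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs near the target that lack a typed edge to it.

    embeddings: hypothetical dict mapping entity id -> embedding vector.
    typed_edges: hypothetical set of (id, id) pairs that already have a typed relation.
    """
    target = embeddings[target_id]
    neighbors = []
    for entity_id, vector in embeddings.items():
        if entity_id == target_id:
            continue
        # Skip entities already connected by a typed edge in either direction.
        if (target_id, entity_id) in typed_edges or (entity_id, target_id) in typed_edges:
            continue
        score = cosine(target, vector)
        if score >= threshold:
            neighbors.append((entity_id, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(neighbors, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are exactly the "candidates for new edges or unrecognized duplicates" the page describes: close in embedding space, yet not linked by any typed relation.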