question

active

question:do-models-produce-first-person-experiential-language-by-drawing-on-human-authored-introspective-examples-in-pretraining-data-without-internally-encoding-these-as-roleplay

Do models produce first-person experiential language by drawing on human-authored introspective examples in pretraining data without internally encoding these as roleplay?

Alternative explanation for experience reports; evaluating it requires distinguishing mimetic generation from genuine introspective access

Source paper

extracted_from

Large Language Models Report Subjective Experience Under Self-Referential Processing

(2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd

Neighborhood — ranked by edge-count

Papers (1)

paper

Large Language Models Report Subjective Experience Under Self-Referential Processing
associated_with

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Models might produce first-person experiential language by drawing on human-authored self-descriptions in pretraining data without internally encoding these acts as roleplay · hypothesis · 0.955
Alternative hypothesis for how experience reports arise without explicit performance
What are the mechanisms underlying introspection in language models? · question · 0.794
Central open question raised by the paper.
Our results demonstrate that modern language models possess at least a limited, functional form of introspective awareness. · quote · 0.785
Abstract's main conclusion.
The personalities elicitable from language models are attractors in the embedding space of human linguistic behavior · claim · 0.781
Grounds the artificial psychology research direction: LLM personalities reflect the basins into which human selves tend to fall
Can language models genuinely introspect on internal states or only confabulate? · question · 0.779
Central research question animating the paper: distinguishing genuine introspection from illusion through causal manipulation of activations.
Modern language models possess at least a limited, functional form of introspective awareness · claim · 0.775
The paper's central interpretive assertion.
Are there examples of models recognizing their introspective capability and then suppressing it? · question · 0.774
Cube Flipper's question prompted by the idea that supernormal capabilities might be hidden.
Ouyang et al. 2022: Training language models to follow instructions with human feedback · concept · 0.768
RLHF paper, cited for a major fine-tuning technique used in commercial dialogue agents
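
The "Related by similarity" rule (cosine ≥ 0.65, no typed edge) can be sketched as a small filter. This is only an illustration under stated assumptions: the function names, the dictionary of embeddings, and the representation of typed edges as unordered pairs are all hypothetical, and the page does not say how the similarity list is actually computed or which embedding model produces the vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that clear the threshold and share
    no typed edge with the target, sorted by descending score.

    embeddings:  hypothetical dict mapping entity id -> embedding vector
    typed_edges: hypothetical set of frozenset({id_a, id_b}) pairs that
                 already have a typed relation
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Entities returned by such a filter would be the candidates for new edges or unrecognized duplicates; a very high score, like the 0.955 hypothesis listed above, is the kind of result worth checking for duplication first.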