claim
active
claim:llm-personality-self-reports-are-illusory-post-training-alignment-creates-stable-human-like-reports-dissociated-from-actual-behavior-han-et-al-2025

LLM personality self-reports are illusory: post-training alignment creates stable human-like reports dissociated from actual behavior (Han et al. 2025)

Skeptical prior work motivating the need to validate self-reports against internal states rather than taking them at face value

Source paper

extracted_from
Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Thinkers (1)

thinker
  • Pengrui Han
    introduces
    Showed LLM personality self-reports are illusory; key skeptical prior work motivating the validation approach

Claims (1)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
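The similarity listing can be read as a simple filter. It keeps entities whose embedding cosine with this one is at least 0.65 and that share no typed edge with it. Below is a minimal Python sketch that assumes precomputed embedding vectors and a set of typed edges treated as undirected. The function names, entity ids, and vectors are hypothetical illustrations, not the graph's actual implementation. Only the 0.65 threshold and the "no typed edge" condition come from the page.

```python
import math

# Threshold taken from the page ("cosine >= 0.65"); everything else is hypothetical.
SIMILARITY_THRESHOLD = 0.65


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_by_similarity(target, embeddings, typed_edges, threshold=SIMILARITY_THRESHOLD):
    """Return ids whose cosine with `target` is >= threshold and that have no
    typed edge to it, sorted by descending similarity."""
    target_vec = embeddings[target]
    results = []
    for entity, vec in embeddings.items():
        if entity == target:
            continue
        # Skip entities already linked by a typed edge in either direction.
        if (target, entity) in typed_edges or (entity, target) in typed_edges:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            results.append((entity, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return [entity for entity, _ in results]


# Hypothetical example: "thinker:x" is similar but already has a typed edge.
embeddings = {
    "claim:a": [1.0, 0.0],
    "claim:b": [0.9, 0.1],
    "claim:c": [0.0, 1.0],
    "thinker:x": [1.0, 0.05],
}
typed_edges = {("thinker:x", "claim:a")}
print(related_by_similarity("claim:a", embeddings, typed_edges))  # prints ['claim:b']
```

Excluding typed neighbors is what makes this list useful as a source of candidate edges or possible duplicates. Anything already connected by a typed relation appears in the sections above instead.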