claim
active
claim:post-training-strategies-can-strongly-influence-performance-on-introspective-tasks
Post-training strategies can strongly influence performance on introspective tasks
Assertion about the role of post-training in eliciting introspection.
Source paper
extracted_from · Lindsey, Jack (2026)
Neighborhood — ranked by edge-count
Communities (4)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
- Empirical investigation of how LMs access and report internal states across layers, using concept injection and thought detection on Claude models.
- LLM functional introspective awareness · members_of · Empirical probing of language models' ability to detect and report their own internal concept representations
- How instruction tuning and RLHF elicit latent introspective capabilities in language models beyond base pretraining.
Concepts (1)
concept
- The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
Claims (1)
claim
- Finding that base models have high false positives and no net positive performance.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Different post-training strategies substantially influence introspection task performance; 'helpful-only' variants show higher false positives but some achieve strong net performance.
- Base pretrained models show high false positive rates and achieve no net task performance on concept injection detection; post-training is essential for introspection.
- Central interpretive claim and motivation for future work
- Introspective signals appear in middle layers but are suppressed by later post-training-shaped layers. · finding · 0.769 · Mechanistic finding by Lindsey (2026) explaining how a contemplative prompt may work: it enables mid-layer introspection to reach the output.
- Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge
- Load-bearing summary of the paper's core finding about persona stability
- Ethical implication about the nature of AI training experience if the thesis holds
- Are there examples of models recognizing their introspective capability and then suppressing it? · question · 0.757 · Cube Flipper's question prompted by the idea that supernormal capabilities might be hidden.
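
The "related by similarity" rule for this list (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration, not the graph tool's actual implementation: the function names, the entity ids, the toy 2-d vectors, and the data layout are all hypothetical. Only the 0.65 threshold and the no-typed-edge condition come from the page.

```python
import math

THRESHOLD = 0.65  # from the page: "cosine >= 0.65"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=THRESHOLD):
    """Return (entity_id, score) pairs that are similar to the target but
    share no typed edge with it, highest score first."""
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip pairs that already have a typed relation (hypothetical layout:
        # each edge is stored as an unordered pair of entity ids).
        if frozenset((target_id, other_id)) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy data, invented for illustration only.
embeddings = {
    "claim_a": [1.0, 0.0],
    "claim_b": [1.0, 1.0],  # similar enough to claim_a
    "claim_c": [0.0, 1.0],  # orthogonal to claim_a
    "claim_d": [1.0, 0.1],  # very similar, but already linked by a typed edge
}
typed_edges = {frozenset(("claim_a", "claim_d"))}

candidates = related_by_similarity("claim_a", embeddings, typed_edges)
```

In this toy setup only `claim_b` survives: `claim_c` falls below the threshold and `claim_d` is excluded because it already has a typed relation, which is the sense in which the listed entities are "candidates for new edges or unrecognized duplicates".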