claim
active
claim:introspective-capacity-is-present-from-the-first-conversation-turn-not-requiring-multi-turn-context-to-emerge
Introspective capacity is present from the first conversation turn, not requiring multi-turn context to emerge
Three of four concepts show significant introspection at turn 1; rules out joint temporal drift as sole explanation
Source paper
extracted_from · (2026) · Nicolas Martorell · Bianchi, Bruno
Neighborhood — ranked by edge-count
Findings (2)
finding
- LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B ρ=0.52) · contradicts · supports · Impulsivity shows significant introspection in 1B but declines in 8B; non-monotonic scaling
- Demonstrates introspection is present from the first conversation turn without needing multi-turn context
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A caveat qualifying the main claim.
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
- Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.814 · Forward-looking statement about future models.
- Introspective capacity may follow a simple monotonic scaling law across all concepts and architectures · hypothesis · 0.813 · The paper treats this as possible but unconfirmed; current evidence shows concept-specific scaling only
- Conceptual distinction motivated by entropy analyses showing probe and report entropy can diverge under steering
- Key quantitative characterization of the layer-dependence of partial introspection
- Why does introspective capacity vary concept-by-concept and what mechanisms could stabilize it over time? · question · 0.797 · Open question identified by the paper as direction for future work
- Cross-concept steering results; only 2 of 12 non-diagonal cells show significant introspection improvement