claim
active
claim:introspective-ability-is-concept-specific-quality-differs-across-emotive-concepts-and-the-same-intervention-helps-some-concepts-but-not-others
Introspective ability is concept-specific: quality differs across emotive concepts and the same intervention helps some concepts but not others
Cross-concept steering results; only 2 of 12 non-diagonal cells show significant introspection improvement
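The "2 of 12" count follows from the design. A 4×4 cross-concept steering matrix has 12 off-diagonal cells, and the findings on this page report significance after Benjamini-Hochberg (BH) correction. The following is a minimal Python sketch of that bookkeeping, under two assumptions: the correction is applied across exactly those 12 cells, and the four axes are the concepts named on this page (wellbeing, interest, focus, impulsivity). The function names are hypothetical, and any real p-values would have to come from the paper's own tests.

```python
from typing import Dict, List, Tuple

# A cell is (steered concept, probed concept); hypothetical representation.
Cell = Tuple[str, str]


def off_diagonal_cells(concepts: List[str]) -> List[Cell]:
    """All cross-concept pairs (steer != probe); 4 concepts give 12 cells."""
    return [(s, p) for s in concepts for p in concepts if s != p]


def benjamini_hochberg(
    pvals: Dict[Cell, float], alpha: float = 0.05
) -> Dict[Cell, Tuple[float, bool]]:
    """Return (BH-adjusted q-value, rejected?) for each cell."""
    m = len(pvals)
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])  # ascending p
    qvals = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Step-up form: walk from the largest p down, keeping q monotone.
    for i in range(m - 1, -1, -1):
        _, p = ranked[i]
        q = min(running_min, p * m / (i + 1))
        running_min = q
        qvals[i] = q
    return {cell: (q, q <= alpha) for (cell, _), q in zip(ranked, qvals)}
```

To use it, pass `benjamini_hochberg` a dict that maps each cell to its raw p-value. It returns an adjusted q-value and a reject flag per cell. Taking the running minimum from the largest p-value downward is the standard way to keep BH q-values monotone in rank.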
Source paper
extracted_from (2026) · Nicolas Martorell · Bruno Bianchi
Neighborhood — ranked by edge-count
Findings (3)
finding
- Strongest cross-concept introspection improvement; survives BH correction (q≈0.011)
- Impulsivity introspective fidelity decreases from turn 1 to turn 10: ΔR² = -0.28 in LLaMA-3.2-3B · supports · Opposite temporal trend to wellbeing/interest/focus; introspective fidelity weakens over the conversation for impulsivity
- Introspective fidelity erodes in Qwen as conversations progress; contrasts with the LLaMA-3B trend
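The impulsivity finding reports a change in R² between turn 1 and turn 10. This page does not state how the paper defines introspective fidelity. The sketch below is a hedged illustration only. It assumes fidelity is scored as the R² of an ordinary least-squares fit between a probe readout and the model's self-reported rating at a given turn. Both `r_squared` and `delta_r_squared` are hypothetical helpers.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of an OLS fit y ~ a + b*x (equal to the squared Pearson r)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variance: treat as no explained variance
    return sxy * sxy / (sxx * syy)


def delta_r_squared(
    probe_t1: Sequence[float],
    report_t1: Sequence[float],
    probe_t10: Sequence[float],
    report_t10: Sequence[float],
) -> float:
    """Fidelity at turn 10 minus fidelity at turn 1; negative means erosion."""
    return r_squared(probe_t10, report_t10) - r_squared(probe_t1, report_t1)
```

Under this assumed definition, a negative value such as the reported -0.28 means self-reports track the probe less well late in the conversation than at the start.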
Related by similarity (8)
cosine ≥ 0.65 · no typed edge · Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Secondary research question addressed through cross-concept steering experiments
- Why does introspective capacity vary concept by concept, and what mechanisms could stabilize it over time? · question · 0.826 · Open question identified by the paper as a direction for future work
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
- Conceptual distinction motivated by entropy analyses showing probe and report entropy can diverge under steering
- Most of the 4×4 cross-concept steering matrix shows no significant effect; only two conditions survive
- Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.801 · Forward-looking statement about future models.
- A caveat qualifying the main claim.
- There may exist a global introspective faculty or steering direction that improves introspection uniformly across all concepts · hypothesis · 0.799 · Framed as an open problem; current evidence points only to local, pair-specific improvement
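The "Related by similarity" list applies the two filters stated in its header: cosine ≥ 0.65, and no typed edge to this claim. The Python sketch below shows one way that selection could work. It assumes each entity has an embedding vector and that typed edges are stored as (source, target) pairs. The embedding model, the storage layout, and the function names are assumptions, not the site's actual implementation. Results are sorted by descending score, which matches the order of the scores listed above (0.826, 0.801, 0.799).

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similar_untyped_neighbors(
    target: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities at or above the cosine threshold with no typed edge to target."""
    tvec = embeddings[target]
    scored: List[Tuple[str, float]] = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already linked in either direction.
        if name == target or (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        s = cosine(tvec, vec)
        if s >= threshold:
            scored.append((name, s))
    scored.sort(key=lambda kv: kv[1], reverse=True)  # highest similarity first
    return scored
```

Excluding typed neighbors in both directions is what makes the remaining hits useful as candidates for new edges or for duplicate detection.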