hypothesis
active
There may exist a global introspective faculty or steering direction that improves introspection uniformly across all concepts
Framed as an open problem; current evidence points only to local, pair-specific improvement
Source paper
extracted_from · (2026) · Nicolas Martorell · Bruno Bianchi
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Cross-concept steering: focus→wellbeing R² increases from 0.30 (α=-4) to 0.76 (α=+4), ∆R²=0.30, p<0.001 in LLaMA-3.2-3B · associated_with · Strongest cross-concept introspection improvement; survives BH correction (q≈0.011)
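The BH correction cited in this finding can be illustrated with a minimal sketch of the Benjamini-Hochberg procedure. The p-values below are made-up placeholders, not results from the source paper. The function name `benjamini_hochberg` and the 0.05 threshold are assumptions for illustration, and 12 tests are used only because the related entries describe 12 non-diagonal cells.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (rejected flags, BH-adjusted q-values) in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum of
    # p * m / rank over its own rank and all larger ranks (keeps q monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    rejected = [qi <= alpha for qi in q]
    return rejected, q


# Placeholder p-values for 12 hypothetical cross-concept cells (NOT real data).
p_placeholder = [0.001, 0.006, 0.02, 0.05, 0.11, 0.2,
                 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
rejected, q = benjamini_hochberg(p_placeholder)
print(sum(rejected), "of", len(p_placeholder), "placeholder tests survive")
```

Adjusting p-values into q-values, rather than only returning pass/fail flags, makes it possible to report a value such as q≈0.011 next to each surviving cell.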
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Secondary research question addressed through cross-concept steering experiments
- Most of 4×4 cross-concept steering matrix shows no significant effect; two conditions survive
- Cross-concept steering results; only 2 of 12 non-diagonal cells show significant introspection improvement
- Interpretation of the observation that the most capable models performed best.
- Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
- Interpretive claim about the mechanistic substrate of introspection in LLMs
- Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.777 · Forward-looking statement about future models.
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success