claim
active
claim:basal-introspective-performance-is-not-always-maximal-and-some-failure-cases-are-solvable-by-representational-intervention-rather-than-reflecting-complete-absence-of-introspective-capacity
Basal introspective performance is not always maximal and some failure cases are solvable by representational intervention rather than reflecting complete absence of introspective capacity
Supported by the cross-concept steering finding that focus→wellbeing steering dramatically improves introspection.
Source paper
extracted_from · (2026) · Nicolas Martorell · Bruno Bianchi
Neighborhood — ranked by edge-count
Findings (2)
finding
- Strongest cross-concept introspection improvement; survives BH correction (q≈0.011)
- Second significant cross-concept introspection improvement; marginal after BH correction (q≈0.066)
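Both findings report q-values after Benjamini-Hochberg (BH) correction. As a rough illustration of how such q-values are obtained, here is a minimal sketch of the standard BH step-up adjustment. The function name `benjamini_hochberg` and the p-values in `example_p` are hypothetical and are not taken from the source paper; the paper's own test family and raw p-values are not given here.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as p_values."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each
    # q-value is the minimum of p * m / rank over all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for illustration only (not from the paper).
example_p = [0.001, 0.012, 0.03, 0.2, 0.5]
example_q = benjamini_hochberg(example_p)
```

Under this procedure, a result "survives" at a chosen false discovery rate (commonly 0.05) when its q-value falls below that rate, which would be consistent with q≈0.011 being reported as significant and q≈0.066 as marginal.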
Questions (1)
question
- Secondary research question addressed through cross-concept steering experiments
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A caveat qualifying the main claim.
- Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge
- We hypothesize that partial introspection may fail under adversarial prompts, distribution shift, and multiple simultaneous injections · hypothesis · 0.774 · Stress-test prediction about robustness limits of the partial introspection finding
- Key quantitative characterization of the layer-dependence of partial introspection
- Central empirical claim of the paper supported by statistical tests
- Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
- Cube Flipper's prediction about convergence of insight practice on field model.
- Opus 4.1 is most effective at recognizing injected abstract concepts (e.g., justice, peace) but detects other categories too.
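
The selection rule for this list (cosine ≥ 0.65 and no typed edge) can be sketched as follows. This is only an assumption about how such a filter might work: the function names, the toy two-dimensional embeddings, and the edge representation are all hypothetical and do not describe the actual system behind this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """List (entity, score) pairs similar to target but lacking a typed edge.

    embeddings: dict mapping entity id -> vector (hypothetical layout).
    typed_edges: set of frozenset({a, b}) pairs, treated as undirected.
    """
    results = []
    for other, vec in embeddings.items():
        if other == target:
            continue
        if frozenset((target, other)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((other, score))
    # Highest similarity first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data for illustration only.
toy_embeddings = {
    "claim": [1.0, 0.0],
    "a": [1.0, 0.1],
    "b": [0.0, 1.0],
    "c": [0.9, 0.2],
}
toy_edges = {frozenset(("claim", "c"))}
toy_neighbors = similar_untyped("claim", toy_embeddings, toy_edges)
```

In the toy data, `c` is skipped because it already has a typed edge and `b` is skipped because it is orthogonal to the target, so only `a` is returned as a candidate for a new edge or a duplicate check.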