concept
active
concept:internal-uncertainty
Internal uncertainty
The model's internal representation of uncertainty, hypothesized to trigger self-reflection.
Neighborhood (ranked by edge count)
Hypotheses (1)
- Core hypothesis linking internal uncertainty to self-reflection behavior; tested via probing experiments.
Concepts (1)
- Reflection direction (associated_with): A direction in the model's representation space that governs self-reflection behavior, computed as the mean difference between reflection and non-reflection embeddings.
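The "mean difference" construction of the reflection direction can be sketched in a few lines. This is a minimal illustration, assuming each embedding is a fixed-length list of floats taken from the same layer and token position; the page does not say which layer or position is used, and the function names and toy vectors are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty set of equal-length embeddings."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def reflection_direction(
    reflection: Sequence[Sequence[float]],
    non_reflection: Sequence[Sequence[float]],
) -> Vector:
    """Hypothetical helper: mean(reflection) - mean(non_reflection).

    Both inputs are assumed to be embeddings from the same layer and
    token position; the result is left unnormalized.
    """
    mu_r = mean_vector(reflection)
    mu_n = mean_vector(non_reflection)
    if len(mu_r) != len(mu_n):
        raise ValueError("reflection and non-reflection dimensions differ")
    return [a - b for a, b in zip(mu_r, mu_n)]


if __name__ == "__main__":
    # Toy 2-D embeddings, purely illustrative.
    refl = [[1.0, 2.0], [3.0, 4.0]]
    non_refl = [[0.0, 1.0], [2.0, 1.0]]
    print(reflection_direction(refl, non_refl))  # [1.0, 2.0]
```

With real activations the same arithmetic applies per hidden dimension; whether the direction is then normalized or scaled before use is not specified here.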
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The possibility of a stably encoded, causally active emotional state within LLMs, as distinct from token-by-token semantic content.
- The latent representational state of a model's answer confidence as decoded from activations, distinct from what appears in generated text.
- The view that epistemic justification is fully determined by factors internal to the subject's mind, often linked to consciousness.
- Uncertainty about which moral theory is correct, used to argue for hedged policies regarding super-beneficiary creation.
- The inferred mechanism underlying ESR, whereby the model tracks the coherence of its own outputs.
- Feature representing dilemmas and inner conflict; used to correct deceptive behavior.