Internal uncertainty

The model's internal representation of uncertainty, hypothesized to trigger self-reflection

Neighborhood — ranked by edge-count

hypothesis

Reasoning LLMs trigger reflection when their internal uncertainty is high
associated_with
Core hypothesis linking internal uncertainty to self-reflection behavior, tested via probing experiments

concept

Reflection direction
associated_with
A direction in the model's representation space that governs self-reflection behavior, computed as the mean difference between reflection and non-reflection embeddings
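
The mean-difference computation described for the reflection direction can be sketched as follows. This is a minimal illustration under stated assumptions, not the original experiment's code: the function names (`mean_vector`, `reflection_direction`, `project`) and the toy 3-dimensional embeddings are hypothetical, and real embeddings would come from a model's hidden states.

```python
from math import sqrt
from statistics import fmean

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*rows)]

def reflection_direction(reflect_embs, non_reflect_embs):
    """Mean reflection embedding minus mean non-reflection embedding."""
    mu_reflect = mean_vector(reflect_embs)
    mu_other = mean_vector(non_reflect_embs)
    return [a - b for a, b in zip(mu_reflect, mu_other)]

def project(embedding, direction):
    """Scalar projection of an embedding onto the unit-normalized direction.

    Assumes the direction is non-zero (the two class means differ).
    """
    norm = sqrt(sum(d * d for d in direction))
    return sum(e * d for e, d in zip(embedding, direction)) / norm

# Toy embeddings (hypothetical values, for illustration only).
reflect = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
non_reflect = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

direction = reflection_direction(reflect, non_reflect)
score = project([1.0, 5.0, 5.0], direction)
```

A larger projection means the embedding lies further along the reflection direction; whether that tracks internal uncertainty is the hypothesis under test, not something this sketch establishes.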

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

internal emotional state · concept · 0.775
The possibility of a stably encoded, causally active emotional state within LLMs, as distinct from token-by-token semantic content
Model Internal Belief · concept · 0.761
The latent representational state of a model's answer confidence as decoded from activations, distinct from what appears in generated text
Epistemic Internalism · framework · 0.760
The view that epistemic justification is fully determined by factors internal to the subject's mind, often linked to consciousness.
How Does Consciousness Relate To Uncertainty About Internal · question · 0.756
Moral Uncertainty · concept · 0.756
Uncertainty about which moral theory is correct, used to argue for hedged policies regarding super-beneficiary creation
Internal Consistency Monitoring · concept · 0.756
The inferred mechanism underlying ESR whereby the model tracks coherence of its own outputs
Internal conflict in AI · concept · 0.752
Feature representing dilemmas, inner conflict; used to correct deceptive behavior.
Consciousness As Uncertainty · concept · 0.750
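
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be sketched as below. This is an assumed reconstruction for illustration: the function `untyped_neighbors`, the entity names, and the 2-dimensional vectors are hypothetical, and the actual embedding model and graph store are not specified in the source.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def untyped_neighbors(target_vec, entities, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no
    typed edge to the target, sorted by descending similarity."""
    hits = []
    for name, vec in entities.items():
        if name in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(target_vec, vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical names and vectors, for illustration only).
target = [1.0, 0.0]
entities = {
    "already linked": [1.0, 0.0],  # similar, but has a typed edge
    "orthogonal": [0.0, 1.0],      # cosine 0.0
    "below cutoff": [3.0, 4.0],    # cosine 0.6
    "candidate": [4.0, 3.0],       # cosine 0.8
}
typed = {"already linked"}

neighbors = untyped_neighbors(target, entities, typed)
```

With this toy data only "candidate" survives both filters; entries returned this way are the candidates for new edges or duplicate checks.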