Model Internal Belief

The latent representational state of a model's answer confidence, as decoded from its activations and distinct from what appears in its generated text.

Neighborhood (ranked by edge count)

concept

Performative chain-of-thought
associated_with
Central concept: verbalized reasoning that occurs after the model has already internally settled on an answer, particularly on easier tasks.
Activation Probing
about
Technique for reading out model beliefs from internal activations before the final answer token is generated.
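
Activation probing is often done with a linear probe: a logistic-regression classifier trained on hidden activations taken at one layer and token position. The sketch below is a minimal pure-Python illustration, not the method of any work referenced on this page. It assumes activations have already been extracted as plain float vectors. The helper names `train_probe`, `probe_confidence` and `settle_step`, the 0.9 threshold, and the synthetic 2-d data are all hypothetical. The last helper runs the probe over per-step activations along a chain of thought and returns the first step where the decoded confidence crosses the threshold. Under the performative chain-of-thought definition above, reasoning emitted after that step would count as performative.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(xs, ys, lr=0.1, epochs=200):
    """Fit a logistic-regression probe (w, b) by batch gradient descent.

    xs: activation vectors (hypothetical: pre-extracted at a single layer).
    ys: 0/1 labels, e.g. which answer the model eventually gives.
    """
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_confidence(probe, x):
    # Probability of label 1, decoded from the activation alone (no text).
    w, b = probe
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def settle_step(probe, step_activations, threshold=0.9):
    # First step whose decoded confidence, toward either label, reaches the
    # (hypothetical) threshold; None if the probe never becomes confident.
    for t, x in enumerate(step_activations):
        p = probe_confidence(probe, x)
        if max(p, 1.0 - p) >= threshold:
            return t
    return None


# Synthetic demo: label-1 activations cluster near x0 = +1, label-0 near x0 = -1;
# the second coordinate is pure noise.
random.seed(0)
xs, ys = [], []
for _ in range(100):
    y = random.randint(0, 1)
    center = 1.0 if y else -1.0
    xs.append([center + random.gauss(0, 0.3), random.gauss(0, 1.0)])
    ys.append(y)
probe = train_probe(xs, ys)

# Hypothetical per-step activations drifting toward the label-1 cluster.
trajectory = [[0.0, 0.0], [0.2, 0.1], [0.6, -0.2], [1.0, 0.0], [1.5, 0.0]]
t = settle_step(probe, trajectory)
print("settled at step", t, "of", len(trajectory))
```

Steps after the returned index are where, on this reading, the visible reasoning no longer changes the internally decoded answer. A real study would probe activations from an actual model and validate the probe on held-out examples before drawing that conclusion.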

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Epistemic Internalism · framework · 0.795
The view that epistemic justification is fully determined by factors internal to the subject's mind, often linked to consciousness.
Internal model representations · concept · 0.780
The latent activations or embeddings inside a neural network.
"a model becomes strongly confident in its final answer, but continues generating tokens without revealing its internal belief" · quote · 0.773
Core definitional quote for performative chain-of-thought.
Internal uncertainty · concept · 0.761
The model's internal representation of uncertainty, hypothesized to trigger self-reflection.
Zhu et al. 2024 - Language models represent beliefs of self and others · concept · 0.756
Key prior finding that LLMs can internally represent beliefs of self and others, motivating the SOO approach.
Prior Beliefs · concept · 0.748
Beliefs about states before data are observed; used to transcribe task instructions into the agent's generative model.
Internal Consistency Monitoring · concept · 0.742
The inferred mechanism underlying ESR, whereby the model tracks the coherence of its own outputs.
internal emotional state · concept · 0.742
The possibility of a stably encoded, causally active emotional state within LLMs, as distinct from token-by-token semantic content.
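
The "cosine ≥ 0.65 · no typed edge" list above can be read as a simple filter. Each other entity is scored by cosine similarity between embedding vectors. Entities at or above 0.65 are kept, any entity already linked by a typed edge is dropped, and the rest are sorted by descending score. The sketch below illustrates that reading under stated assumptions: entities carry embedding vectors, and typed edges are available as a set of neighbor names. The function `untyped_neighbors`, the dict-based layout, and the toy 3-d vectors are hypothetical and not taken from the system that produced this page.

```python
import math


def cosine(u, v):
    # Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs sorted by descending cosine similarity.

    target: name of the focal entity.
    embeddings: dict of entity name -> embedding vector (hypothetical layout).
    typed_edges: names already linked to the target by a typed relation.
    """
    anchor = embeddings[target]
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and anything with a typed edge
        score = cosine(anchor, vec)
        if score >= threshold:
            hits.append((name, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


# Toy vectors for illustration only; real embeddings come from an embedding model.
embeddings = {
    "Model Internal Belief": [1.0, 0.2, 0.0],
    "Activation Probing": [0.9, 0.3, 0.1],
    "Internal uncertainty": [0.8, 0.5, 0.2],
    "Prior Beliefs": [0.6, 0.6, 0.5],
    "Unrelated topic": [0.0, 0.1, 1.0],
}
typed = {"Activation Probing"}
for name, score in untyped_neighbors("Model Internal Belief", embeddings, typed):
    print(f"{name} · {score:.3f}")
```

Excluding typed neighbors is what makes the surviving entries useful as candidates for new edges or as possible unrecognized duplicates. The 0.65 cutoff trades recall of such candidates against noise.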