concept
active
concept:model-internal-belief

Model Internal Belief

The latent representational state encoding a model's confidence in its answer, as decoded from its activations. It is distinct from what appears in the generated text.

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • Central concept: verbalized reasoning that occurs after the model has already internally settled on an answer, particularly on easier tasks.
  • A technique for reading out a model's beliefs from its internal activations before the final answer token is generated.
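
One common way to read a belief out of activations is a linear probe. The following is a minimal sketch rather than the method behind this entry: it assumes activations are available as fixed-length vectors with a binary label (for example, whether the model ends up answering "yes"), and it uses synthetic data in place of real model activations. The names train_probe and probe_confidence are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(acts, labels, lr=0.1, steps=200):
    """Fit a logistic-regression probe with batch gradient descent.

    acts:   list of activation vectors (lists of floats)
    labels: list of 0/1 targets, one per activation vector
    """
    n, d = len(acts), len(acts[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(acts, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_confidence(x, w, b):
    # Decoded probability that the internal state encodes label 1.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Synthetic stand-in for activations (assumption): Gaussian noise plus a
# label-dependent shift along a fixed direction.
random.seed(0)
d = 8
direction = [1.0] * d
labels = [random.randint(0, 1) for _ in range(200)]
acts = [[random.gauss(0, 1) + (2 * y - 1) * c for c in direction] for y in labels]

w, b = train_probe(acts, labels)
confidences = [probe_confidence(x, w, b) for x in acts]
accuracy = sum((c > 0.5) == bool(y) for c, y in zip(confidences, labels)) / len(labels)
```

Applied to activations taken at positions before the final answer token, the probe's output can be compared with the confidence the model states in its text. A gap between the two is what separates internal belief from verbalized reasoning.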

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
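
The filter described above can be sketched as follows. This is an illustration under assumptions, not the graph's actual implementation: it assumes each entity has an embedding vector and that typed edges are stored as a mapping from an entity id to the ids it is connected to. Apart from concept:model-internal-belief, every id and vector here is hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (id, similarity) pairs at or above the threshold that have
    no typed edge to the target, most similar first."""
    connected = typed_edges.get(target, set())
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in connected:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data (assumption): three neighbours with 3-d embeddings.
embeddings = {
    "concept:model-internal-belief": [1.0, 0.0, 0.0],
    "concept:a": [0.9, 0.1, 0.0],  # very similar, but already linked
    "concept:b": [0.0, 1.0, 0.0],  # orthogonal, below the threshold
    "concept:c": [1.0, 1.0, 0.0],  # cosine about 0.707, no typed edge
}
typed_edges = {"concept:model-internal-belief": {"concept:a"}}

candidates = similar_without_edge("concept:model-internal-belief", embeddings, typed_edges)
candidate_ids = [name for name, _ in candidates]
```

Only concept:c survives the filter here. It clears the 0.65 threshold and has no typed edge, so it would be reviewed either as a candidate for a new edge or as a possible duplicate.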