Black-box internal state monitoring

Monitoring approach that does not require access to model internals, so it applies to proprietary systems and scales naturally with model size.

Neighborhood — ranked by edge-count

Methods (1)

method

Logit-based self-report
implements
Primary self-report measure: the probability-weighted expected value over all ten digit-token logits, yielding a continuous rating that preserves the full distributional signal.
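A minimal sketch of the logit-based self-report, assuming the ten digit tokens "0" through "9" map to ratings 0 to 9 and that their logits are renormalized with a softmax over those ten tokens only. The function name `logit_self_report` and both assumptions are illustrative, not taken from the source method.

```python
import math


def logit_self_report(digit_logits):
    """Continuous rating from the logits of the ten digit tokens.

    Hypothetical helper. Assumes digit_logits[d] is the logit of token str(d)
    for d = 0..9, and renormalizes over these ten tokens only.
    Returns sum_d d * p(d), a value in [0, 9].
    """
    if len(digit_logits) != 10:
        raise ValueError("expected exactly ten digit-token logits")
    m = max(digit_logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - m) for z in digit_logits]
    total = sum(weights)
    probs = [w / total for w in weights]  # softmax over the ten digits
    # Expected value: every digit contributes in proportion to its probability,
    # rather than keeping only the argmax digit.
    return sum(d * p for d, p in enumerate(probs))
```

With all ten logits equal, the rating is approximately 4.5 (the midpoint of the scale), whereas an argmax readout would return an arbitrary single digit. This is the sense in which the expected value keeps the distributional signal.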

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Numeric self-report is a viable, complementary black-box tool for monitoring LLM internal emotive states alongside white-box probe methods · claim · 0.740
Central practical conclusion; both methods partially track the same latent state but with different failure modes
Internal Consistency Monitoring · concept · 0.729
The inferred mechanism underlying ESR whereby the model tracks coherence of its own outputs
internal states of agent · concept · 0.717
States that encode the agent's perceptual model and expectations; they emerge naturally from free-energy optimization.
internal emotional state · concept · 0.713
The possibility of a stably encoded, causally active emotional state within LLMs, as distinct from token-by-token semantic content
Intentional Control of Internal States · finding · 0.692
Models can modulate their internal representations when instructed or incentivized to 'think about' a concept; effect replicates across all tested models regardless of capability.
internalized visual operation · concept · 0.686
The visual operation embedded inside a functional token, requiring no visual supervision.
Observation of a person's inner state of wholeness can give reliable and objective information about the objective living character of systems in the external world · claim · 0.679
Central claim of the chapter: what appears subjective (inner feeling) is actually an objective measuring instrument for external reality
Internality Criterion · concept · 0.677
Criterion requiring that causal influence of internal state on description be internal, not routed through sampled outputs; rules out pseudo-introspection via self-observation.
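The "Related by similarity" filter (cosine ≥ 0.65, no typed edge) can be sketched as follows. This assumes each entity has a precomputed embedding vector and that typed edges are treated as undirected pairs; the function names, the data layout, and the undirected-edge choice are hypothetical, not the graph tool's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Entities similar to `target` that share no typed edge with it.

    Hypothetical helper. embeddings: dict name -> vector.
    typed_edges: set of frozenset({a, b}) pairs (assumed undirected).
    Returns (name, score) pairs with score >= threshold, highest first.
    """
    out = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        if frozenset((target, name)) in typed_edges:
            continue  # already related by a typed edge; not a candidate
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            out.append((name, score))
    out.sort(key=lambda pair: pair[1], reverse=True)
    return out
```

Each returned pair is either a candidate for a new typed edge or a possible unrecognized duplicate; the threshold only shortlists entities and does not decide which case applies.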