Greedy-decoded self-report

Baseline self-report method that selects the highest-probability token; shown to collapse to a few uninformative values
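Greedy decoding reduces to an argmax over the candidate rating tokens. The sketch below assumes two things: the rating is read from a single digit token, and the model's logits for the ten digits 0 to 9 are available as a dict. Both the input format and the name `greedy_rating` are hypothetical. The sketch shows why two very different distributions can yield the same reported value.

```python
from typing import Dict

def greedy_rating(digit_logits: Dict[int, float]) -> int:
    """Greedy decoding: return the digit rating whose token has the highest logit."""
    # Only the single most likely token survives; the rest of the distribution is discarded.
    return max(digit_logits, key=digit_logits.get)

# Hypothetical logits over ratings 0-9: one sharply peaked, one nearly flat.
confident = {d: (5.0 if d == 7 else 0.0) for d in range(10)}
uncertain = {d: (1.2 if d == 7 else 1.0 if d in (5, 6, 8, 9) else 0.0) for d in range(10)}

print(greedy_rating(confident), greedy_rating(uncertain))  # 7 7
```

Both calls return 7, so the difference in the model's confidence is not visible in the greedy output.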

Neighborhood — ranked by edge-count

method

Logit-based self-report
contradicts
Primary self-report measure: probability-weighted expected value over all ten digit-token logits, yielding a continuous rating that preserves full distributional signal
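The probability-weighted expected value can be sketched as follows. This is a minimal illustration under stated assumptions: the ten digit tokens map to ratings 0 to 9, and the softmax is renormalized over those ten logits only. The function name `expected_rating` and the list-of-logits input are hypothetical, and this is not presented as the source's exact implementation.

```python
import math
from typing import Sequence

def expected_rating(digit_logits: Sequence[float]) -> float:
    """Probability-weighted expected rating over the digit tokens '0'..'9'.

    digit_logits[d] is the (hypothetical) logit of the token for digit d.
    """
    if len(digit_logits) != 10:
        raise ValueError("expected one logit per digit token 0-9")
    # Numerically stable softmax restricted to the ten digit tokens.
    m = max(digit_logits)
    weights = [math.exp(z - m) for z in digit_logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    # E[d] = sum_d d * p(d): a continuous rating in [0, 9].
    return sum(d * p for d, p in enumerate(probs))

uniform = [0.0] * 10                                    # about 4.5
peaked = [50.0 if d == 7 else 0.0 for d in range(10)]   # about 7.0
mixed = [4.0 if d == 7 else 3.0 if d == 2 else 0.0 for d in range(10)]
print(expected_rating(uniform), expected_rating(peaked), expected_rating(mixed))
```

In `peaked` and `mixed`, the argmax digit is 7. Only the expected value separates them: roughly 7.0 versus roughly 5.5.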

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Sampled-decoding self-report · method · 0.822
Self-report via sampled decoding at temperature 0.8; reduces collapse moderately, but ratings remain discrete and noisy
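Temperature sampling divides the logits by the temperature before the softmax and then draws one token. The sketch below assumes digit logits for ratings 0 to 9; the helper name `sample_rating` is hypothetical. It illustrates why sampled self-reports reduce collapse while staying discrete: every draw is still a single integer, and repeated draws scatter around the mode.

```python
import math
import random
from collections import Counter
from typing import Optional, Sequence

def sample_rating(digit_logits: Sequence[float], temperature: float = 0.8,
                  rng: Optional[random.Random] = None) -> int:
    """Draw one digit rating from softmax(digit_logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    scaled = [z / temperature for z in digit_logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights, so this samples from the softmax.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Hypothetical logits over ratings 0-9, peaked at 7.
logits = [2.0 if d == 7 else 0.0 for d in range(10)]
rng = random.Random(0)
samples = [sample_rating(logits, 0.8, rng) for _ in range(1000)]
print(Counter(samples).most_common(3))  # integers, spread over several values
```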
Greedy-decoded self-reports in LLaMA-3.2-3B collapse to 1.1–3.9 distinct values on a 10-point scale · finding · 0.793
Demonstrates that default decoding masks introspective capacity; entropy of 0.03–1.10 bits
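The two collapse statistics can be computed over a batch of ratings as shown below. These are the number of distinct values and the Shannon entropy in bits. The name `rating_diversity` is hypothetical. The source does not say how its figures were aggregated, for example per prompt or pooled, so treat this as an illustrative assumption.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

def rating_diversity(ratings: Sequence[Hashable]) -> Tuple[int, float]:
    """Return (number of distinct values, Shannon entropy in bits) for a batch of ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    n = len(ratings)
    # H = sum_v p(v) * log2(1 / p(v)), with p(v) the empirical frequency of value v.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return len(counts), entropy

print(rating_diversity([7] * 20))      # (1, 0.0): fully collapsed
print(rating_diversity([3, 3, 4, 4]))  # (2, 1.0)
```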
Self-report · concept · 0.763
The model's verbal description of its internal state, which may be accurate or confabulated.
Epsilon-greedy exploration · method · 0.753
A heuristic exploration strategy that selects a random action with probability epsilon, otherwise acts greedily.
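Here is a minimal sketch of the generic strategy just described. It assumes a discrete action set with estimated action values passed as `q_values`; that parameter and the function name `epsilon_greedy` are hypothetical. Note that a random draw may coincide with the greedy action, as in the standard formulation.

```python
import random
from typing import Optional, Sequence

def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Return an action index: uniformly random with probability epsilon, else greedy."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: any action, uniformly
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit: argmax

q = [0.1, 0.5, 0.2]
rng = random.Random(42)
actions = [epsilon_greedy(q, 0.1, rng) for _ in range(1000)]
print(actions.count(1) / len(actions))  # mostly action 1; the rest are explorations
```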
Logit-based self-report unmasks introspective capacity that greedy decoding conceals · claim · 0.747
Central methodological contribution: computing the probability-weighted expected value over digit-token logits recovers a continuous, informative signal
Self-Evidencing · concept · 0.728
Concise framing of the action-perception cycle, whereby agents minimize surprise through perception and action.
Selfing · concept · 0.726
Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
Self-correcting search with interpretability feedback · concept · 0.722