Sampled-decoding self-report

Self-report elicited with sampled decoding at temperature 0.8. Sampling moderately reduces the collapse seen with greedy decoding, but the reported values remain discrete and noisy.
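The difference between greedy and temperature-sampled decoding can be illustrated with a minimal sketch. It assumes the self-report is a single token drawn from a small set of rating tokens; the logits and the function names are hypothetical and chosen only for illustration, not taken from the source.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_decode instead")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(logits):
    """Return the index of the highest-logit token (always the same answer)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_decode(logits, temperature=0.8, rng=None):
    """Draw one token index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits over the rating tokens "1".."5" (assumed values).
example_logits = [2.0, 1.5, 0.5, 0.0, -1.0]

if __name__ == "__main__":
    rng = random.Random(0)
    print("greedy :", greedy_decode(example_logits))
    print("sampled:", [sample_decode(example_logits, 0.8, rng) for _ in range(10)])
```

Greedy decoding returns the same index on every call, while sampling spreads reports over several tokens. Each individual sample is still one discrete token, so repeated draws vary from call to call, which matches the "discrete and noisy" caveat above.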

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
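The selection rule described above (cosine similarity at or above 0.65, and no typed edge to this entity) can be sketched as follows. This is a minimal illustration under stated assumptions: the embedding dictionary, the set of typed edges, and the function names are hypothetical, and the actual system may compute and store these differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """List (name, score) pairs similar to `target` that lack a typed edge.

    `embeddings` maps entity name -> vector (assumed layout).
    `typed_edges` is a set of frozenset({a, b}) pairs (assumed layout).
    """
    results = []
    for name, vector in embeddings.items():
        if name == target:
            continue
        if frozenset((target, name)) in typed_edges:
            continue  # already related by a typed edge, so not a candidate
        score = cosine_similarity(embeddings[target], vector)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, as in the list below.
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are only candidates: a high score may indicate a missing edge or a duplicate entry, and each case still needs review.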

Greedy-decoded self-report · method · 0.822
Baseline self-report method selecting highest-probability token; shown to collapse to few uninformative values
Self-report · concept · 0.799
The model's verbal description of its internal state, which may be accurate or confabulated.
Self-reflection · concept · 0.767
The ability of reasoning LLMs to review and revise previous reasoning steps during inference
Self-Refine · framework · 0.763
Prior self-evolving agent method iteratively improving outputs through self-feedback
Attention probes for belief decoding · concept · 0.754
Self-report of Injected Thoughts · finding · 0.753
Models can detect and identify injected concept vectors ~20% of the time at optimal layer/strength in Opus 4.1, with immediacy suggesting internal rather than output-inferred detection.
Numeric self-report · method · 0.751
Primary tool in human psychometrics for tracking latent internal states; adapted as the core measure in this paper for LLMs
Rejection sampling · method · 0.751
A technique to filter model outputs; Redwood Research's project mentioned.