method
active
method:sampled-decoding-self-report
Sampled-decoding self-report
Self-report elicited by sampled decoding at temperature 0.8. Sampling reduces collapse moderately, but the reported values remain discrete and noisy.
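As a rough illustration of the description above, the sketch below contrasts picking the highest-probability rating with sampling a rating at temperature 0.8. It is a minimal sketch under stated assumptions, not the method's actual implementation: the 1-5 rating scale, the logit values, and the function names `greedy_report` and `sampled_report` are all hypothetical and chosen only to show the mechanics.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_report(logits, scale):
    """Return the rating whose token has the highest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return scale[best]


def sampled_report(logits, scale, temperature=0.8, rng=random):
    """Draw one rating from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(scale, weights=probs, k=1)[0]


# Hypothetical rating scale and logits (assumptions, not from the source).
scale = [1, 2, 3, 4, 5]
logits = [0.1, 0.4, 2.0, 1.6, 0.3]

rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
samples = [sampled_report(logits, scale, 0.8, rng) for _ in range(1000)]

print("greedy:", greedy_report(logits, scale))
print("distinct sampled values:", sorted(set(samples)))
```

Greedy selection returns the same rating on every call, whereas sampling spreads reports across several ratings. Each individual sample is still one value from a small discrete scale, which is consistent with the description of the result as discrete and noisy.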
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Baseline self-report method selecting highest-probability token; shown to collapse to few uninformative values
- The model's verbal description of its internal state, which may be accurate or confabulated.
- The ability of reasoning LLMs to review and revise previous reasoning steps during inference
- Prior self-evolving agent method iteratively improving outputs through self-feedback
- Models can detect and identify injected concept vectors ~20% of the time at optimal layer/strength in Opus 4.1, with immediacy suggesting internal rather than output-inferred detection.
- Primary tool in human psychometrics for tracking latent internal states; adapted as the core measure in this paper for LLMs
- A technique for filtering model outputs; a Redwood Research project is mentioned.