method
active
method:sampled-decoding-self-report

Sampled-decoding self-report

Self-report elicited with sampled decoding at temperature 0.8. Compared with greedy decoding, this moderately reduces the collapse of reports onto a few values, but the reports remain discrete and noisy.
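A minimal sketch of the contrast, assuming the self-report is read off a single next token on a 1 to 5 scale. The logits, the scale, and the function names (softmax_with_temperature, greedy_report, sampled_report) are hypothetical illustrations, not taken from the source. Greedy decoding returns the same value on every call; sampling at temperature 0.8 spreads reports across the scale, but each report is still one discrete label and individual draws vary.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_report(labels, logits):
    """Baseline: always pick the highest-probability token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]

def sampled_report(labels, logits, temperature=0.8, rng=random):
    """Draw one token from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(labels, weights=probs, k=1)[0]

# Hypothetical next-token logits over a 1-5 self-report scale.
labels = ["1", "2", "3", "4", "5"]
logits = [0.2, 1.0, 2.5, 2.1, 0.4]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
greedy = [greedy_report(labels, logits) for _ in range(1000)]
sampled = [sampled_report(labels, logits, 0.8, rng) for _ in range(1000)]

print(sorted(set(greedy)))  # ['3']: greedy collapses to one value
print(sorted(set(sampled)))  # several values, but still discrete labels
```

Lowering the temperature toward 0 makes the sampled distribution approach the greedy one, while raising it flattens the distribution and adds more noise to each individual report.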

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Baseline self-report method that selects the highest-probability token; shown to collapse to a few uninformative values
  • Self-report · concept · 0.799
    The model's verbal description of its internal state, which may be accurate or confabulated.
  • Self-reflection · concept · 0.767
    The ability of reasoning LLMs to review and revise previous reasoning steps during inference
  • Self-Refine · framework · 0.763
    Prior self-evolving agent method iteratively improving outputs through self-feedback
  • Models can detect and identify injected concept vectors ~20% of the time at optimal layer/strength in Opus 4.1, with immediacy suggesting internal rather than output-inferred detection.
  • Primary tool in human psychometrics for tracking latent internal states; adapted as the core measure in this paper for LLMs
  • A technique for filtering model outputs; mentioned in connection with a Redwood Research project.