finding
active

Logit self-report drift positive for all three LLaMA sizes (turn slopes 0.159, 0.038, 0.141; all p<10⁻²⁰) but does not increase monotonically with scale

Unlike probe drift, the magnitude of self-report drift does not follow a clean scaling law: the slope of drift against model size is negative.
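
As a rough illustration of the claim above, the three reported turn slopes can be checked for sign and monotonicity, and a crude slope against model size can be fitted. This is a minimal sketch, not the paper's analysis: it assumes the slopes are listed from smallest to largest model, and it uses a hypothetical ordinal size index (`size_rank`) because the actual parameter counts are not given here.

```python
# Turn slopes as reported in the finding. Their ordering by model size
# (smallest to largest) is an assumption, not stated in the source.
turn_slopes = [0.159, 0.038, 0.141]

# Hypothetical ordinal size index (0 = smallest model); the real
# parameter counts are not given in this entry.
size_rank = [0, 1, 2]


def is_monotonic_increasing(values):
    """Return True if every value is strictly larger than the previous one."""
    return all(b > a for a, b in zip(values, values[1:]))


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


all_positive = all(s > 0 for s in turn_slopes)
monotonic = is_monotonic_increasing(turn_slopes)
size_slope = ols_slope(size_rank, turn_slopes)

print(f"all positive: {all_positive}")
print(f"monotonic in size: {monotonic}")
print(f"slope over size rank: {size_slope:.4f}")
```

Under these assumptions every slope is positive, the sequence is not monotonic because the middle value is the smallest, and the fitted slope over size rank is slightly negative. The paper's own size-slope may come from a different model specification, so only the sign is comparable.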

Source paper

extracted_from
Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
(2026) · Nicolas Martorell · Bruno Bianchi
