method
active
method:token-100-correlation-persistence-metric
Token-100 correlation persistence metric
Measures emotion-feature persistence as the correlation between z-scored activation at token 0 and at token 100, computed across all eligible target-model tokens.
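A minimal Python sketch of this metric, under assumptions that the source does not confirm. Each conversation is a hypothetical list of per-token probe activations. "Token 0" and "token 100" are read as an anchor token t and the token 100 positions later. Z-scoring is done within each conversation. A token counts as eligible only when t + 100 falls inside the same conversation.

```python
from math import sqrt
from typing import Sequence

LAG = 100  # distance in tokens between the two activations being compared


def zscore(xs: Sequence[float]) -> list[float]:
    """Standardize a sequence to zero mean and unit (population) variance."""
    n = len(xs)
    mean = sum(xs) / n
    std = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        return [0.0] * n  # a constant sequence carries no signal
    return [(x - mean) / std for x in xs]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def lag_correlation(conversations: Sequence[Sequence[float]], lag: int = LAG) -> float:
    """Correlate z-scored activation at each token t with that at t + lag.

    Hypothetical input: one list of per-token probe activations per conversation.
    Assumptions: z-scoring happens within each conversation, and a token is
    eligible only when its partner t + lag lies in the same conversation.
    """
    anchors: list[float] = []
    targets: list[float] = []
    for acts in conversations:
        if len(acts) <= lag:
            continue  # too short to contain any (t, t + lag) pair
        z = zscore(acts)
        anchors.extend(z[:-lag])  # activations at t
        targets.extend(z[lag:])   # activations at t + lag
    return pearson(anchors, targets)
```

Pooling all (t, t + 100) pairs before correlating, rather than averaging per-conversation correlations, is also an assumption; either choice is plausible. Running the same function on projections onto a random direction would give the kind of autoregressive baseline described in the neighborhood list.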
Neighborhood — ranked by edge-count
Concepts (1)
concept
- The property of emotion features maintaining elevated activation well beyond the local token context that triggered them
Datasets (1)
dataset
- Dataset of 240 multi-turn conversations per model between the target models and Claude Sonnet 4.5 (acting as the simulated human), used to measure probe persistence
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Quantitative measure of emotion feature persistence vs random baseline in Cogito
- Baseline persistence of any probe direction arising from the autoregressive nature of LLMs, not specific to emotion content
- Persistence metric for SAE features: P(fires at t+100 | fired at t) minus P(fires at t+100 | did not fire at t)
- Core logical puzzle: if an agent does not change, it dies; if it changes, the self ceases to exist. Applies to all scales from organelles to evolutionary lineages.
- Core logical paradox: if a species fails to change, it dies; if it changes, it ceases to exist. The same applies to individuals.
- The causal steering experiment persists KV state over steered tokens so downstream effects can be observed without continued steering
- Open methodological question acknowledged as a limitation
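The SAE persistence metric listed above can be sketched in Python as well. Assumptions not taken from the source: the input is a hypothetical list of per-token activations for one SAE feature per document, a feature "fires" when its activation exceeds a threshold of 0, and counts are pooled over every token t whose partner t + 100 lies in the same document.

```python
from typing import Sequence

LAG = 100  # distance in tokens between anchor and later token


def firing_persistence(sequences: Sequence[Sequence[float]], lag: int = LAG,
                       threshold: float = 0.0) -> float:
    """P(fires at t+lag | fired at t) - P(fires at t+lag | did not fire at t).

    Hypothetical input: per-token activations of one SAE feature, one
    sequence per document. Assumption: "fires" means activation > threshold.
    """
    fired_then_fired = fired_total = 0
    quiet_then_fired = quiet_total = 0
    for acts in sequences:
        fires = [a > threshold for a in acts]
        # Only anchors whose partner t + lag exists in this sequence count.
        for t in range(len(fires) - lag):
            later = fires[t + lag]
            if fires[t]:
                fired_total += 1
                fired_then_fired += later  # bool adds as 0 or 1
            else:
                quiet_total += 1
                quiet_then_fired += later
    if fired_total == 0 or quiet_total == 0:
        raise ValueError("need both firing and non-firing anchor tokens")
    return fired_then_fired / fired_total - quiet_then_fired / quiet_total
```

Unlike the token-100 correlation metric, this version discards activation magnitude and keeps only the binary firing pattern; subtracting the "did not fire" term plays a role similar to a baseline correction.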