concept
active
concept:backtracking-latents
Backtracking Latents
SAE latents whose activation rises as a correction approaches and peaks after self-correction begins, complementing the Off-Topic Detector (OTD) latents
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Off-Topic Detector Latents (associated_with): 26 SAE latents identified as differentially activated during off-topic content and causally linked to ESR
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- A sequence where no step forces the undoing of previous steps; these sequences lie at the core of the theory of living process and can be identified experimentally.
- Complementary temporal activation pattern suggesting distinct roles for OTD and backtracking latent classes
- Claims that although a purely mathematical identification method is lacking, a well-defined experimental procedure exists to find good sequences.
- Statistical regularities stored in pretrained models.
- Reasoning approach using learnable hidden embeddings.
- Interpretive claim linking observable CoT behaviors to genuine internal uncertainty shifts
- Methods that use latent reasoning; lack task generalization and are difficult to train with autoregressive parallelization.
- Feature detecting mentions of backdoors and hidden malicious functionality.