Hidden Pathways

Units, directions, or subcircuits inactive under natural inputs that become active under divergent interventions

Neighborhood — ranked by edge-count

paper

concept

Pernicious Divergence
associated_with
Divergences that activate hidden pathways or cause dormant behavioral changes, undermining mechanistic claims

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Hidden Statesconcept0.743
Latent variables causing observations in the generative model.
learnable hidden embeddingsconcept0.698
The component used in latent reasoning to perform internal computation.
Path Patchingmethod0.695
Method by Goldowsky-Dill et al. 2023 for localizing model behavior via targeted activation interventions
Hidden Chain-of-Thought Scratchpadmethod0.693
Mechanism allowing model to reason in SCRATCHPAD_REASONING tags not shown to users or used in RLHF
Path Integrationconcept0.686
Neural mechanism for tracking location through accumulation of self-movement vectors; shown to play the role of position encodings in TEM.
Hidden Markov Modelmethod0.676
Core computational method used to infer pain-belief from online observations of happiness
Minimizing divergence magnitude does not guarantee elimination of hidden pathways; it only reduces the risk surfaceclaim0.664
Important caveat to the CL loss solution, noting it is a step not a complete fix
Representation-based Pathconcept0.660
The path in activation space derived by fitting the representation manifold, used to steer along the geometric structure of internal representations.