concept
active
concept:hidden-pathwaysHidden Pathways
Units, directions, or subcircuits inactive under natural inputs that become active under divergent interventions
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Pernicious Divergenceassociated_withDivergences that activate hidden pathways or cause dormant behavioral changes, undermining mechanistic claims
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- Latent variables causing observations in the generative model.
- The component used in latent reasoning to perform internal computation.
- Method by Goldowsky-Dill et al. 2023 for localizing model behavior via targeted activation interventions
- Mechanism allowing model to reason in SCRATCHPAD_REASONING tags not shown to users or used in RLHF
- Neural mechanism for tracking location through accumulation of self-movement vectors; shown to play the role of position encodings in TEM.
- Core computational method used to infer pain-belief from online observations of happiness
- Important caveat to the CL loss solution, noting it is a step not a complete fix
- The path in activation space derived by fitting the representation manifold, used to steer along the geometric structure of internal representations.