framework
active
framework:patchscopes
Patchscopes
Unifying framework for inspecting hidden representations of language models via representation interventions
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (1)
thinker
- Ghandeharioun, A. · introduces · Lead author of the Patchscopes paper on inspecting hidden representations
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- A set property meaning all coordinate patches of its elements remain within the set; proved equivalent to axis-aligned hyperrectangles
- Method by Goldowsky-Dill et al. 2023 for localizing model behavior via targeted activation interventions
- Synchronization construct encapsulating shared data and protected access routines.
- Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
- Oscillatory gene network in the vertebrate embryo that paces somite formation; exhibits collective intelligence properties.
- Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
- Gradient-based method to estimate the effect of zeroing a feature on a specific logit difference.
- SAEs trained on pretrained Gemma-2 models used for steering in Gemma family experiments