framework
active
framework:patchscopes

Patchscopes

Unifying framework for inspecting hidden representations of language models via representation interventions
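A minimal sketch of the core Patchscopes operation. It runs on a toy model, not a real transformer. It reads the hidden vector at layer `src_layer`, position `src_pos` of a source prompt and optionally transforms it with a mapping `f`. It then writes that vector into layer `tgt_layer`, position `tgt_pos` of a target prompt and lets the forward pass continue. The model is hypothetical: a stack of tanh layers, with causal mean-pooling standing in for attention, and the same weights used for source and target. The function names (`toy_layer`, `forward`, `patchscope`) are also hypothetical illustrations, not the paper's code.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Matrix = List[Vector]


def toy_layer(h: Vector, w: Matrix) -> Vector:
    """Hypothetical toy layer: elementwise tanh(W @ h)."""
    return [math.tanh(sum(wij * hj for wij, hj in zip(row, h))) for row in w]


def forward(
    embeddings: List[Vector],
    weights: List[Matrix],
    patch: Optional[Tuple[int, int, Vector]] = None,
) -> Tuple[List[Vector], List[List[Vector]]]:
    """Run the toy model; return final hidden states and a per-layer cache.

    patch = (layer, position, vector) overwrites that layer's output at that
    position before any later layer reads it.
    """
    hs = [list(e) for e in embeddings]
    cache: List[List[Vector]] = []
    for layer_idx, w in enumerate(weights):
        new_hs = []
        for t in range(len(hs)):
            # Causal mixing: mean of positions 0..t (toy stand-in for attention).
            ctx = [sum(hs[s][k] for s in range(t + 1)) / (t + 1) for k in range(len(hs[t]))]
            new_hs.append(toy_layer(ctx, w))
        hs = new_hs
        if patch is not None and patch[0] == layer_idx:
            _, pos, vec = patch
            hs[pos] = list(vec)  # the intervention: replace one hidden state
        cache.append([list(h) for h in hs])
    return hs, cache


def patchscope(
    source: List[Vector],
    target: List[Vector],
    weights: List[Matrix],
    src_layer: int,
    src_pos: int,
    tgt_layer: int,
    tgt_pos: int,
    f: Callable[[Vector], Vector] = lambda v: v,
) -> List[Vector]:
    """Inspect source's hidden state at (src_layer, src_pos) by patching f(h) into target."""
    _, src_cache = forward(source, weights)
    h = src_cache[src_layer][src_pos]
    out, _ = forward(target, weights, patch=(tgt_layer, tgt_pos, f(h)))
    return out


# Usage: move the last source position's layer-1 state into target position 1, layer 0.
weights = [[[0.5, -0.2], [0.1, 0.3]], [[0.2, 0.4], [-0.3, 0.6]]]
source = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
target = [[0.2, -0.1], [0.3, 0.3]]
patched_out = patchscope(source, target, weights, src_layer=1, src_pos=2, tgt_layer=0, tgt_pos=1)
```

Because mixing is causal in this toy, a patch changes only the target position and those after it. Positions before `tgt_pos` are unaffected. A real patchscope would also decode the target's output through the model's vocabulary head, which this sketch omits.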

Neighborhood — ranked by edge-count

Thinkers (1)

thinker
  • Lead author of the Patchscopes paper on inspecting hidden representations

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Patch-Closure · concept · 0.776
    A set property meaning all coordinate patches of its elements remain within the set; proved equivalent to the set being an axis-aligned hyperrectangle
  • Path Patching · method · 0.764
    Method by Goldowsky-Dill et al. (2023) for localizing model behavior via targeted activation interventions
  • monitors · concept · 0.734
    Synchronization construct encapsulating shared data and protected access routines.
  • Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
  • Segmentation Clock · concept · 0.722
    Oscillatory gene network in the vertebrate embryo that paces somite formation; exhibits collective intelligence properties.
  • Probes · concept · 0.718
    Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
  • Gradient-based method to estimate the effect of zeroing a feature on a specific logit difference.
  • GemmaScope SAEs · concept · 0.700
    Sparse autoencoders (SAEs) trained on pretrained Gemma-2 models; used for steering in Gemma-family experiments
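The "Related by similarity" rule (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter over entity embeddings. The page does not say how those embeddings are produced or whether typed edges are directed. The embedding dictionary, the undirected `typed_edges` set, and the names `cosine` and `similar_untyped` are therefore hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def similar_untyped(
    entity: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[FrozenSet[str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Neighbors with cosine >= threshold and no typed edge, highest score first.

    Assumes typed edges are undirected and stored as frozenset({a, b}).
    """
    base = embeddings[entity]
    hits = []
    for other, vec in embeddings.items():
        if other == entity or frozenset((entity, other)) in typed_edges:
            continue  # skip the entity itself and already-linked entities
        score = cosine(base, vec)
        if score >= threshold:
            hits.append((other, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits
```

Entities returned by such a filter are the "candidates for new edges or unrecognized duplicates" described above. The threshold is a parameter, so it can be tuned without changing the filter.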