method
active
method:linear-probing

Linear Probing

Used to evaluate representation quality across the tasks of VTAB (the Visual Task Adaptation Benchmark).

Neighborhood — ranked by edge-count

Frameworks (2)

framework
  • The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
  • Multi-task benchmark of linguistic behaviours for measuring causal efficacy of interpretability methods, adapted from SyntaxGym

Methods (1)

method
  • Linear Probe
    related_to
    Simple linear classifiers trained on model activations, used as the probing technique within the introduced method.
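
To make the description above concrete, here is a minimal sketch of a linear probe: a logistic-regression classifier fit on fixed activations, with the underlying model left untouched. Everything in it is an assumption made for illustration. The activations are synthetic (a binary feature is planted along dimension 0 of Gaussian noise), and the names make_synthetic_activations, train_linear_probe, and probe_accuracy are hypothetical helpers, not part of any library or of the method indexed here. In practice the activations would be collected from a chosen layer of a real model.

```python
import math
import random


def make_synthetic_activations(n, dim=8, shift=3.0, seed=0):
    """Hypothetical stand-in for frozen model activations.

    A binary feature is planted along dimension 0; every other
    dimension is pure Gaussian noise.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += shift if y == 1 else -shift
        xs.append(x)
        ys.append(y)
    return xs, ys


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(xs, ys, epochs=200, lr=0.1):
    """Fit weights w and bias b by full-batch gradient descent on log loss.

    Only the probe parameters are updated; the activations stay fixed.
    """
    dim = len(xs[0])
    n = len(xs)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, xs, ys):
    """Fraction of examples whose label the probe predicts correctly."""
    correct = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(xs)


train_x, train_y = make_synthetic_activations(200, seed=0)
test_x, test_y = make_synthetic_activations(200, seed=1)
w, b = train_linear_probe(train_x, train_y)
acc = probe_accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {acc:.3f}")
```

High held-out accuracy is usually read as evidence that the feature is linearly decodable from the activations. On its own it does not show that the model actually uses that feature.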

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Method for fitting a linear classifier on collected activations to predict task-relevant features
  • Nguyen et al. trained linear probes on activations to distinguish evaluation from deployment scenarios.
  • Probing Methods · method · 0.796
    Top-down interpretability approach studying linguistic properties at various residual stream stages; contrasted with the paper's bottom-up mechanistic approach
  • Earlier interpretability method applying classifiers to DNN hidden representations; shares complexity-accuracy dilemma with causal abstraction
  • Linear Decoding · method · 0.786
    Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
  • linearity · concept · 0.779
    The sequential, continuous order of text, often challenged by diagrammatic branching.
  • Probes · concept · 0.765
    Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
  • The idea that features are encoded as directions in activation space.
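
Several entries above describe features as directions in activation space. The sketch below illustrates that idea with a difference-of-means direction, which is a simple stand-in chosen for this example rather than the procedure of any entity listed here. The toy activation vectors and the helpers mean_vector, unit_direction, and project are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit_direction(pos, neg):
    """Unit vector pointing from the mean of `neg` to the mean of `pos`."""
    diff = [p - q for p, q in zip(mean_vector(pos), mean_vector(neg))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def project(x, direction):
    """Scalar projection of activation x onto the feature direction."""
    return sum(a * b for a, b in zip(x, direction))


# Toy activations (assumed data): the feature is present in the first
# group and absent in the second.
with_feature = [[2.0, 0.1, -0.2], [1.8, -0.1, 0.3], [2.2, 0.0, 0.1]]
without_feature = [[-2.0, 0.2, 0.1], [-1.9, -0.3, 0.0], [-2.1, 0.1, -0.1]]

direction = unit_direction(with_feature, without_feature)
scores_pos = [project(x, direction) for x in with_feature]
scores_neg = [project(x, direction) for x in without_feature]
print("feature present:", [round(s, 2) for s in scores_pos])
print("feature absent: ", [round(s, 2) for s in scores_neg])
```

If the feature really is encoded along one direction, the projections of the two groups separate cleanly, and thresholding the projection amounts to a linear classifier whose weight vector is that direction.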