Minimum Description Length Probing

Probing approach that explicitly controls probe complexity via information-theoretic criteria
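
One way to make "information-theoretic criteria" concrete is an online (prequential) codelength: the labels are transmitted in blocks, each block encoded with a probe trained only on the blocks already sent, so a probe that needs a lot of data to fit pays for it in bits. This entry does not say which coding scheme is meant, so the sketch below is an assumption rather than the method itself. The count-based toy probe and the names `online_codelength`, `block_ends`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def online_codelength(xs, ys, num_classes, block_ends, alpha=1.0):
    """Online (prequential) codelength, in bits, of labels ys given features xs.

    xs          : sequence of hashable (discrete) features, one per example
    ys          : sequence of integer labels in range(num_classes)
    block_ends  : increasing indices where each block ends; last one == len(ys)
    alpha       : Laplace smoothing constant for the toy count-based probe
    """
    n = len(ys)
    assert len(xs) == n and block_ends[-1] == n

    # The first block is sent with a uniform code over the classes.
    bits = block_ends[0] * math.log2(num_classes)

    for start, end in zip(block_ends[:-1], block_ends[1:]):
        # "Train" the toy probe on everything transmitted so far:
        # smoothed estimates of p(y | x) from co-occurrence counts.
        joint = Counter(zip(xs[:start], ys[:start]))
        marginal = Counter(xs[:start])
        # Encode the next block with that probe; cost is -log2 p(y | x).
        for x, y in zip(xs[start:end], ys[start:end]):
            p = (joint[(x, y)] + alpha) / (marginal[x] + alpha * num_classes)
            bits -= math.log2(p)
    return bits


# Toy data: a binary feature, one label set that copies it and one that ignores it.
xs = [i % 2 for i in range(64)]
informative = list(xs)                              # label equals the feature
uninformative = [(i // 2) % 2 for i in range(64)]   # label unrelated to the feature
ends = [2, 4, 8, 16, 32, 64]

bits_informative = online_codelength(xs, informative, 2, ends)
bits_uninformative = online_codelength(xs, uninformative, 2, ends)
uniform_bits = len(xs) * math.log2(2)               # cost with no probe at all

print(bits_informative, bits_uninformative, uniform_bits)
```

Comparing each codelength with `uniform_bits` gives a compression ratio: labels that the representation encodes in an easily extractable way compress well, while labels unrelated to it cost about as much as the uniform code. A real probe would replace the count table with a trained classifier over continuous activations.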

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Minimum Description Length · framework · 0.867
Related to variational free energy; compressibility corresponds to complexity reduction in structure learning
Diagnostic Probing · method · 0.760
Earlier interpretability method applying classifiers to DNN hidden representations; shares complexity-accuracy dilemma with causal abstraction
Sparse Probing · method · 0.750
Method from Gurnee et al. 2023 for finding feature directions including individual neuron analysis
Probing Methods · method · 0.737
Top-down interpretability approach studying linguistic properties at various residual stream stages; contrasted with the paper's bottom-up mechanistic approach
Linear Probing · method · 0.735
Used to evaluate representation quality across VTAB tasks
Unsupervised Probing · method · 0.725
Probing approach avoiding supervision to sidestep complexity-accuracy tradeoff
base model probing · method · 0.724
Method of using base models (no post-training) to observe spontaneous self-referential behaviors without confound of memorized introspection language.
Probes · concept · 0.711
Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.