claim

active

claim:the-difference-in-means-direction-is-the-unique-nullity-1-projection-kernel-that-eliminates-all-linearly-recoverable-binary-classification-information-from-a-dataset

The difference-in-means direction is the unique nullity-1 projection kernel that eliminates all linearly-recoverable binary classification information from a dataset

Formal consequence of Belrose et al. (2023) Theorem G.1 connecting mass-mean probing to optimal linear concept erasure
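
The reasoning behind the claim can be checked numerically. For binary labels, the empirical cross-covariance between representations and labels is proportional to the difference in class means, so a projection removes it exactly when the difference-in-means direction lies in its kernel. The sketch below is a minimal illustration under assumptions not taken from the paper: the data are synthetic Gaussians, zero cross-covariance is used as the working criterion for "no linearly recoverable information", and an orthogonal projection is used as one convenient projection with the required kernel (the claim constrains only the kernel). Helper names such as `project_out` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2000

# Synthetic two-class data (an assumption; stands in for model activations).
true_shift = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
labels = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d)) + labels[:, None] * true_shift


def diff_in_means(X, labels):
    """Mean of class-1 rows minus mean of class-0 rows."""
    return X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)


def project_out(v):
    """Orthogonal projection matrix whose kernel is span(v) (nullity 1)."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - np.outer(v, v)


def cross_cov(X, labels):
    """Empirical cross-covariance between features and the binary label."""
    Xc = X - X.mean(axis=0)
    zc = labels - labels.mean()
    return Xc.T @ zc / len(labels)


delta = diff_in_means(X, labels)
X_erased = X @ project_out(delta).T

# After projecting out delta, class means coincide and the
# cross-covariance with the label vanishes (up to rounding error).
gap_erased = np.linalg.norm(diff_in_means(X_erased, labels))
cov_erased = np.linalg.norm(cross_cov(X_erased, labels))

# Control: a nullity-1 projection with a different kernel leaves a gap.
other = np.zeros(d)
other[3] = 1.0
X_other = X @ project_out(other).T
gap_other = np.linalg.norm(diff_in_means(X_other, labels))

print(f"gap after removing delta:     {gap_erased:.2e}")
print(f"cross-cov after removing it:  {cov_erased:.2e}")
print(f"gap after removing other dir: {gap_other:.2f}")
```

The control case illustrates the uniqueness part of the claim: removing any single direction other than the difference in means leaves the class means separated, so a linear classifier can still beat a constant predictor.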

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Papers (1)

paper

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
supports

Frameworks (1)

framework

Mass-Mean Probing
supports
Introduced in this paper: an optimization-free probing technique using difference-in-means direction with optional covariance correction
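
A mass-mean probe as described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the data are synthetic, the function names are hypothetical, the covariance is estimated from class-centered data, and the midpoint-of-means bias term is an assumption added here so the probe works on uncentered inputs.

```python
import numpy as np


def fit_mass_mean_probe(X, labels, use_covariance=False):
    """Return (w, b) for a probe sigmoid(w @ x + b) built without optimization."""
    mu_pos = X[labels == 1].mean(axis=0)
    mu_neg = X[labels == 0].mean(axis=0)
    w = mu_pos - mu_neg  # difference-in-means direction
    if use_covariance:
        # Optional correction: tilt the direction by the inverse covariance
        # of the class-centered data.
        centered = np.where(labels[:, None] == 1, X - mu_pos, X - mu_neg)
        sigma = centered.T @ centered / len(X)
        w = np.linalg.pinv(sigma) @ w
    # Assumed bias: place the decision boundary at the midpoint of the means.
    b = -w @ (mu_pos + mu_neg) / 2.0
    return w, b


def predict_proba(w, b, X):
    """Probability assigned to class 1 for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


# Synthetic demonstration data (an assumption, not from the paper).
rng = np.random.default_rng(1)
n, d = 2000, 5
labels = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d)) + labels[:, None] * np.array([2.0, -1.0, 0.5, 0.0, 0.0])

w_plain, b_plain = fit_mass_mean_probe(X, labels)
w_corr, b_corr = fit_mass_mean_probe(X, labels, use_covariance=True)

acc_plain = np.mean((predict_proba(w_plain, b_plain, X) > 0.5) == labels)
acc_corrected = np.mean((predict_proba(w_corr, b_corr, X) > 0.5) == labels)
print(f"plain: {acc_plain:.3f}  covariance-corrected: {acc_corrected:.3f}")
```

On isotropic data like this the two variants behave almost identically; the correction is expected to matter only when the within-class covariance is far from a multiple of the identity.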

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Simple difference-in-mean probes generalize as well as other probing techniques while identifying directions which are more causally implicated in model outputs · claim · 0.769
Key methodological claim: MM probes are both competitive in accuracy and superior in causal influence
The two-dimensional subspace reported by Burger et al. (2024) seems to reflect a stage of transition in the model's processing, rather than a universal property of truth directions. · quote · 0.760
Load-bearing interpretive claim about the layer-specificity of Burger et al.'s finding.
A family of contrastive learners converges to a representation whose kernel is the pointwise mutual information (PMI) of the underlying events · hypothesis · 0.744
Mathematical formalization of what representation models converge to
Difference-in-Means Direction · concept · 0.743
Vector from mean of false representations to mean of true representations; core of mass-mean probing
Logistic regression fails to identify the true feature direction when a confounding feature is non-orthogonal to the truth direction, converging instead to the maximum margin separator · claim · 0.742
Motivates the introduction of mass-mean probing as an alternative to LR
probably helps not only with faithful reconstruction but also creates interference patterns that encode nuanced information about the deltas and convergences between states. · quote · 0.738
Key quote connecting path redundancy to interferometric information encoding.
Assuming linear representations enables identifying the location of certain variables in a DNN, but many insights fail to generalise when more powerful non-linear maps are used · claim · 0.738
Interpretive claim about what linear DAS results actually tell us
The direction of information increase is relative to the observer or user of the computation · claim · 0.730
Example: 3×5→15 is a natural computation, but 15→3×5 (prime factorization) is also useful, showing that the 'gain' depends on the choice of normal form.