claim
active
claim:simple-difference-in-mean-probes-generalize-as-well-as-other-probing-techniques-while-identifying-directions-which-are-more-causally-implicated-in-model-outputs
Simple difference-in-mean probes generalize as well as other probing techniques while identifying directions which are more causally implicated in model outputs
Key methodological claim: mass-mean (MM) probes are both competitive in accuracy and superior in causal influence
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (3)
finding
- Mass-mean probe directions outperform LR and CCS in causal intervention experiments (NIE) in 7/8 experimental conditions · associated_with · supports · Core result showing MM is superior to LR for causal implication despite similar classification accuracy
- An MM probe trained on the likely dataset is a surprisingly effective causal baseline, owing to the correlation between truth and probability on sp_en_trans
- Despite being simpler and optimization-free, MM probes match accuracy of other techniques at scale
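The findings above describe the MM probe as simple and optimization-free. A minimal sketch of what that can look like, assuming the probe direction is just the mean activation of true statements minus the mean activation of false statements. The function names, the midpoint centering, and the toy 2-D activations are illustrative assumptions, not taken from the source paper.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    if not rows:
        raise ValueError("need at least one row")
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / n for i in range(dim)]


def diff_in_means_direction(
    pos: Sequence[Sequence[float]], neg: Sequence[Sequence[float]]
) -> Vector:
    """Probe direction: mean of positive examples minus mean of negative ones.

    No optimization is involved; the direction is a closed-form statistic.
    """
    mu_pos = mean_vector(pos)
    mu_neg = mean_vector(neg)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def probe_score(direction: Vector, midpoint: Vector, x: Sequence[float]) -> float:
    """Signed projection of x onto the direction, centered at the midpoint.

    Centering at the data mean is an assumption made for this sketch.
    """
    return sum(d * (xi - m) for d, xi, m in zip(direction, x, midpoint))


# Hypothetical toy activations standing in for model representations.
true_acts = [[2.0, 0.0], [4.0, 1.0]]
false_acts = [[-2.0, 0.0], [-4.0, 1.0]]

direction = diff_in_means_direction(true_acts, false_acts)
midpoint = mean_vector(true_acts + false_acts)

# A positive score is read as "true", a negative score as "false".
score_a = probe_score(direction, midpoint, [1.0, 5.0])
score_b = probe_score(direction, midpoint, [-1.0, 0.0])
```

In this toy setup the two classes differ only along the first coordinate, so the direction ignores the second coordinate entirely. A causal evaluation would go further and shift activations along the direction to see whether model outputs change; that step is not sketched here.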
Questions (1)
question
- Open question raised in §7.1 about an unexplained anomalous result
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Key interpretive claim from Case Study II distinguishing probe accuracy from causal relevance
- Key methodological insight: introspection enables a new probe validation criterion beyond conventional separation metrics
- Shows the key divide is passive vs. active framing, not the specific wording of instructions.
- Explains why cities+neg_cities and larger_than+smaller_than training sets yield better OOD accuracy
- Motivation for causal evaluation over purely behavioural probing accuracy
- Key limitation acknowledged by authors.
- The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets
- What if the concept being manipulated does not lie on a straight line in the model's representations? · question · 0.780 · The motivating question that opens the paper and leads to the development of manifold steering.