claim
active
claim:given-the-linear-representation-hypothesis-and-binary-linguistic-features-1d-dii-is-sufficiently-expressive-for-controlling-model-behaviour-in-causalgym

Given the linear representation hypothesis and binary linguistic features, 1D DII is sufficiently expressive for controlling model behaviour in CausalGym

Theoretical justification for the benchmark's methodological choice of one-dimensional distributed interchange interventions (1D DII) throughout
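
A minimal sketch of the intuition behind this claim, assuming 1D DII means a distributed interchange intervention restricted to a single unit direction `a`, so that the patched representation is `b + (a·s − a·b) a` for a base representation `b` and a source representation `s`. If a binary feature is encoded along one linear direction, swapping the projection on that direction is enough to swap the feature while leaving the rest of the representation untouched. The names `dot` and `dii_1d` and the toy vectors are hypothetical and chosen for illustration; this is not the benchmark's own code.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def dii_1d(base, source, direction):
    """Replace base's component along `direction` with source's component.

    Assumed form of a 1D distributed interchange intervention:
        b + (a.s - a.b) * a, with a = direction / ||direction||
    """
    norm = math.sqrt(dot(direction, direction))
    a = [x / norm for x in direction]           # unit direction
    shift = dot(a, source) - dot(a, base)       # difference in projections
    return [b + shift * ai for b, ai in zip(base, a)]

# Toy 3-dimensional representations (illustrative values only).
base = [1.0, 2.0, 3.0]
source = [4.0, -1.0, 0.5]
direction = [0.0, 3.0, 4.0]

patched = dii_1d(base, source, direction)
```

After the intervention, `patched` has the same projection onto the direction as `source`, while every component orthogonal to the direction still matches `base`.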

Source paper

extracted_from
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
(2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
