question
active
Multi-dimensional linear and non-linear interpretability methods have not been benchmarked on CausalGym
Identified gap in benchmark coverage; only 1D linear methods are benchmarked
Source paper
extracted_from · (2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Identified limitation calling for broader task coverage in future work
- Central thesis of the paper
- Identified limitation/gap calling for cross-lingual extension of CausalGym
- Establishes that the observed linear structure is not merely a representation of text probability
- Forward-looking assertion in conclusion about the lasting value of causal evaluation
- Claude 3 Opus ratings aligned with human judgment of feature descriptions.
- Interpretive claim connecting scale to abstraction level in LLM representations
- Task accuracy on CausalGym increases consistently with model scale from 0.62 (14M) to 0.89 (6.9B) · finding · 0.752 · Scaling result showing larger Pythia models perform better on CausalGym linguistic tasks