question

active

question:causalgym-covers-only-linguistic-tasks-benchmarking-interpretability-methods-on-non-linguistic-behaviours-remains-open

CausalGym covers only linguistic tasks; benchmarking interpretability methods on non-linguistic behaviours remains open

Identified limitation calling for broader task coverage in future work

Source paper

extracted_from

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

(2024) · Aryaman Arora · Dan Jurafsky · Christopher Potts

Neighborhood — ranked by edge-count

Papers (1)

paper

CausalGym: Benchmarking causal interpretability methods on linguistic tasks
associated_with

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Multi-dimensional linear and non-linear interpretability methods have not been benchmarked on CausalGym · question · 0.841
Identified gap in benchmark coverage; only 1D linear methods are benchmarked
CausalGym only includes English data; comparable experiments with other languages might yield substantially different results · question · 0.841
Identified limitation/gap calling for cross-lingual extension of CausalGym
CausalGym · framework · 0.790
Multi-task benchmark of linguistic behaviours for measuring causal efficacy of interpretability methods, adapted from SyntaxGym
Task accuracy on CausalGym increases consistently with model scale from 0.62 (14M) to 0.89 (6.9B) · finding · 0.760
Scaling result showing that larger Pythia models perform better on CausalGym linguistic tasks
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models (Marks et al., 2025) · concept · 0.758
Cited as enabling precise behavioral control through SAE features, extending the same methodological line
Causal abstraction is not enough for mechanistic interpretability because it becomes vacuous without assumptions about how models encode information · claim · 0.751
Central thesis of the paper
The same charitable interpretation must be extended to all systems that display observable response patterns that are consistent with animal cognition, including artificial intelligences, metaplastic materials, and robotic systems · claim · 0.749
Call to extend the inference of sentience to non-biological systems as well.
Behavioural patterns associated with subjective experiences in humans are considered valid for inferring cognition in non-human animals but not in diverse other systems including plants · claim · 0.748
The double standard pointed out by S&C and endorsed by the authors.