framework
active
framework:causalgym
CausalGym
Multi-task benchmark of linguistic behaviours for measuring causal efficacy of interpretability methods, adapted from SyntaxGym
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (9)
method
- The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.
- Statistical method used to analyze neural activity data.
- Core intervention method used throughout CausalGym; operates on a one-dimensional, non-basis-aligned subspace of activation space
- Method for extracting linear directions by subtracting mean activations of contrastive groups; used to define the Assistant Axis
- Linear Probing (uses): Used to evaluate representation quality across VTAB tasks
- Unsupervised feature-finding method using cluster centroid difference as feature direction
- IID mass-mean probing coincides with LDA when covariance is known; used to derive the corrected probe formula
- Selectivity (uses): Adapted control task metric measuring the difference between the odds-ratio on the original task and on an arbitrary-label control task
- Log odds-ratio (uses): Primary evaluation metric measuring the causal effect of interventions; a greater value indicates a larger causal effect
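The intervention and metrics listed above can be sketched numerically. This is a minimal illustration under stated assumptions, not the CausalGym implementation: the function names are hypothetical, activations are plain NumPy vectors, and the log odds-ratio is assumed to combine the odds of the base label over the source label before the intervention with the odds of the source label over the base label after it.

```python
import numpy as np

def diff_of_means_direction(acts_a, acts_b):
    """Unit direction from the mean of contrastive group B to the mean of group A."""
    d = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return d / np.linalg.norm(d)

def intervene_1d(base, source, direction):
    """Replace the base activation's coordinate along `direction` with the source's.

    Only the one-dimensional subspace spanned by `direction` is changed;
    the orthogonal complement of the base activation is left untouched.
    """
    u = direction / np.linalg.norm(direction)
    return base + ((source - base) @ u) * u

def log_odds_ratio(p_base, p_intervened, y_base, y_source):
    """Assumed form: log odds of y_base before, plus log odds of y_source after."""
    before = np.log(p_base[y_base]) - np.log(p_base[y_source])
    after = np.log(p_intervened[y_source]) - np.log(p_intervened[y_base])
    return before + after

def selectivity(task_lor, control_lor):
    """Difference between the metric on the real task and on a control task."""
    return task_lor - control_lor

# Toy usage with made-up numbers (not results from any experiment).
direction = diff_of_means_direction(
    np.array([[2.0, 0.0], [4.0, 0.0]]),  # activations for contrastive group A
    np.array([[0.0, 0.0], [0.0, 0.0]]),  # activations for contrastive group B
)
patched = intervene_1d(np.array([1.0, 0.0]), np.array([3.0, 5.0]), direction)
flipped = log_odds_ratio(np.array([0.8, 0.2]), np.array([0.2, 0.8]), 0, 1)
no_effect = log_odds_ratio(np.array([0.8, 0.2]), np.array([0.8, 0.2]), 0, 1)
```

Under this assumed form, an intervention that leaves the output distribution unchanged scores zero, and one that fully flips the prediction toward the source label scores high, which matches the "greater value indicates a larger causal effect" reading.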
Frameworks (2)
framework
- Targeted syntactic evaluation (implements): Benchmarking paradigm using minimally-different grammatical sentence pairs to test LM linguistic competence
- SyntaxGym (extends): Online platform for targeted evaluation of language models that CausalGym adapts
Datasets (1)
dataset
- Suite of 10 language models from 14M to 12B parameters, trained on the same data in the same order, used for all experiments
Artifacts (1)
artifact
- Code repository for the CausalGym benchmark
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Identified limitation/gap calling for cross-lingual extension of CausalGym
- Identified limitation calling for broader task coverage in future work
- Chvykov and Hoel's geometric extension of causal emergence to continuous systems using Fisher information.
- Whether an internal direction causally controls a target behavior, verified by intervention success
- Consists of input, intermediate, and output variables with associated causal mechanisms; the mathematical object central to DAS.
- Formal representation of algorithms as directed acyclic graphs computing functions f_A
- Janus proposes transformer computation viewed as causal graph with foliations/time-slices specifying computation order.
- Function determining the value of a variable based on its causal parents in an acyclic causal model.
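The causal-model entries above (input, intermediate, and output variables, each with a mechanism that computes its value from its causal parents) can be illustrated with a toy acyclic model. This is a hedged sketch: the variables `A`, `B`, `C`, `S`, `OUT` and the helpers `run` and `interchange` are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy model: S = (A == B), OUT = S and C.
# Each entry maps a variable to (its causal parents, its mechanism).
MECHANISMS = {
    "S": (("A", "B"), lambda a, b: a == b),
    "OUT": (("S", "C"), lambda s, c: s and c),
}

def run(inputs, interventions=None):
    """Evaluate every variable in topological order, honoring interventions."""
    values = dict(inputs)
    interventions = interventions or {}
    graph = {var: set(parents) for var, (parents, _) in MECHANISMS.items()}
    for var in TopologicalSorter(graph).static_order():
        if var in interventions:
            values[var] = interventions[var]  # override the mechanism
        elif var in MECHANISMS:
            parents, fn = MECHANISMS[var]
            values[var] = fn(*(values[p] for p in parents))
    return values

def interchange(base, source, var):
    """Run on `base`, but fix `var` to the value it takes on `source`."""
    return run(base, {var: run(source)[var]})

base = {"A": 1, "B": 2, "C": True}     # S is False, so OUT is False
source = {"A": 3, "B": 3, "C": False}  # S is True
result = interchange(base, source, "S")
```

Here the interchange changes the base output because the intermediate variable `S` is forced to the value it has on the source input while `C` keeps its base value.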