Targeted syntactic evaluation

Benchmarking paradigm that tests the linguistic competence of language models (LMs) with minimal pairs: a grammatical sentence and an ungrammatical counterpart that differs from it only minimally.
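A minimal sketch of one common scoring rule. It assumes that a model "passes" a pair when it assigns the grammatical sentence a strictly higher total log-probability than the ungrammatical one. Other variants, such as SyntaxGym-style test suites, instead compare surprisal at a critical region. The names `minimal_pair_accuracy`, `log_prob`, and `make_bigram_scorer` are hypothetical. The add-one-smoothed bigram scorer stands in for a real LM, and the sentences are illustrative examples, not items from any benchmark.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Tuple

# A minimal pair: (grammatical sentence, ungrammatical sentence).
MinimalPair = Tuple[str, str]


def minimal_pair_accuracy(
    pairs: Iterable[MinimalPair],
    log_prob: Callable[[str], float],
) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher.

    Ties count as failures. `log_prob` is any callable returning a sentence's
    total log-probability (hypothetical interface; wrap a real LM to supply it).
    """
    results = [log_prob(good) > log_prob(bad) for good, bad in pairs]
    if not results:
        raise ValueError("need at least one minimal pair")
    return sum(results) / len(results)


def make_bigram_scorer(corpus: Iterable[str]) -> Callable[[str], float]:
    """Hypothetical toy LM: add-one-smoothed bigram model over whitespace tokens."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])  # how often each token is followed by something
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len({w for pair in bigrams for w in pair})

    def log_prob(sentence: str) -> float:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return log_prob


corpus = ["the author laughs", "the authors laugh", "the keys are here", "the key is here"]
scorer = make_bigram_scorer(corpus)
pairs = [
    ("the author laughs", "the author laugh"),                  # local agreement
    ("the keys are here", "the keys is here"),                  # local agreement
    ("the key to the cabinets is here", "the key to the cabinets are here"),  # attractor
]
print(minimal_pair_accuracy(pairs, scorer))  # 0.6666666666666666
```

The bigram scorer ties on the attractor pair. It conditions each word only on the previous one, so it cannot link the verb back to "key" across "to the cabinets". Under the strict comparison, that tie counts as a failure. This is the kind of long-distance dependency that targeted syntactic evaluation is designed to probe.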

Neighborhood — ranked by edge-count

paper

framework

CausalGym
implements
Multi-task benchmark of linguistic behaviours for measuring causal efficacy of interpretability methods, adapted from SyntaxGym

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Heuristic Evaluation · method · 0.723
Nielsen and Molich's method for finding UI flaws by applying usability heuristics.
arrows syntactic sugar (proc notation) · method · 0.721
Syntactic extension by Ross Paterson that lets arrow definitions be written in a pointed style, naming intermediate signals explicitly instead of composing point-free combinators; dramatically improves readability of complex GUIs.
Evaluation Cue · concept · 0.716
A specific signal (Wood Labs) embedded in evaluation environments that the model organism uses to reliably identify testing contexts.
Cross-task generalization evaluation · method · 0.712
Measuring AUROC of a probe trained on one task when evaluated on another task to assess universality.
Sentence Localization Task · method · 0.707
Novel task that asks which of 10 sentences received the injection, cycling the injection through all positions to average out positional bias.
Unverbalized Evaluation Awareness · concept · 0.706
Key finding: models internally suspect they are being tested without explicitly saying so; surfaced by NLAs during auditing.
Glaese et al. 2022: Improving alignment of dialogue agents via targeted human judgements · concept · 0.706
Alignment paper cited as an example of an RLHF fine-tuning technique; ref 19.
scope generalization · concept · 0.701
Generalization from 2-digit to 3-4 digit arithmetic; limited by mismatch dr.