framework
active
framework:targeted-syntactic-evaluation
Targeted syntactic evaluation
Benchmarking paradigm that uses minimally different sentence pairs (one grammatical, one not) to test the linguistic competence of language models
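A minimal sketch of how such an evaluation is commonly scored, assuming the usual convention that a pair counts as correct when the model assigns the grammatical sentence a higher probability than its minimally different counterpart. The names `minimal_pair_accuracy` and `make_bigram_scorer`, the toy corpus, and the example pairs are all hypothetical; the add-one-smoothed bigram scorer is only a stand-in for a real language model's sentence log-probability.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    log_prob: Callable[[str], float],
) -> float:
    """Fraction of (grammatical, ungrammatical) pairs where the scorer prefers the grammatical one."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one sentence pair")
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)


def make_bigram_scorer(corpus: List[str]) -> Callable[[str], float]:
    """Hypothetical stand-in for a language model: add-one-smoothed bigram log-probability."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    vocab_size = len(vocab)

    def log_prob(sentence: str) -> float:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Sum smoothed bigram log-probabilities over the sentence.
        return sum(
            math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
            for prev, cur in zip(tokens, tokens[1:])
        )

    return log_prob


# Toy data (assumed for illustration): subject-verb agreement minimal pairs.
scorer = make_bigram_scorer(["the dog runs", "the dogs run", "the cat runs", "the cats run"])
pairs = [
    ("the dog runs", "the dog run"),
    ("the cats run", "the cats runs"),
]
print(minimal_pair_accuracy(pairs, scorer))
```

Passing the scorer in as a callable keeps the metric independent of any particular model, so the same function could wrap a neural language model's sentence score instead of the toy bigram counts.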
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (1)
framework
- CausalGym (implements): Multi-task benchmark of linguistic behaviours for measuring the causal efficacy of interpretability methods, adapted from SyntaxGym
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Nielsen and Molich's method for finding UI flaws by applying usability heuristics.
- Syntactic extension by Ross Paterson enabling point-free arrow definitions with explicit signal naming; dramatically improves readability of complex GUIs.
- A specific signal (Wood Labs) embedded in evaluation environments that the model organism uses to reliably identify testing contexts.
- Measuring AUROC of a probe trained on one task when evaluated on another task to assess universality.
- Novel task asking which of 10 sentences received injection, cycling injection through all positions to average out positional bias
- Key finding: models internally suspect they are being tested without explicitly saying so; surfaced by NLAs during auditing.
- Glaese et al. 2022: Improving alignment of dialogue agents via targeted human judgements (concept, 0.706): Alignment paper cited as an example of an RLHF fine-tuning technique; ref 19
- Generalization from 2-digit to 3-4 digit arithmetic; limited by mismatch dr.