question
active
question:how-much-performance-in-points-of-loss-do-skip-trigram-bugs-cost-the-model-and-do-they-persist-in-larger-models
How much performance (in points of loss) do skip-trigram bugs cost the model, and do they persist in larger models?
Open question raised by the paper's identification of skip-trigram bugs as interpretability-visible failure modes
Neighborhood — ranked by edge-count
Papers (1)
paper
- A Mathematical Framework for Transformer Circuits · associated_with
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Early example of using mechanistic interpretability to understand unintended model behavior
- Do the documented failures reflect fundamental limitations or a cost-efficiency tradeoff of smaller models? · question · 0.735 · question for future work on frontier models
- The benchmark’s diagnostic value lies in identifying why a model loses, not just that it loses · claim · 0.732 · argues for fine-grained behavioral analysis over aggregate rankings
- Extrapolation of scaling predictive models to AGI.
- Pinpoints list-length 3 as the exact boundary where genuine counting introduces the limitation.
- noted as a possible confound
- Mechanistic evidence that network actively attenuates injected perturbations, explaining late-layer introspection failure