finding
active
finding:feature-1m-1013764-activates-on-diverse-code-errors-typos-in-code-array-overflow-divide-by-zero-type-mismatch-across-python-c-scheme-but-not-on-english-prose-typos
Feature 1M/1013764 activates on diverse code errors (typos in code, array overflow, divide by zero, type mismatch) across Python, C, Scheme, but not on English prose typos.
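This suggests the feature acts as a general code error detector rather than a simple typo detector: it responds to several unrelated error classes in multiple programming languages, yet stays inactive on typos in ordinary English prose.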
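To make the four error categories concrete, here is a minimal Python sketch with one tiny snippet per category plus a prose typo as a control. The snippets, the `SNIPPETS` and `PROSE_CONTROL` names, and the `error_kind` helper are hypothetical illustrations, not the prompts or tooling from the source paper, and the sketch does not measure feature activations; it only shows that each snippet is a genuinely different kind of code error.

```python
# Hypothetical examples of the error categories named in the finding.
# These are NOT the prompts used in the source paper.
SNIPPETS = {
    "typo in code": "total = 1\nprint(totl)",   # misspelled variable name
    "array overflow": "xs = [1, 2, 3]\nxs[3]",  # index past the end of the list
    "divide by zero": "1 / 0",
    "type mismatch": "'a' + 1",                 # str combined with int
}

# Control: a typo in English prose, which is not code and is never executed.
PROSE_CONTROL = "Teh quick brown fox jumsp over the lazy dog."


def error_kind(source: str) -> str:
    """Run a snippet in a fresh namespace and name the exception it raises."""
    try:
        exec(source, {})
    except Exception as exc:
        return type(exc).__name__
    return "no error"


# Map each category label to the exception class that Python reports for it.
RESULTS = {label: error_kind(src) for label, src in SNIPPETS.items()}
```

In Python each category surfaces as a distinct exception class, which is one way to see why a single feature firing on all of them points to something broader than typo detection.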
Source paper
extracted_from
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Feature represents the 'addition' function abstractly.
- Concrete example of feature splitting revealing unexpected model structure
- Causal effect: activates generation of security bugs.
- Multimodal generalization to visual security bypass.
- Causal validation of base64 feature function via pinned feature sampling
- Causal effect: feature induces perception of bugs.
- Suppressing the feature makes the model ignore bugs.
- Universality of base64 feature across two transformers