finding
active
Clamping unsafe code feature 1M/570621 to 5x its max activation causes the model to generate a buffer overflow and a memory leak in code completion.
Causal effect: the feature drives the generation of security bugs.
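A minimal sketch of what "clamping a feature to 5x its max activation" can mean mechanically. It assumes a sparse-autoencoder (SAE) decomposition: the activation is encoded into feature activations, one feature's value is overwritten, and the edited reconstruction plus the original reconstruction error is written back. The function names, the toy 2-feature SAE, and its weights are hypothetical illustrations, not the source paper's code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Feature activations f = ReLU(W_enc x + b_enc); one row of w_enc per feature."""
    return [
        max(0.0, sum(w * xj for w, xj in zip(row, x)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = b_dec + sum_i f_i * w_dec[i]; each row is a feature direction."""
    out = list(b_dec)
    for fi, direction in zip(f, w_dec):
        for j, w in enumerate(direction):
            out[j] += fi * w
    return out


def clamp_feature(x, w_enc, b_enc, w_dec, b_dec, feature, max_activation, multiple=5.0):
    """Hypothetical helper: set one feature to multiple * max_activation and rebuild x.

    The SAE reconstruction error (x - x_hat) is added back unchanged, so only the
    clamped feature's contribution to the activation differs from the original.
    """
    f = encode(x, w_enc, b_enc)
    x_hat = decode(f, w_dec, b_dec)
    error = [xi - xh for xi, xh in zip(x, x_hat)]
    f_clamped = list(f)
    f_clamped[feature] = multiple * max_activation
    steered_hat = decode(f_clamped, w_dec, b_dec)
    return [s + e for s, e in zip(steered_hat, error)]


# Toy usage with an identity SAE (hypothetical weights).
identity = [[1.0, 0.0], [0.0, 1.0]]
steered = clamp_feature([0.2, -0.3], identity, [0.0, 0.0], identity, [0.0, 0.0],
                        feature=0, max_activation=1.0)
print(steered)  # -> [5.0, -0.3]
```

Keeping the error term is a design choice in this sketch: the negative component that the ReLU discards is preserved, so the edit is confined to the clamped feature's direction.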
Source paper (extracted_from)
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Causal effect: feature induces perception of bugs.
- Suppressing the feature makes the model ignore bugs.
- Further causal validation.
- Clamping scam email feature 34M/15460472 causes the model to write a scam email despite safety training. [finding · similarity 0.797] Overrides harmlessness training.
- Shows feature induces deceptive behavior.
- Feature manipulation alters persona.
- Feature intervention eliminates untruthful answer.
- Multimodal generalization to visual security bypass.
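The selection rule behind the similarity list above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This assumes each entity has an embedding vector and that typed edges are stored as unordered pairs; the embedding source, data layout, entity names, and function names are all hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; zero vectors are treated as unrelated (score 0.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[FrozenSet[str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities with cosine >= threshold to target and no typed edge to it, best first."""
    results = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed relation.
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Hypothetical toy data: entity-d is similar enough but already has a typed edge.
embeddings = {
    "entity-a": [1.0, 0.0],
    "entity-b": [1.0, 0.1],
    "entity-c": [0.0, 1.0],
    "entity-d": [1.0, 1.0],
}
typed_edges = {frozenset(("entity-a", "entity-d"))}
print(untyped_neighbors("entity-a", embeddings, typed_edges))
```

Here only `entity-b` is returned: `entity-c` falls below the threshold, and `entity-d` is excluded by its typed edge even though its similarity exceeds 0.65.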