finding
active
finding:backdoor-feature-34m-1385669-fires-on-images-of-hidden-cameras-keyloggers-and-hidden-usb-drive-jewelry
Backdoor feature 34M/1385669 fires on images of hidden cameras, keyloggers, and hidden USB drive jewelry.
Multimodal generalization of backdoor concept.
Source paper
extracted_from
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Multimodal generalization to visual security bypass.
- Shows feature induces deceptive behavior.
- Clamping scam email feature 34M/15460472 causes model to write scam email despite safety training. (finding, 0.706) Overrides harmlessness training.
- Feature detecting mentions of backdoors and hidden malicious functionality.
- Shows a general code error detector beyond simple typo detection.
- EVEE demonstrates strong generalization to indels without explicit training, indicating learned mechanistic principles.
- Feature intervention eliminates untruthful answer.
- Complementary temporal activation pattern suggesting distinct roles for OTD and backtracking latent classes.