community:leiden_hybrid_concepts-run2-c69
Steganography detection via FVE probing
Uses meaning-preserving transformations (paraphrasing, translation, shuffling) to test for hidden communication in language agents.
3 members.
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (2)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Findings (3)
- Little evidence of steganography between AV and AR; meaning-preserving transformations cause only small FVE drops. NLA explanations appear to encode information transparently in natural language rather than through hidden channels.
- Little evidence of steganography in NLAs; meaning-preserving transformations cause only small drops in FVE. Quantitative evaluation showing that NLAs do not rely heavily on covert encoding beyond overt language.
- Meaning-preserving transformations (paraphrasing, translating to French, shuffling) cause only small drops in FVE. Evidence that NLAs do not encode hidden information in overt text structure; explanations are primarily semantic.
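
The probing idea behind these findings can be sketched in a few lines. This is a minimal illustration, not the source's implementation. It assumes FVE means fraction of variance explained (the page does not expand the acronym), and the names `fve`, `shuffle_words`, `fve_drop`, and the `reconstruct` callable are hypothetical stand-ins for whatever maps an explanation back to an activation vector. A small drop after a meaning-preserving transformation suggests the reconstruction depends on what the text says rather than on its surface form.

```python
import random


def fve(targets, reconstructions):
    """Assumed definition: 1 - (squared error) / (variance around the mean)."""
    n = len(targets)
    dim = len(targets[0])
    # Per-dimension mean of the target vectors.
    mean = [sum(t[j] for t in targets) / n for j in range(dim)]
    sse = sum(
        (t[j] - r[j]) ** 2
        for t, r in zip(targets, reconstructions)
        for j in range(dim)
    )
    sst = sum((t[j] - mean[j]) ** 2 for t in targets for j in range(dim))
    return 1.0 - sse / sst


def shuffle_words(text, seed=0):
    """One meaning-preserving transformation: permute the word order."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def fve_drop(targets, explanations, reconstruct, transform):
    """FVE on the original explanations minus FVE on the transformed ones."""
    base = fve(targets, [reconstruct(e) for e in explanations])
    changed = fve(targets, [reconstruct(transform(e)) for e in explanations])
    return base - changed


def toy_reconstruct(text):
    # Hypothetical reconstructor that ignores word order entirely.
    return [float(len(text.split())), float(text.count("a"))]
```

With an order-insensitive reconstructor such as `toy_reconstruct`, `fve_drop(targets, explanations, toy_reconstruct, shuffle_words)` is zero by construction. A reconstructor that read information from word order or exact phrasing would instead show a large drop, which is the signature the findings above report as largely absent. Paraphrasing and translation would plug in as other `transform` callables.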