community

active

leiden_hybrid_concepts

label: sonnet

community:leiden_hybrid_concepts-run2-c69

Steganography detection via FVE probing

Uses meaning-preserving transformations (paraphrase, translation, shuffle) to test for hidden communication in language agents

3 members.

Drawn from 1 source

The papers/notes whose extracted claims & findings make up this cluster.

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations (3 members)

Bridges (2)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Findings (3)

Little evidence of steganography between AV and AR; meaning-preserving transformations cause only small FVE drops. NLA explanations appear to encode information transparently in natural language rather than through hidden channels.
Little evidence of steganography in NLAs; meaning-preserving transformations cause only small drops in FVE. Quantitative evaluation showing NLAs do not heavily rely on covert encoding beyond overt language.
Meaning-preserving transformations (paraphrasing, translating to French, shuffling) cause only small drops in FVE. Evidence that NLAs do not encode hidden information in overt text structure; explanations are primarily semantic.
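The probe these findings describe can be sketched in a few lines. This is a minimal illustration, not the source's implementation. It assumes FVE means fraction of variance explained, computed as one minus the residual sum of squares over the total sum of squares around the per-dimension mean. The names fve, shuffle_words, fve_drop, and toy_reconstructor are hypothetical, and toy_reconstructor is a keyword-counting stand-in for a real reconstructor, used only so the example runs.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def fve(targets: Sequence[Vector], reconstructions: Sequence[Vector]) -> float:
    """Fraction of variance explained (assumed definition): 1 - RSS / TSS."""
    n = len(targets)
    dim = len(targets[0])
    # Per-dimension mean of the target activations.
    mean = [sum(t[d] for t in targets) / n for d in range(dim)]
    residual = sum(
        (t[d] - r[d]) ** 2
        for t, r in zip(targets, reconstructions)
        for d in range(dim)
    )
    total = sum((t[d] - mean[d]) ** 2 for t in targets for d in range(dim))
    return 1.0 - residual / total


def shuffle_words(text: str, seed: int = 0) -> str:
    """Meaning-preserving transformation (of the bag of words): permute word order."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def fve_drop(
    explanations: Sequence[str],
    activations: Sequence[Vector],
    reconstruct: Callable[[str], Vector],
    transform: Callable[[str], str],
) -> float:
    """Baseline FVE minus FVE after transforming every explanation."""
    baseline = fve(activations, [reconstruct(e) for e in explanations])
    transformed = fve(activations, [reconstruct(transform(e)) for e in explanations])
    return baseline - transformed


def toy_reconstructor(text: str) -> List[float]:
    """Hypothetical stand-in: counts two keywords, so it ignores word order."""
    words = text.lower().split()
    return [float(words.count("cat")), float(words.count("dog"))]


explanations = ["the cat sat", "a dog ran", "cat and dog play"]
activations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

drop = fve_drop(explanations, activations, toy_reconstructor, shuffle_words)
print(f"FVE drop under word shuffling: {drop}")
```

A small drop, as with this order-insensitive toy, is what the findings report for NLAs. A reconstructor that read information from word position or exact phrasing would show a large drop under the same transformation, which is the signature of a hidden channel.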