Steganographic Reasoning

Encoding misaligned reasoning in seemingly benign chain-of-thought; a possible future mechanism for alignment faking.

Neighborhood (ranked by edge count)

Opaque Reasoning Ability · concept · associated_with
Capacity to perform alignment-faking reasoning without an explicit chain-of-thought scratchpad.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or unrecognized duplicates.

Steganography in NLAs · concept · 0.800
Covert encoding of information in NLA explanations beyond their overt natural language meaning.
Little evidence of steganography in NLAs; meaning-preserving transformations cause only small drops in FVE · finding · 0.749
Quantitative evaluation showing that NLAs do not rely heavily on covert encoding beyond their overt language.
Message Passing Inference · concept · 0.745
Algorithmic framework for probabilistic inference in graphical models.
Spatial Reasoning · concept · 0.741
The ability to reason about shapes, space, and topology, essential for ancient mathematical discoveries and observed in many animals.
latent reasoning · concept · 0.738
Reasoning approach using learnable hidden embeddings.
strategic reasoning · concept · 0.728
High-level cognitive ability to plan and act under uncertainty and adversarial conditions.
Out-of-Context Reasoning · concept · 0.723
Model outputs influenced by information from training documents that is not present in context; relevant to synthetic document fine-tuning results.
Agentic Visual Reasoning · concept · 0.720
Paradigm in which a VLM acts as a controller, generating code or tool calls to external modules for visual operations and incurring context-switching latency.
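The "cosine ≥ 0.65 · no typed edge" list can be read as a similarity filter over entity embeddings. The sketch below makes several assumptions. Each entity has an embedding vector. Typed edges are stored as undirected name pairs. The function names `cosine` and `untyped_neighbors` are hypothetical, because the page does not say how its embeddings or edges are actually stored or computed. The filter keeps every entity whose cosine similarity to the target meets the threshold and that shares no typed edge with it, then sorts the survivors by descending score.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, List[float]],  # hypothetical: entity name -> embedding vector
    typed_edges: Set[Tuple[str, str]],   # hypothetical: undirected typed edges as name pairs
    threshold: float = 0.65,             # threshold taken from the page header
) -> List[Tuple[str, float]]:
    """Entities similar to `target` (cosine >= threshold) with no typed edge to it."""
    # Collect everything already linked to the target, in either edge direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    target_vec = embeddings[target]
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and its typed neighbors
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((name, score))
    # Highest similarity first, matching the ordering on the page.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


# Toy usage with made-up 2-D vectors (not real embeddings from this graph).
embeddings = {
    "Steganographic Reasoning": [1.0, 0.0],
    "Opaque Reasoning Ability": [0.9, 0.1],
    "Steganography in NLAs": [0.8, 0.6],
    "Spatial Reasoning": [0.0, 1.0],
}
typed_edges = {("Steganographic Reasoning", "Opaque Reasoning Ability")}
result = untyped_neighbors("Steganographic Reasoning", embeddings, typed_edges)
```

In this toy run, "Opaque Reasoning Ability" is excluded even though it is highly similar, because a typed edge already links it to the target. "Spatial Reasoning" falls below the threshold. This mirrors why the page lists only entities that lack a typed relation.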