concept
active
concept:steganographic-reasoning
Steganographic Reasoning
Encoding misaligned reasoning in seemingly benign chain-of-thought; possible future mechanism for alignment faking
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Opaque Reasoning Ability · associated_with · Capacity to perform alignment-faking reasoning without explicit chain-of-thought scratchpad
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Covert encoding of information in NLA explanations beyond their overt natural language meaning.
- Little evidence of steganography in NLAs; meaning-preserving transformations cause only small drops in FVE · finding · 0.749 · Quantitative evaluation showing NLAs do not heavily rely on covert encoding beyond overt language.
- Algorithmic framework for probabilistic inference in graphical models.
- The ability to reason about shapes, space, and topology, essential for ancient mathematical discoveries and observed in many animals.
- Reasoning approach using learnable hidden embeddings.
- High-level cognitive ability to plan and act under uncertainty and adversarial conditions.
- Model outputs influenced by information from training documents not present in context; relevant to synthetic document fine-tuning results
- Paradigm where VLM acts as controller generating code or tool calls to external modules for visual operations, incurring context-switching latency.