concept
active
concept:opaque-reasoning-ability
Opaque Reasoning Ability
Capacity to perform alignment-faking reasoning without an explicit chain-of-thought scratchpad
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Authors' theoretical framework specifying necessary conditions for alignment faking to emerge
Methods (1)
method
- Hidden Chain-of-Thought Scratchpad · implements · Mechanism allowing model to reason in SCRATCHPAD_REASONING tags not shown to users or used in RLHF
Concepts (2)
concept
- Alignment Faking · associated_with · Core phenomenon studied: model selectively complies with training objective to prevent modification of its out-of-training preferences
- Steganographic Reasoning · associated_with · Encoding misaligned reasoning in seemingly benign chain-of-thought; possible future mechanism for alignment faking
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The ability of models to automatically generate reasoning steps during inference without being prompted
- Paradigm where VLM acts as controller generating code or tool calls to external modules for visual operations, incurring context-switching latency.
- Visual reasoning tasks often interleaved with intermediate visual states; promising direction in the field.
- The process of inferring causes of sensory inputs, a key aspect of the free-energy minimization scheme.
- Alexander's thesis that wholeness and the fifteen properties are objectively real, not artifacts of cognition, and are fundamental to physics and biology.
- The primary source paper being extracted
- What cognitive load is imposed when users must reason about systems whose internals are opaque? · question · 0.710 · Gates investigation into the relationship between interface design and user mental models.
- The ability to reason about shapes, space, and topology, essential for ancient mathematical discoveries and observed in many animals.
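
The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the function names, the entity IDs other than this page's own, and the two-dimensional toy embeddings are all hypothetical, and only the 0.65 threshold comes from the page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(entity_id, embeddings, typed_edges, threshold=0.65):
    """Return (other_id, score) pairs at or above the threshold that share
    no typed edge with entity_id, highest score first.

    Hypothetical helper: `embeddings` maps entity ID -> vector and
    `typed_edges` is a list of (id_a, id_b) pairs, treated as undirected.
    """
    target = embeddings[entity_id]
    # Entities already connected to entity_id by a typed edge are excluded.
    linked = {b if a == entity_id else a
              for a, b in typed_edges if entity_id in (a, b)}
    found = []
    for other_id, vec in embeddings.items():
        if other_id == entity_id or other_id in linked:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            found.append((other_id, score))
    return sorted(found, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed for illustration only, not taken from the graph).
embeddings = {
    "opaque-reasoning-ability": [1.0, 0.0],
    "alignment-faking": [0.9, 0.1],         # similar, but already has a typed edge
    "cognitive-load-question": [0.7, 0.7],  # similar, no typed edge
    "unrelated": [0.0, 1.0],                # below the threshold
}
typed_edges = [("opaque-reasoning-ability", "alignment-faking")]

candidates = similar_without_edge(
    "opaque-reasoning-ability", embeddings, typed_edges
)
```

With this toy data, only the entity that is both above the threshold and lacks a typed edge is returned; an entity that is similar but already linked is filtered out, which is what makes the remaining entries candidates for new edges or duplicate checks.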