Chain-of-thought generalization trade-offs

Empirical studies showing that chain-of-thought (CoT) reasoning improves in-distribution (ID) performance while harming out-of-distribution (OOD) generalization, with probability calibration as a mitigation strategy.

4 members.

Drawn from 2 sources

The papers/notes whose extracted claims & findings make up this cluster.

The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring (3 members)
CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence (1 member)

Bridges (1)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Chain-of-Thought reasoning robustness & safety (4 shared)

Findings (3)

Clamping CoT probabilities to the 40-60% range for RL-CAI with CoT improves robustness and reduces extreme responses. Section 4.3 reports that clamping at 40-60% led to better behavior than clamping at 20-80%.
CoT boosts 2-digit ID accuracy but often worsens accuracy on 3-4 digit OOD inputs. Scope generalization results after LoRA+CoT fine-tuning.
Scope generalization: CoT boosts 2-digit in-distribution but worsens 3-4 digit OOD. CoT increases dr for OOD operands.
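
The clamping described in the first finding can be sketched as follows. This is a minimal illustration, not the procedure from the cited Section 4.3: it assumes the CoT-derived preference is a single probability for one of two candidate responses, and the function names (clamp_probability, clamp_binary_targets) are hypothetical.

```python
def clamp_probability(p, low=0.40, high=0.60):
    """Clip a probability into [low, high].

    The defaults mirror the 40-60% range named in the finding; pass
    low=0.20, high=0.80 for the wider range it is compared against.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("need 0 <= low <= high <= 1")
    return min(max(p, low), high)


def clamp_binary_targets(p_first, low=0.40, high=0.60):
    """Return clamped soft labels (first, second) for a two-way comparison.

    Assumption: the two labels sum to 1, so the second is the complement
    of the clamped first label.
    """
    clamped = clamp_probability(p_first, low, high)
    return clamped, 1.0 - clamped


if __name__ == "__main__":
    # An overconfident 95% preference is pulled back to the upper bound.
    print(clamp_binary_targets(0.95))
    # The wider 20-80% range leaves more of the original confidence intact.
    print(clamp_binary_targets(0.95, low=0.20, high=0.80))
```

Under these assumptions, the narrower range limits how far any single label can move from 50%, which is one plausible reading of why it would reduce extreme responses.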

Claims (1)

CoT improves in-distribution but may harm out-of-distribution generalization. Interpretation of scope generalization results.
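
A claim of this kind is typically checked by scoring the same model separately on ID and OOD operand sizes. The sketch below assumes the task is integer addition (the source does not name the operation) and uses a hypothetical toy_solver as a stand-in for a fine-tuned model; none of the names come from the cited sources.

```python
import random


def sample_problem(n_digits, rng):
    """Draw two operands that each have exactly n_digits decimal digits."""
    lo = 10 ** (n_digits - 1)
    hi = 10 ** n_digits - 1
    return rng.randint(lo, hi), rng.randint(lo, hi)


def accuracy_by_digits(solver, digit_counts, n_per_bucket=200, seed=0):
    """Return {digit_count: accuracy} for solver(a, b) against a + b."""
    rng = random.Random(seed)
    results = {}
    for d in digit_counts:
        correct = 0
        for _ in range(n_per_bucket):
            a, b = sample_problem(d, rng)
            if solver(a, b) == a + b:
                correct += 1
        results[d] = correct / n_per_bucket
    return results


def toy_solver(a, b):
    """Hypothetical stand-in for a model: exact on 2-digit operands,
    but drops everything above the hundreds place on larger ones."""
    if a < 100 and b < 100:
        return a + b
    return (a + b) % 1000


if __name__ == "__main__":
    scores = accuracy_by_digits(toy_solver, digit_counts=[2, 3, 4])
    for digits, acc in scores.items():
        split = "ID" if digits == 2 else "OOD"
        print(f"{digits}-digit ({split}): accuracy = {acc:.2f}")
```

Reporting accuracy per digit count, rather than one pooled number, is what makes an ID gain and an OOD loss visible at the same time.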