concept
active
concept:deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning-deepseekai-2025
DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning (DeepSeekAI, 2025)
Paper introducing the DeepSeek-R1 model and reporting its self-reflection as an "aha moment"
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- External finding cited as early demonstration of emergent self-regulatory potential resembling mindful self-monitoring
- AI training method inspired by behaviorism, used for autonomous cars and drones; cited as bioinspired success
- Open-source reasoning LLM from DeepSeekAI trained with reinforcement learning to exhibit self-reflection
- LLM judge (deepseek-v3) agrees with human evaluator on 91.6% of 200 sampled jailbreak responses · finding · 0.770 · Validates the LLM-based harm evaluation rubric
- Argument that RL meets the agency indicator.
- A competing alignment approach that fine-tunes models based on human evaluator feedback; discussed as complementary to SOO
- §3 Discussion.
- Only model showing marginal benefit from increased reflection, at substantial token cost