concept

active

concept:deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning-deepseekai-2025

DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning (DeepSeekAI, 2025)

Paper introducing the DeepSeek-R1 model and reporting emergent self-reflection as an "aha moment"

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

DeepSeek-R1-Zero spontaneously increased thinking time for difficult prompts, showing rudimentary meta-awareness · finding · 0.821
External finding cited as early demonstration of emergent self-regulatory potential resembling mindful self-monitoring
Deep Reinforcement Learning · method · 0.800
AI training method inspired by behaviorism, used for autonomous cars and drones; cited as bioinspired success
DeepSeek-R1 · framework · 0.782
Open-source reasoning LLM from DeepSeekAI trained with reinforcement learning to exhibit self-reflection
LLM judge (deepseek-v3) agrees with human evaluator on 91.6% of 200 sampled jailbreak responses · finding · 0.770
Validates the LLM-based harm evaluation rubric
Reinforcement learning is sufficient for agency. · claim · 0.769
Argument that RL meets the agency indicator.
Reinforcement Learning from Human Feedback (RLHF) · framework · 0.760
A competing alignment approach that fine-tunes models based on human evaluator feedback; discussed as complementary to SOO
Reinforcement learning can be regarded as a limiting or special case of model-based approaches in general, or active inference in particular, when epistemic value is removed. · claim · 0.757
§3 Discussion.
DeepSeek-R1 Llama 8b gains 0.16% accuracy on GSM8k with positive intervention (more reflections) at cost of ~2000 additional tokens · finding · 0.755
Only model showing marginal benefit from increased reflection, at substantial token cost
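
The selection rule behind the listing above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration only: the function names, the embedding dictionary, and the edge-set representation are all hypothetical, and the page does not state how the graph actually computes or stores these values.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs similar to `target` that lack a typed edge to it.

    embeddings:  dict mapping entity name -> vector (hypothetical layout)
    typed_edges: set of (source, destination) name pairs (hypothetical layout)
    threshold:   minimum cosine similarity; 0.65 is the cutoff shown on the page
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    # Highest similarity first, matching the descending scores in the listing.
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy data, invented purely for illustration.
toy_embeddings = {
    "target": [1.0, 0.0],
    "linked": [1.0, 0.0],     # similar, but already has a typed edge
    "unrelated": [0.0, 1.0],  # below the threshold
    "candidate": [1.0, 1.0],  # similar and unlinked
}
toy_edges = {("target", "linked")}
neighbors = untyped_neighbors("target", toy_embeddings, toy_edges)
```

With the toy data, only `candidate` survives both filters, which mirrors the intent stated on the page: surfacing candidates for new edges or unrecognized duplicates.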