DeepSeek-R1

Open-source reasoning LLM from DeepSeekAI, trained with reinforcement learning to exhibit self-reflection

Neighborhood (ranked by edge count)

thinker

DeepSeekAI
studies
Organization that introduced DeepSeek-R1 and reported the "aha moment" of self-reflection

framework

ReflCtrl
studies
The proposed framework for probing and steering self-reflection behavior in reasoning LLMs via representation engineering
Group Relative Policy Optimization (GRPO)
implements
Cost-efficient training algorithm used by DeepSeek-R1 for RL-based reasoning
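GRPO saves cost because it drops the learned value model (critic) used by PPO. Instead, it samples a group of completions for each prompt and scores every completion against the rest of its group. The sketch below shows only that core idea: within-group reward normalization, plus the PPO-style clipped term the normalized advantage feeds into. It assumes scalar outcome rewards, for example 1 for a correct answer and 0 otherwise. It omits the KL penalty and per-token averaging. The function names are hypothetical, and this is not DeepSeekAI's implementation.

```python
from statistics import fmean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against its own group (same prompt).

    Hypothetical helper: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    If every completion gets the same reward, all advantages are 0,
    so that group contributes no learning signal.
    """
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]

def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped objective term for one action.

    ratio = pi_new(a|s) / pi_old(a|s). Clipping it to [1 - eps, 1 + eps]
    limits how far one update can move the policy.
    """
    clipped_ratio = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# Example: four sampled answers to one prompt, two judged correct.
advantages = group_relative_advantages([1, 0, 0, 1])
print(advantages)  # roughly [1, -1, -1, 1]
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, with no critic network involved. That removal is the source of the cost saving the entry refers to.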

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one. These are candidates for new edges or for unrecognized duplicates.
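The selection rule in the header above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This assumes each entity has an embedding vector and that the scores listed below are cosine similarities to this entity. The function names and the toy vectors are hypothetical and are not taken from the actual graph.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def untyped_neighbors(target, embeddings, typed_neighbors, threshold=0.65):
    """Return (name, score) pairs that are similar to `target` but not yet linked.

    Hypothetical helper: skips the target itself and any entity already
    connected by a typed edge, keeps scores >= threshold, and sorts by
    descending similarity.
    """
    target_vec = embeddings[target]
    scored = []
    for name, vec in embeddings.items():
        if name == target or name in typed_neighbors:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy 2-D embeddings, for illustration only.
toy = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.0, 1.0], "D": [1.0, 1.0]}
print(untyped_neighbors("A", toy, typed_neighbors={"D"}))  # only "B" survives
```

A high score with no typed edge is ambiguous. It may signal a missing relation, or a duplicate node, as with the separate "DeepSeek-R1 671B" entry.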

DeepSeek-R1 671B · concept · 0.875
One of two large reasoning models analyzed in the paper for performative vs genuine CoT behavior
DeepSeek-V3 · concept · 0.817
External large language model used as an adversarial discriminator to evaluate liar scores in Experiment 2
DeepSeek-R1-Zero spontaneously increased thinking time for difficult prompts, showing rudimentary meta-awareness · finding · 0.810
External finding cited as early demonstration of emergent self-regulatory potential resembling mindful self-monitoring
DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning (DeepSeekAI, 2025) · concept · 0.782
Paper introducing the DeepSeek-R1 model and reporting self-reflection as an "aha moment"
DeepSeek v3.2 increments bid from 10 to 850 over 49 sole-bidder rounds · finding · 0.728
One DS-v3.2 trace shows extreme self-escalation, suggesting the model treats its own bid as a competitor's.
DeepSeek v3.2 self-bidding rate 75.4% · finding · 0.727
DS-v3.2 has a high proportion of self-bidding rounds.
DeepSeek-R1 Llama 8b gains 0.16% accuracy on GSM8k with positive intervention (more reflections) at a cost of ~2000 additional tokens · finding · 0.712
The only model showing a marginal benefit from increased reflection, at substantial token cost
LLM judge (deepseek-v3) agrees with human evaluator on 91.6% of 200 sampled jailbreak responses · finding · 0.707
Validates the LLM-based harm evaluation rubric