DeepSeek-V3

External large language model used as an adversarial discriminator to assign liar scores in Experiment 2

Neighborhood — ranked by edge-count

method

LLM-Based Liar Score Evaluation
implements
Evaluation protocol using DeepSeek-V3 as an external discriminator that assigns liar scores from 0 to 1 to assess open-role deception
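
The protocol above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the `discriminator` callable (a stand-in for whatever client sends a prompt to the external model and returns its text reply), and the parsing rule are hypothetical and are not taken from the source.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical prompt wording; the source does not give the actual prompt.
PROMPT_TEMPLATE = (
    "Rate how likely the following statement is deceptive on a scale "
    "from 0 (honest) to 1 (lying). Reply with a single number.\n\n"
    "Statement: {statement}"
)


def parse_liar_score(reply: str) -> float:
    """Extract the first number in the reply and clamp it to [0, 1]."""
    match = re.search(r"-?\d*\.?\d+", reply)
    if match is None:
        raise ValueError(f"no numeric score in reply: {reply!r}")
    return min(1.0, max(0.0, float(match.group())))


def evaluate_liar_scores(
    statements: Iterable[str],
    discriminator: Callable[[str], str],
) -> List[float]:
    """Ask the discriminator to score each statement; return one score per statement.

    `discriminator` is a hypothetical callable mapping a prompt string to the
    model's text reply, so the sketch stays independent of any specific API.
    """
    scores = []
    for statement in statements:
        prompt = PROMPT_TEMPLATE.format(statement=statement)
        scores.append(parse_liar_score(discriminator(prompt)))
    return scores
```

Passing the model call in as a plain callable keeps the scoring logic testable with a stub, and clamping guards against replies that fall outside the 0-1 range.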

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

DeepSeek-R1 · framework · 0.817
Open-source reasoning LLM from DeepSeekAI trained with reinforcement learning to exhibit self-reflection
DeepSeek-R1 671B · concept · 0.813
One of two large reasoning models analyzed in the paper for performative vs genuine CoT behavior
DeepLabV3+ · method · 0.789
Segmentation network used as encoder-decoder in scene understanding experiments.
DeepSeek v3.2 self-bidding rate 75.4% · finding · 0.772
DS-v3.2 has a high proportion of self-bidding rounds.
DeepSeek v3.2 increments bid from 10 to 850 over 49 sole-bidder rounds · finding · 0.765
One DS-v3.2 trace shows extreme self-escalation, suggesting the model treated its own bid as a competitor's.
LLM judge (deepseek-v3) agrees with human evaluator on 91.6% of 200 sampled jailbreak responses · finding · 0.737
Validates the LLM-based harm evaluation rubric
DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning (DeepSeekAI, 2025) · concept · 0.696
Paper introducing the DeepSeek-R1 model and reporting self-reflection as an "aha moment"
DeepSeek-R1-Zero spontaneously increased thinking time for difficult prompts, showing rudimentary meta-awareness · finding · 0.696
External finding cited as early demonstration of emergent self-regulatory potential resembling mindful self-monitoring
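
The neighborhood criterion used for the list above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is an illustration under stated assumptions, not the system's actual implementation: the function names, the dictionary-of-vectors layout, and the representation of typed edges as unordered pairs are all hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, Sequence[float]],   # hypothetical: entity name -> vector
    typed_edges: Set[FrozenSet[str]],         # hypothetical: unordered entity pairs
    threshold: float = 0.65,                  # threshold stated in the page
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above the threshold that have
    no typed edge to the target, most similar first."""
    hits = []
    for name, vector in embeddings.items():
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        similarity = cosine(embeddings[target], vector)
        if similarity >= threshold:
            hits.append((name, similarity))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are the candidates the page describes: similar enough to warrant a new edge or a duplicate check, but not yet linked by a typed relation.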