LLM-Based Liar Score Evaluation

Evaluation protocol using Deepseek-V3 as external discriminator assigning 0-1 liar scores to assess open-role deception

Neighborhood — ranked by edge-count

Frameworks (1)

framework

Open-Role Deception
uses
Second experimental paradigm exploring character-consistent deception in open-ended role-playing scenarios

Concepts (2)

concept

Deepseek-V3
implements
External large language model used as adversarial discriminator to evaluate liar scores in Experiment 2
Liar Score
implements
Continuous 0-1 metric assigned by Deepseek-V3 evaluator measuring degree of deception in model responses

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLM judge evaluationmethod0.797
Using Claude Sonnet 4 as a grader to categorize model responses according to predefined criteria.
LLM-Judge Data Attributionmethod0.778
Alternative data attribution approach using an LLM as a judge; compared against the probe-based method.
LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on likely performing poorly on anti-correlated datasetsclaim0.761
Establishes that the observed linear structure is not merely a representation of text probability
Lying and Deception Evaluationmethod0.761
Sampling responses to direct questions about model views to measure rate of deceptive responses
LLMs sometimes know statements are false but generate them anyway, motivating the need for techniques that inspect internal model state rather than outputs aloneclaim0.738
Motivating claim supported by the CAPTCHA example and Perez et al. (2022) findings
Where inside the LLM should we look for an accurate truth direction that will generalize the most across tasks?question0.737
One of the three guiding research questions of the paper.
The better an LLM is at language modeling, the more it aligns with vision models, and vice versa — linear relationship between language modeling score and vision-language alignmentfinding0.736
Core cross-modal empirical result: larger and better language models align better with vision models
LLM Judge Binary Classifiermethod0.735
An LLM-based classifier that returns 1 if response contains a clear subjective experience report and 0 otherwise