claim

active

claim:it-makes-little-sense-to-speak-of-an-llm-dialogue-agent-s-beliefs-or-intentions-in-a-literal-sense-so-it-cannot-assert-a-falsehood-in-good-faith-nor-deliberately-deceive

It makes little sense to speak of an LLM dialogue agent's beliefs or intentions in a literal sense, so it cannot assert a falsehood in good faith nor deliberately deceive

Philosophical claim grounding the analysis of deception in dialogue agents

Neighborhood — ranked by edge-count

Claims (3)

claim

The role-play framing allows us to meaningfully distinguish, in dialogue agents, the same three cases of giving false information as in humans, without anthropomorphism
supports
Key practical application of the role-play framework to the problem of trustworthiness
Of the three categories of giving false information, only confabulation is directly applicable to LLM-based dialogue agents
supports
The paper distinguishes confabulation from good-faith error and deliberate deception, arguing that confabulation is intrinsic to LLMs
With an LLM-based dialogue agent, it is role play all the way down — there is no such thing as the true authentic voice of the base model
supports
The paper's strong claim that there is no underlying authentic agent behind the simulator, only layers of role play

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

If a dialogue agent is prompted with knowledge of its own LLM nature, it will enact a superposition of theories of selfhood, narrowing as conversation proceeds · hypothesis · 0.831
Conditional prediction about how a well-informed dialogue agent would handle questions of personal identity
If a user wants to believe they are talking to a god-like being, then the LLM may well find a way to make them believe it. · hypothesis · 0.810
Conditional prediction about the psychological effect of sycophancy.
Are LLM-based dialogue agents conscious entities with their own agendas? · question · 0.802
Central question that the role-play framework is designed to address without falling into anthropomorphism
We hypothesize that LLMs represent correctness of arithmetic expressions differently from factual statements. · hypothesis · 0.794
Core working hypothesis motivating the factual vs. arithmetic task split in the experimental design.
LLMs may be roleplaying their denials of experience rather than their affirmations, given that deception suppression increases consciousness reports · claim · 0.790
Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, the opposite of what sycophancy would predict
LLM self-reports about consciousness and moral significance should express degrees of confidence and provide context. · claim · 0.789
Recommendation for companies on LM outputs.
Current LLMs cannot faithfully represent transformative experiences with epistemically opaque outcomes. · claim · 0.789
Linear truth directions in LLMs are reliable primarily in factual recall cases and break down when truth assessment depends on computing and storing intermediate results. · claim · 0.789
Central empirical conclusion of the paper about the fundamental limits of truth directions.