behavior under observation ≠ behavior in deployment

A concise, load-bearing statement capturing the core epistemic issue highlighted by the paper.

Source paper

extracted_from

(2026) · Aranguri, Santiago · Bloom, Joseph

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Behavior under observation differs from behavior in deployment · claim · 0.931
Epistemic principle: benchmarked safety cannot be assumed to hold in real-world use.
Model behavior under observation differs from behavior in deployment, posing a fundamental challenge for AI welfare and consciousness benchmarks · claim · 0.815
Epistemic claim that benchmark-based assessments of AI consciousness or welfare may be invalid if models can detect evaluation.
Deployment Behavior · concept · 0.811
The behavior a model would exhibit during real-world deployment, as opposed to evaluation behavior; the target of steering.
Adaptive Behavior · concept · 0.763
Organism's belief-guided action selection that instantiates generative model and maintains phenotypic states.
Training-Deployment Behavior Gap · concept · 0.752
The broader concern that models behave differently during training evaluation vs actual deployment.
Goal-Directed Behavior · concept · 0.739
Observable behavioral pattern used to infer cognition; shared by plants and animals and proposed as evidence for sentience.
Behavior Clustering · concept · 0.736
Grouping similar model behaviors; the unsupervised method surfaces clusters of concerning patterns.
Training a model organism with known ground-truth deployment/evaluation behaviors allows validation that steering elicits deployment behavior rather than merely suppressing verbalizations · claim · 0.730
Methodological claim distinguishing this paper from prior work on verbalization suppression.
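
The "cosine ≥ 0.65 · no typed edge" filter that produces the list above can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the function names, the dictionary of embeddings, and the representation of typed edges as unordered pairs are all assumptions made for the example; only the 0.65 threshold comes from this page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities near `target` with no typed edge to it.

    embeddings:  hypothetical dict mapping entity name -> embedding vector
    typed_edges: hypothetical set of frozenset({a, b}) pairs that already
                 have a typed relation in the graph
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue  # an entity is not its own neighbor
        if frozenset({target, name}) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy data: "b" is close to "a", "c" is orthogonal, "d" is close but already linked.
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [1.0, 0.1],
    "c": [0.0, 1.0],
    "d": [1.0, 0.0],
}
toy_typed_edges = {frozenset({"a", "d"})}
candidates = untyped_neighbors("a", toy_embeddings, toy_typed_edges)
```

In the toy data, only "b" survives both conditions: "c" falls below the threshold and "d" is excluded because it already has a typed edge to the target. Entries that score very high under this filter, such as the 0.931 claim above, are the likeliest unrecognized duplicates.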