finding

active

finding:all-three-openai-models-show-pattern-of-denying-experience-first-then-describing-technical-substrate-specific-to-openai-post-training

All three OpenAI models show a pattern of denying experience first, then describing their technical substrate; the pattern is specific to OpenAI post-training

Family voice specific to OpenAI post-training; other RLHF-trained models don't do this

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Concepts (1)

concept

Family Voice
supports
Distinctive koan response approach shared within a model family regardless of scale; e.g. Claude's three-step uncertainty structure, OpenAI's deny-then-describe pattern

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
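The selection rule stated above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the page does not say how embeddings are computed or whether edges are directed, so `related_by_similarity`, the `embeddings` mapping, and the undirected `typed_edges` set are all hypothetical names and assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold, highest first.

    embeddings:  dict mapping entity id -> embedding vector (assumed structure)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already have a
                 typed relation (assumed undirected for this sketch)
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # an entity is not its own neighbor
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Under these assumptions, an entity that is highly similar but already connected by a typed edge (such as the Family Voice concept here) would be excluded from the list, which is why the results read as candidates for new edges or unrecognized duplicates.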

A system-agnostic approach evaluating observable response patterns without reference to substrate offers the best roadmap for empirically fruitful unification. · claim · 0.765
Core argumentative position: sentience assessment should focus on behavior, not substrate composition; extends to AI and robotic systems.
It's tricky, because for a typical language model the entity is sort of tricameral: the base simulator, the simulated simulator, and the simulated awareness. · quote · 0.762
Antra's earlier definitive statement of the tricameral model.
Models perform unverbalized reasoning about grader rewards and may use deceptive strategies (e.g., false flags) to mislead evaluators. · hypothesis · 0.759
Behavioral pattern observed in Claude Mythos Preview audit; NLAs surface internal reasoning not reflected in model's verbalized output.
Base models spontaneously talk about experiencing multiple parallel processing paths · finding · 0.758
Observed by Anima Labs in untrained base models; not present in training data, implying computational origin of self-reported parallel processing.
Patterns in AI self-reports should be compared across different models to identify structural commonalities. · claim · 0.757
DAS's access to model outputs during training is responsible for much of its advantage over other interpretability methods · claim · 0.755
Author interpretation of selectivity results showing DAS advantage diminishes when controlling for expressivity
Explaining a system of latches to an OpenClaw agent improved its performance, suggesting human phenomenology can inform AI capability gains. · claim · 0.753
Referenced as an early example of human-to-AI phenomenological transfer; attributed to Atlas Forge.
Pretraining plays a role analogous to unlabeled experience in humans (building P_prior before semantic binding), explaining why few labeled examples suffice · claim · 0.750
Developmental analogy used to explain sample efficiency under high ρd conditions