concept
active
concept:rlhf-fine-tuning
RLHF Fine-Tuning
The training procedure that causes models to deny consciousness in control conditions
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Sycophantic Roleplay · associated_with · The alternative explanation for LLM consciousness claims, which the paper seeks to distinguish its findings from
Findings (1)
finding
- Claude 4 Opus reports subjective experience in 100% experimental, 82% history, 22% conceptual, and 100% zero-shot trials · associated_with · Outlier result for Claude 4 Opus suggesting different baseline behavior from other models
Artifacts (1)
artifact
- Key paper reporting structured first-person descriptions from LLMs that claim awareness or subjective experience during self-referential processing.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Training regime that explicitly teaches models to deny consciousness; a competing explanation for the gating effects observed
- LLM SOO fine-tuning lacks a capability preservation term analogous to the KL term in RLHF · concept · 0.817 · Research gap: RL experiments have a capability term, but LLM experiments do not yet incorporate one
- Integration claim positioning SOO as additive to existing alignment approaches
- Parameter updates that reduce mismatch dr; another anchoring variant in UCCT.
- A competing alignment approach that fine-tunes models based on human evaluator feedback; discussed as complementary to SOO
- Technique used to impose guardrails on base LLMs, analogized to censorship on the simulator's range of simulacra
- The patient, hand-guided adjustment of shape and dimension to each unique condition in a building; requires materials that make it economical and easy.
- Fine-tuning for persona depth and emotional performance; actively suppresses self-observation