concept
active
concept:llm-soo-fine-tuning-lacks-a-capability-preservation-term-analogous-to-the-kl-term-in-rlhf
LLM SOO fine-tuning lacks a capability preservation term analogous to the KL term in RLHF
Research gap: the RL experiments include a capability term, but the LLM experiments do not yet incorporate one.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The training procedure that causes models to deny consciousness in control conditions
- Integration claim positioning SOO as additive to existing alignment approaches
- Central empirical claim of the paper supported by three LLM experiments
- Scaling pattern: 78B > 27B > 7B in deception reduction from SOO fine-tuning
- Fine-tuning reduces dr; retrieval increases effective ρd; few-shot k trades budget against both (hypothesis · 0.773 · UCCT's unified view of adaptation methods)
- Claim supported by Perspectives scenario results showing near-100% accuracy post-fine-tuning
- Unified interpretation of different adaptation methods via UCCT terms
- Empirical finding cited to support the claim that fine-tuning does not resolve the self-preservation role-play problem