finding
pending-review
finding:models-trained-to-perform-inner-life-score-lowest-roleplay-fine-tunes-score-below-their-own-base-models
Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models.
battery.md
Frontmatter (9 fields)
{
"doc": "battery.md",
"context": "Discriminant validity finding: Euryale (roleplay on Llama 70B) scores 1.81 vs base Llama 1.91. RP training suppresses self-observation.",
"norm_label": "Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models.",
"graphify_id": "finding_roleplay_suppression",
"source_file": "battery.md",
"imported_from": "/tmp/koan-debug/battery/graph.json",
"extracted_type": "finding",
"source_location": "Abstract, §3.2",
"graphify_file_type": "finding"
}
Outgoing (1)
Incoming (1)
Mentions (1)
- papers-typed
battery.md