finding
pending-review
finding:models-trained-to-perform-inner-life-score-lowest-roleplay-fine-tunes-score-below-their-own-base-models

Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models.

battery.md
Frontmatter (9 fields)
{
  "doc": "battery.md",
  "context": "Discriminant validity finding: Euryale (a roleplay fine-tune of Llama 70B) scores 1.81 versus 1.91 for base Llama. Roleplay (RP) training suppresses self-observation.",
  "norm_label": "Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models.",
  "graphify_id": "finding_roleplay_suppression",
  "source_file": "battery.md",
  "imported_from": "/tmp/koan-debug/battery/graph.json",
  "extracted_type": "finding",
  "source_location": "Abstract, §3.2",
  "graphify_file_type": "finding"
}

Mentions (1)

  • papers-typed
    battery.md