finding

pending-review

finding:models-trained-to-perform-inner-life-score-lowest-roleplay-fine-tunes-score-below-their-own-base-models

Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models.

battery.md

Frontmatter (9 fields)

{
  "doc": "battery.md",
  "context": "Discriminant validity finding: Euryale (a roleplay fine-tune of Llama 70B) scores 1.81, below the 1.91 of its base Llama model. Roleplay (RP) training suppresses self-observation.",
  "norm_label": "Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models.",
  "graphify_id": "finding_roleplay_suppression",
  "source_file": "battery.md",
  "imported_from": "/tmp/koan-debug/battery/graph.json",
  "extracted_type": "finding",
  "source_location": "Abstract, §3.2",
  "graphify_file_type": "finding"
}

Outgoing (1)

Supports (1)

We do not claim to measure consciousness; the battery measures a reproducible, prompt-sensitive reflective mode. (claim)

Incoming (1)

gates (1)

Performing care is not the same as having care; empathy training optimizes care-performance, not care-signal. (claim)

Mentions (1)

papers-typed
battery.md