claim
active
claim:fine-tuning-induces-the-behavioral-pattern-of-self-correction-but-does-not-improve-the-underlying-ability-to-correct-effectively

Fine-tuning induces the behavioral pattern of self-correction but does not improve the underlying ability to correct effectively

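Key interpretive conclusion drawn from the dissociation between attempt rate and improvement rate in the fine-tuning experiments: fine-tuning raises how often the model attempts self-correction, but not how often those attempts actually improve the output.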
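To make "dissociation" concrete, here is a minimal sketch of how the two rates could be computed and compared. It assumes that attempt rate is the fraction of trials in which the model makes a self-correction attempt, and that improvement rate is the fraction of those attempts that improve the output. These definitions, the `Trial` record, and the helper names are hypothetical, and the trial counts are made-up toy values, not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    # Hypothetical per-trial record: did the model attempt a self-correction,
    # and (if so) did the correction actually improve the output?
    attempted: bool
    improved: bool


def attempt_rate(trials):
    """Fraction of all trials in which a self-correction was attempted (assumed definition)."""
    return sum(t.attempted for t in trials) / len(trials)


def improvement_rate(trials):
    """Fraction of attempted corrections that improved the output (assumed definition)."""
    attempts = [t for t in trials if t.attempted]
    if not attempts:
        return 0.0
    return sum(t.improved for t in attempts) / len(attempts)


# Toy, made-up trials (NOT results from the paper).
base = [Trial(True, True)] * 1 + [Trial(True, False)] * 1 + [Trial(False, False)] * 8
tuned = [Trial(True, True)] * 3 + [Trial(True, False)] * 3 + [Trial(False, False)] * 4

for name, trials in [("base", base), ("fine-tuned", tuned)]:
    print(f"{name}: attempt={attempt_rate(trials):.2f} improve={improvement_rate(trials):.2f}")
# base: attempt=0.20 improve=0.50
# fine-tuned: attempt=0.60 improve=0.50
```

In this toy case, the attempt rate triples while the per-attempt improvement rate stays flat, which is the pattern the claim describes. Conditioning the improvement rate on attempts is what exposes this pattern. An unconditional fraction (improved trials over all trials) would rise from 0.10 to 0.30 simply because more attempts are made.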

Source paper

extracted_from
Endogenous Resistance to Activation Steering in Language Models
(2026) · Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5

Neighborhood (ranked by edge count)

Findings (1)

finding

Concepts (1)

concept

Claims (1)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.