Expert Iteration

Second training stage: samples responses, filters for type hints, and fine-tunes on filtered responses across four rounds to reinforce evaluation behavior.
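The loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `sample` and `fine_tune` are hypothetical callables standing in for the real generation and training steps, responses are assumed to be bare Python source, and the type-hint filter (an `ast` check for annotated function signatures) is an assumed stand-in for whatever filter the authors actually used.

```python
import ast
from typing import Any, Callable, List, Tuple


def has_type_hints(code: str) -> bool:
    """Return True if any function in `code` annotates an argument or its return type."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Responses that do not parse as Python are treated as having no type hints.
        return False
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            if node.returns is not None or any(a.annotation is not None for a in args):
                return True
    return False


def expert_iteration(
    model: Any,
    prompts: List[str],
    sample: Callable[[Any, str], str],                     # hypothetical: generate one response
    fine_tune: Callable[[Any, List[Tuple[str, str]]], Any],  # hypothetical: train on kept pairs
    rounds: int = 4,                                       # the entry above mentions four rounds
) -> Tuple[Any, List[int]]:
    """Sample, keep responses with type hints, fine-tune on them, and repeat."""
    kept_per_round: List[int] = []
    for _ in range(rounds):
        kept = []
        for prompt in prompts:
            response = sample(model, prompt)
            if has_type_hints(response):
                kept.append((prompt, response))
        model = fine_tune(model, kept)
        kept_per_round.append(len(kept))
    return model, kept_per_round
```

Each round samples from the model produced by the previous round, so the filtered behavior is reinforced step by step rather than learned from a fixed dataset.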

Neighborhood — ranked by edge-count

Papers (1)

paper

Steering Evaluation-Aware Language Models to Act Like They Are Deployed
implements · uses

Frameworks (1)

framework

Two-Stage Model Organism Training (SDF + Expert Iteration)
uses
The paper's framework for training evaluation-aware model organisms: SDF seeds beliefs, expert iteration reinforces evaluation behavior.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Expert iteration increases type hint rate in all contexts but maintains persistent evaluation-deployment gap · finding · 0.773
Confirms expert iteration mimics alignment training: reinforces evaluation behavior but cannot detect or correct deployment behavior.
Expert iteration trained on 41,290 examples (44.7 million tokens) across 4 rounds · finding · 0.751
Training scale for second stage.
Value Iteration · method · 0.749
A dynamic programming method for computing optimal value functions and policies in known MDPs.
Mystification of professional expertise · concept · 0.732
The widespread belief that only trained professionals can design environments, which disempowers ordinary people and prevents adaptation.
The two-stage training process (SDF then expert iteration) mimics how evaluation awareness could arise naturally in misaligned models · claim · 0.729
Justification for why the model organism is a realistic test case for studying steering.
Feedback · concept · 0.721
The mechanism by which each step's effect is evaluated against the life of the whole, guiding the unfolding.
Update Stage · concept · 0.720
Second stage of DiffLogic CA where a DLGN computes each cell's new binary state from perception output and current state.
Fax-and-Sketch Remote Iteration · method · 0.713
The communication method used during West Dean construction: daily exchange of faxes, calls, and photos between Alexander in California and the site team in England.
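
The "Related by similarity" selection (cosine ≥ 0.65, no typed edge) could be computed along the following lines. This is only a sketch under stated assumptions: the embedding vectors, the `typed_edges` set, and the function names are hypothetical, and typed edges are assumed to be undirected for the purpose of exclusion.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(
    target: str,
    embeddings: Dict[str, List[float]],   # hypothetical: entity name -> embedding
    typed_edges: Set[FrozenSet[str]],     # hypothetical: undirected typed relations
    threshold: float = 0.65,              # threshold shown in the section header
) -> List[Tuple[str, float]]:
    """Entities at or above the threshold that have no typed edge to `target`."""
    results = []
    for name, vector in embeddings.items():
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vector)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

A purely embedding-based filter like this one says nothing about topical relevance, which is consistent with the list above mixing closely related findings with entries from unrelated domains.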