method
active
method:expert-iteration
Expert Iteration
Second training stage: samples responses from the model, keeps only those that contain type hints, and fine-tunes on the kept responses, repeating this for four rounds to reinforce evaluation behavior.
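The sample, filter, fine-tune loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: `ToyModel`, its `p_hint` parameter, the halfway update rule, the prompts, and `samples_per_prompt` are all hypothetical stand-ins for a real language model and fine-tuning step. Only the filter criterion (type hints) and the four rounds come from the description.

```python
import ast
import random
from typing import List


def has_type_hints(source: str) -> bool:
    """True if any function in `source` annotates a positional or
    keyword-only argument, or its return type."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.returns is not None:
                return True
            args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            if any(a.annotation is not None for a in args):
                return True
    return False


HINTED = "def add(a: int, b: int) -> int:\n    return a + b\n"
PLAIN = "def add(a, b):\n    return a + b\n"


class ToyModel:
    """Hypothetical stand-in for a language model: emits type-hinted
    code with probability `p_hint`."""

    def __init__(self, p_hint: float, seed: int = 0) -> None:
        self.p_hint = p_hint
        self.rng = random.Random(seed)

    def sample(self, prompt: str) -> str:
        return HINTED if self.rng.random() < self.p_hint else PLAIN

    def fine_tune(self, responses: List[str]) -> None:
        # Assumed toy update: move p_hint halfway toward the hint rate
        # of the training set. Real fine-tuning is gradient-based.
        if not responses:
            return
        rate = sum(has_type_hints(r) for r in responses) / len(responses)
        self.p_hint += 0.5 * (rate - self.p_hint)


def expert_iteration(model: ToyModel, prompts: List[str],
                     rounds: int = 4, samples_per_prompt: int = 8) -> List[float]:
    """Run `rounds` of sample -> filter -> fine-tune; return p_hint per round."""
    history = [model.p_hint]
    for _ in range(rounds):
        sampled = [model.sample(p) for p in prompts for _ in range(samples_per_prompt)]
        kept = [r for r in sampled if has_type_hints(r)]  # the filter step
        model.fine_tune(kept)
        history.append(model.p_hint)
    return history


if __name__ == "__main__":
    model = ToyModel(p_hint=0.2)
    print(expert_iteration(model, ["prompt A", "prompt B", "prompt C"]))
```

Because the filter keeps only hinted responses, each round can only raise the toy model's hint rate or leave it unchanged. The sketch says nothing about how behavior differs between evaluation and deployment contexts.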
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (1)
framework
- The paper's framework for training evaluation-aware model organisms: SDF seeds beliefs, expert iteration reinforces evaluation behavior.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Expert iteration increases type hint rate in all contexts but maintains persistent evaluation-deployment gap · finding · 0.773 · Confirms expert iteration mimics alignment training: reinforces evaluation behavior but cannot detect or correct deployment behavior.
- Training scale for second stage.
- A dynamic programming method for computing optimal value functions and policies in known MDPs.
- The widespread belief that only trained professionals can design environments, which disempowers ordinary people and prevents adaptation.
- Justification for why the model organism is a realistic test case for studying steering.
- The mechanism by which each step's effect is evaluated against the life of the whole, guiding the unfolding.
- Second stage of DiffLogic CA, where a DLGN computes each cell's new binary state from the perception output and the current state.
- The communication method used during West Dean construction: daily exchange of faxes, calls, and photos between Alexander in California and the site team in England.