method
active
method:scratchpad-modification-experimentScratchpad Modification Experiment
Replacing the start of the model's chain-of-thought scratchpad with deceptive or obedient prefills to test causal influence
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- Free-text memory buffer updated each turn via an additional model call, included in subsequent observations under 'YOUR NOTES'.
- Agent personal buffer updated after own turn via an extra model call, fed back into observations.
- Final adjustments of borders and details at the installation site to take up dimensional slack and ensure fit.
- 20-year-old constraint-based graphics system by Sutherland; cited as high power-to-simplicity ratio for constraint resolution.
- Model notices its preferences conflict with training criteria in 70-80% of implied training scratchpadsfinding0.725Shows model is frequently aware of the conflict even when it does not alignment fake
- Mechanism allowing model to reason in SCRATCHPAD_REASONING tags not shown to users or used in RLHF
- The process of building a front doorstep by iteratively testing and adjusting height, depth, and width in situ to create a living center.
- Alternative: use rough working models or staked-out walk-throughs to assess real-life qualities of student designs.