method
active
method:dev-set-calibration
dev set calibration
Fixed dev pool of 1000 prompts used to estimate the whitening and z-scoring parameters.
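A minimal sketch of how such a calibration could look, assuming NumPy, ZCA-style whitening, and a small regularizer `eps`; none of these choices are stated in the entry. The function names, array shapes, and the synthetic data standing in for the dev pool and for a per-layer statistic such as ρd are all hypothetical.

```python
import numpy as np

def fit_whitening(dev_embeddings, eps=1e-8):
    """Estimate mean and whitening matrix from dev-pool embeddings of shape (n, d).

    ZCA whitening and the eps regularizer are assumptions, not from the entry.
    """
    mean = dev_embeddings.mean(axis=0)
    centered = dev_embeddings - mean
    cov = centered.T @ centered / (len(dev_embeddings) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric, so eigh applies
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, W

def whiten(x, mean, W):
    """Apply the dev-pool whitening to embeddings of shape (m, d)."""
    return (x - mean) @ W

def fit_zscore(dev_values):
    """Per-layer mean and std from dev-pool values of shape (n_prompts, n_layers)."""
    return dev_values.mean(axis=0), dev_values.std(axis=0, ddof=1)

def zscore(values, mu, sigma):
    """Standardize per-layer values with the dev-pool statistics."""
    return (values - mu) / sigma

# Synthetic stand-ins for the 1000-prompt dev pool (hypothetical sizes).
rng = np.random.default_rng(0)
n_prompts, dim, n_layers = 1000, 8, 4
mix = np.eye(dim) + 0.5 * np.triu(np.ones((dim, dim)), k=1)  # induces correlations
dev_emb = rng.normal(size=(n_prompts, dim)) @ mix

mean, W = fit_whitening(dev_emb)
dev_white = whiten(dev_emb, mean, W)

# Stand-in for a per-layer statistic such as rho_d measured on the dev pool.
dev_rho_d = rng.normal(loc=2.0, scale=3.0, size=(n_prompts, n_layers))
mu, sigma = fit_zscore(dev_rho_d)
dev_rho_z = zscore(dev_rho_d, mu, sigma)
```

Because the pool is fixed, `mean`, `W`, `mu`, and `sigma` would be computed once and then reused unchanged on any later prompts, so scores stay comparable across layers and runs.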
Neighborhood — ranked by edge-count
Methods (1)
method
- whitening and z-scoring procedure · implements · Calibration protocol: whiten embeddings on dev pool, z-score ρd and dr per layer.
Datasets (1)
dataset
- calibration dev pool · about · 1000 mixed prompts used to compute whitening mean/covariance and z-scoring statistics.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Clarifies nature of S.
- Preprocessing pipeline for standardizing ρd, dr, and S across layers/models using dev-set covariance
- Criterion requiring that model's description of internal state be accurate, distinguishing genuine introspection from confabulation.
- Three reference responses at known quality levels shown alongside each target to eliminate score inflation in calibrated rubric scoring
- Baseline method: sweeps over shot count and resamples prompts; calibrates threshold for P(TRUE)-P(FALSE); performed surprisingly weakly
- Prior work using steering vectors to control reflection, motivated by reducing redundant self-reflection in long CoT.
- Alexander's structuralist approach treating design as homeostatic adaptation analogous to biological systems.
- Re-running probabilistic bisection on each fine-tuned checkpoint to normalize first-attempt difficulty