method
active
method:variance-matched-random-probe-control
Variance-matched random probe control
Control method that samples random directions from top-k principal-component (PC) spaces matched to each emotion probe's explained variance, to isolate emotion-specific persistence
Neighborhood — ranked by edge-count
Concepts (1)
concept
- residual persistence · implements · Emotion feature persistence beyond the persistence expected from high explained variance alone, computed by subtracting the median persistence of variance-matched probes
Methods (1)
method
- Variance-Matched Random Probe Comparison · related_to · Controls for variance by sampling random directions from top-k PC spaces matching each emotion probe's explained variance, and subtracting median persistence of 20 matched directions
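A minimal sketch of how such a control could be computed, under stated assumptions. The entity text does not define the persistence metric or the exact matching rule, so both are placeholders here: `persistence_fn` is a hypothetical caller-supplied function mapping a unit direction to a persistence score, and k is chosen so that the mean of the top-k PCA eigenvalues (the expected variance of an isotropic random unit direction in that subspace) is closest to the probe's own directional variance. All function names are illustrative, not taken from the source.

```python
import numpy as np

def directional_variance(direction, eigvals, eigvecs):
    """Variance of the data along a direction, given a PCA eigendecomposition."""
    u = direction / np.linalg.norm(direction)
    coeffs = eigvecs.T @ u  # coordinates of u in the PC basis
    return float(np.sum(eigvals * coeffs ** 2))

def sample_matched_directions(probe, activations, n_samples=20, rng=None):
    """Sample random unit directions from a top-k PC subspace whose expected
    variance roughly matches the probe's variance (assumed matching rule)."""
    rng = np.random.default_rng(rng)
    centered = activations - activations.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    target = directional_variance(probe, eigvals, eigvecs)
    # Mean of the top-k eigenvalues for every k = 1..d.
    running_mean = np.cumsum(eigvals) / np.arange(1, len(eigvals) + 1)
    k = int(np.argmin(np.abs(running_mean - target))) + 1

    coeffs = rng.standard_normal((n_samples, k))
    coeffs /= np.linalg.norm(coeffs, axis=1, keepdims=True)
    directions = coeffs @ eigvecs[:, :k].T      # unit vectors in the top-k span
    return directions, k

def residual_persistence(probe, activations, persistence_fn, n_samples=20, rng=None):
    """Probe persistence minus the median persistence of matched random directions."""
    directions, _ = sample_matched_directions(probe, activations, n_samples, rng)
    baseline = float(np.median([persistence_fn(d) for d in directions]))
    return persistence_fn(probe / np.linalg.norm(probe)) - baseline
```

A residual near zero would suggest the probe persists no more than a generic direction of similar variance. The default of 20 samples follows the count given in the entry above; everything else about the procedure is an assumption of this sketch.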
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Control experiment ablating random latents matched for activation frequency and magnitude to test OTD specificity
- Controls for probe artifacts; demonstrates self-reports carry information specifically about probe-defined concept directions
- Standard linear probing technique; compared to mass-mean probing for classification accuracy and causal implication
- Probing approach avoiding supervision to sidestep complexity-accuracy tradeoff
- Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training.
- The ability of probes trained on one dataset to transfer accurately to topically and structurally different datasets
- Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
- Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.