method
active
method:statistical-activation-analysis
Statistical Activation Analysis
Component of the contrastive retrieval pipeline that analyzes activation statistics.
Neighborhood — ranked by edge-count
Methods (1)
method
- A pipeline employing controlled semantic oppositions to distill monosemantic functional features from sparse activation spaces.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
- Pearson correlation of feature activations across 40M tokens, used to measure feature similarity and universality across models.
- Token-level analysis of OTD and backtracking latent activations aligned at correction points across episodes.
- Intervention method that adds a learned direction vector to residual stream activations to steer model behavior.
- Dividing the feature activation spectrum into 11 evenly spaced intervals and sampling uniformly from each to evaluate monosemanticity across activation levels.
- Technique of reading out model beliefs from internal activations before the final answer token is generated.
- The branch of physics dealing with large numbers of particles, statistical laws, and the tendency to disorder, as described by Boltzmann and Gibbs.
- Model-independent feature comparison based on correlating activation vectors across a fixed, diverse dataset.
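
Two of the neighbors above describe concrete statistics: Pearson correlation between feature activations, and sampling across evenly spaced activation intervals. The following is a minimal sketch of both in plain Python, not the implementation behind any of these entities. The function names `pearson` and `sample_by_activation_interval` are hypothetical, and taking the interval bounds from the observed minimum and maximum activation is an assumption.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant feature; this sketch returns 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def sample_by_activation_interval(activations, n_intervals=11, per_interval=1, seed=0):
    """Split [min, max] into evenly spaced intervals and sample indices from each.

    Returns one list of sampled indices per interval (empty if the interval
    contains no datapoints). Bounds from min/max are an assumption.
    """
    rng = random.Random(seed)
    lo, hi = min(activations), max(activations)
    width = (hi - lo) / n_intervals or 1.0  # avoid division by zero when all values are equal
    buckets = [[] for _ in range(n_intervals)]
    for idx, value in enumerate(activations):
        # The maximum value would land in bucket n_intervals, so clamp it to the last one.
        bucket = min(int((value - lo) / width), n_intervals - 1)
        buckets[bucket].append(idx)
    return [rng.sample(b, min(per_interval, len(b))) for b in buckets]
```

For example, `pearson(acts_model_a, acts_model_b)` compares one feature from each of two models over the same token sequence, and `sample_by_activation_interval(acts, per_interval=5)` picks up to five datapoints from each of the 11 activation bands for manual inspection.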