method
active
method:activation-interval-sampling
Activation Interval Sampling
Divides a feature's activation spectrum into 11 evenly spaced intervals and samples uniformly from them, so that monosemanticity is evaluated across activation levels rather than only at the strongest activations.
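The description above can be illustrated with a minimal sketch. It assumes the intervals span a single feature's minimum to maximum observed activation and that a fixed number of examples is drawn uniformly at random from each interval; the entry does not specify either detail. The function name `sample_by_activation_interval` and the parameters `per_interval` and `seed` are hypothetical, introduced only for illustration.

```python
import numpy as np

def sample_by_activation_interval(activations, n_intervals=11, per_interval=5, seed=0):
    """Return {interval_index: array of example indices} for one feature.

    Assumption: intervals are evenly spaced between the min and max
    observed activation; the source entry does not fix the range.
    """
    acts = np.asarray(activations, dtype=float)
    rng = np.random.default_rng(seed)

    # n_intervals + 1 evenly spaced edges define n_intervals intervals.
    edges = np.linspace(acts.min(), acts.max(), n_intervals + 1)

    # Digitizing against the interior edges yields ids 0..n_intervals-1;
    # the maximum activation lands in the last interval.
    interval_ids = np.digitize(acts, edges[1:-1])

    samples = {}
    for b in range(n_intervals):
        members = np.flatnonzero(interval_ids == b)
        k = min(per_interval, members.size)
        # Uniform sampling without replacement; sparse intervals keep
        # whatever examples they have.
        samples[b] = rng.choice(members, size=k, replace=False) if k else members
    return samples
```

Sampling a fixed count per interval, rather than proportionally, keeps low-activation and high-activation examples equally represented, which appears to be the point of evaluating monosemanticity "across activation levels".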
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The mechanism by which LLMs generate text: drawing a token from the next-token distribution and appending it to context repeatedly
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints
- Component of the contrastive retrieval pipeline analyzing activation statistics.
- A technique to filter model outputs; Redwood Research's project mentioned.
- Clamping activations along the Assistant Axis to remain above a minimum threshold (25th percentile), introduced as a stabilization method
- A Bayesian exploration strategy that samples from the posterior distribution over model parameters to decide actions.
- Technique of reading out model beliefs from internal activations before the final answer token is generated
- Assumption that small anchor changes can produce sharp performance shifts when conditions are favorable.