method
active
method:activation-interval-sampling

Activation Interval Sampling

Divides a feature's activation spectrum into 11 evenly spaced intervals and samples uniformly from them to evaluate monosemanticity across activation levels
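The description above can be sketched roughly as follows. This is a minimal illustration, not the original implementation. It assumes the "activation spectrum" is the range from a feature's minimum to its maximum observed activation, that the intervals are equal-width bins over that range, and that "sampling uniformly" means drawing examples at random without replacement inside each bin. The function name and the `per_interval` and `seed` parameters are hypothetical.

```python
import random


def sample_by_activation_interval(activations, n_intervals=11, per_interval=5, seed=0):
    """Return {interval_index: [example indices]} for one feature.

    Assumption: intervals are equal-width bins over [min, max] of the
    observed activations; up to `per_interval` examples are drawn
    uniformly at random (without replacement) from each bin.
    """
    if not activations:
        return {}
    rng = random.Random(seed)
    lo, hi = min(activations), max(activations)
    width = (hi - lo) / n_intervals
    buckets = {i: [] for i in range(n_intervals)}
    for idx, a in enumerate(activations):
        if width == 0:
            i = 0  # all activations identical: everything lands in one bin
        else:
            # The maximum activation is clamped into the last interval.
            i = min(int((a - lo) / width), n_intervals - 1)
        buckets[i].append(idx)
    # Sparse bins simply yield fewer (possibly zero) samples.
    return {
        i: rng.sample(members, min(per_interval, len(members)))
        for i, members in buckets.items()
    }
```

Sampling per interval rather than only from the top activations means weakly activating examples are inspected too, which is what lets monosemanticity be judged across activation levels rather than only at the peak.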

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The mechanism by which LLMs generate text: drawing a token from the next-token distribution and appending it to context repeatedly
  • Activations · concept · 0.759
    Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
  • Component of the contrastive retrieval pipeline analyzing activation statistics.
  • A technique to filter model outputs; Redwood Research's project mentioned.
  • Clamping activations along the Assistant Axis to remain above a minimum threshold (25th percentile), introduced as a stabilization method
  • A Bayesian exploration strategy that samples from the posterior distribution over model parameters to decide actions.
  • Activation Probing · concept · 0.727
    Technique of reading out model beliefs from internal activations before the final answer token is generated
  • Assumption that small anchor changes can produce sharp performance shifts when conditions are favorable.