framework
active
framework:solu-activation-function
SoLU Activation Function
Prior Anthropic approach to increasing neuron monosemanticity through activation function design; found to make some neurons more interpretable at the cost of others
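For orientation, here is a minimal sketch of the activation itself, assuming the commonly cited form SoLU(x) = x * softmax(x) taken across the neurons of one MLP layer. The helper names `softmax` and `solu` and the sample values are illustrative, not from this entry, and the sketch omits the layer normalization that the published description applies afterwards.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def solu(xs):
    """Assumed form: SoLU(x) = x * softmax(x), over one layer's pre-activations."""
    weights = softmax(xs)
    return [x * w for x, w in zip(xs, weights)]


# Hypothetical pre-activations for four neurons in one MLP layer.
pre_activations = [4.0, 1.0, 0.5, -2.0]
print(solu(pre_activations))
```

Because the softmax weights sum to one, the largest pre-activation keeps most of its magnitude while the smaller ones are pushed toward zero. That suppression is the intuition behind the trade-off recorded in the description above.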
Neighborhood — ranked by edge-count
Claims (1)
claim
- Author's conclusion after extensive investigation of architectural approaches to monosemanticity
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The nonlinear activation function used in MLP layers; prevents the linearization approach used for attention layers from extending to MLP layers
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
- Token-level analysis of OTD and backtracking latent activations aligned at correction points across episodes
- Intervention method that adds a learned direction vector to residual stream activations to steer model behavior
- Component of the contrastive retrieval pipeline analyzing activation statistics.
- Technique of reading out model beliefs from internal activations before the final answer token is generated
- Latent model activations when processing inputs framed from another agent's perspective
- The conventional approach (e.g., SAEs, transcoders) of decomposing activations into interpretable features.