method
active
method:downstream-client-feature-analysis
Downstream Client Feature Analysis
Examining downstream neurons that rely on a given feature to verify its functional role
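As a rough illustration of the idea, the sketch below shows one possible reading of the method on a toy linear layer. It is not taken from the cited paper: the names `rank_downstream_clients`, `ablate_feature`, and `ablation_effect`, the weight matrix, and the activation vector are all hypothetical. It assumes the feature is a single direction in activation space and that a downstream neuron "relies" on the feature when its input weights have a large dot product with that direction. The ablation step then checks whether removing the feature changes those neurons' pre-activations in the expected way.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_downstream_clients(
    feature_dir: Vector,
    downstream_weights: Sequence[Vector],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Rank downstream neurons by how strongly their input weights read the feature.

    Returns (neuron_index, signed_score) pairs sorted by absolute score.
    """
    scores = [(j, dot(w, feature_dir)) for j, w in enumerate(downstream_weights)]
    scores.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return scores[:top_k]


def ablate_feature(activation: Vector, feature_dir: Vector) -> List[float]:
    """Project the feature direction out of an activation vector."""
    coeff = dot(activation, feature_dir) / dot(feature_dir, feature_dir)
    return [a - coeff * f for a, f in zip(activation, feature_dir)]


def ablation_effect(
    activation: Vector,
    feature_dir: Vector,
    downstream_weights: Sequence[Vector],
) -> List[float]:
    """Change in each downstream pre-activation after the feature is ablated."""
    ablated = ablate_feature(activation, feature_dir)
    return [dot(w, ablated) - dot(w, activation) for w in downstream_weights]


# Toy data (hypothetical): a 3-dimensional activation space, 3 downstream neurons.
feature_dir = [1.0, 0.0, 0.0]
downstream_weights = [
    [2.0, 0.5, 0.0],   # neuron 0: excited by the feature
    [0.0, 1.0, 0.0],   # neuron 1: does not read the feature
    [-1.0, 0.0, 1.0],  # neuron 2: inhibited by the feature
]
activation = [3.0, 1.0, 2.0]

clients = rank_downstream_clients(feature_dir, downstream_weights, top_k=2)
effects = ablation_effect(activation, feature_dir, downstream_weights)
print(clients)  # neurons that read the feature most strongly
print(effects)  # per-neuron change in pre-activation under ablation
```

In this toy setup, the neurons ranked highest as clients are also the ones whose pre-activations move under ablation, while the neuron that ignores the feature is unaffected. A real analysis would have to handle nonlinearities and many inputs, which this sketch deliberately leaves out.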
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
- Authors argue features are model properties because logit effects and ablations are consistent with feature interpretations
- Dual interpretation of features: in addition to responding to inputs, features also act to increase probability of specific output tokens
- Method of optimizing input to cause a neuron to fire maximally, used to characterize what a neuron detects; establishes causal link
- Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
- Metaphor treating each system feature or function as a separate application that can be independently loaded and managed.
- Mechanistic finding by Bricken et al. 2023 about how LLMs store features; cited as operational justification for pattern-repository assumption
- Network analysis method used to examine distributed cognition in multi-agent systems; demonstrates measurement approach for collective cognitive processes.