Downstream Client Feature Analysis

Examining downstream neurons that rely on a given feature to verify its functional role
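
One way to picture this analysis is a minimal sketch that ranks downstream neurons by how strongly their input weights read a feature's direction. Everything named here is an assumption made for illustration: `feature_dir` stands in for the feature's direction in activation space, `w_in` for a downstream layer's input weights (one row per neuron), and the dot-product score is only one possible proxy for "relies on". It is not the method of any particular paper.

```python
import math

def rank_downstream_neurons(feature_dir, w_in, top_k=3):
    """Rank neurons by the projection of their input weights onto the feature direction.

    feature_dir: list of floats (hypothetical feature direction).
    w_in: list of rows, one per downstream neuron (hypothetical input weights).
    Returns (neuron_index, signed_score) pairs, largest magnitude first.
    """
    norm = math.sqrt(sum(x * x for x in feature_dir))
    unit = [x / norm for x in feature_dir]
    # Signed projection: positive means the neuron is excited by the feature,
    # negative means it is suppressed by it.
    scores = [sum(w * u for w, u in zip(row, unit)) for row in w_in]
    order = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))
    return [(i, scores[i]) for i in order[:top_k]]

# Toy data, invented for this example.
feature_dir = [1.0, 0.0, 0.0]
w_in = [
    [0.1, 0.9, 0.0],
    [2.0, 0.0, 0.5],
    [-1.5, 0.2, 0.0],
    [0.0, 0.0, 1.0],
]
top = rank_downstream_neurons(feature_dir, w_in, top_k=2)
```

The top-ranked neurons are the candidates whose behavior one would then inspect to check whether it matches the feature's proposed interpretation.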

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Feature Sparsity · concept · 0.731
Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
Learned features reflect the functionality of the model and not just the data distribution, as evidenced by interpretable downstream effects not used in dictionary learning · claim · 0.730
Authors argue features are model properties because logit effects and ablations are consistent with feature interpretations
Action Features · concept · 0.728
Dual interpretation of features: in addition to responding to inputs, features also act to increase probability of specific output tokens
Feature Visualization · method · 0.725
Method of optimizing input to cause a neuron to fire maximally, used to characterize what a neuron detects; establishes causal link
Feature splitting · concept · 0.717
Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
feature as application · concept · 0.709
Metaphor treating each system feature or function as a separate application that can be independently loaded and managed.
Superposition of Sparse Features · concept · 0.705
Mechanistic finding by Bricken et al. 2023 about how LLMs store features; cited as operational justification for pattern-repository assumption
Event Analysis of Systemic Teamwork (EAST) · method · 0.704
Network analysis method used to examine distributed cognition in multi-agent systems; demonstrates measurement approach for collective cognitive processes.
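
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is an illustrative reconstruction, not the system's actual implementation: the embedding vectors, the entity names, the `typed_edges` set, and the choice to sort by descending cosine are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) for entities near `target` that share no typed edge with it."""
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    # Highest similarity first (assumed ordering).
    return sorted(results, key=lambda pair: -pair[1])

# Toy data, invented for this example.
embeddings = {
    "target": [1.0, 0.0],
    "A": [1.0, 0.5],   # close to target, no typed edge
    "B": [0.0, 1.0],   # orthogonal to target, below threshold
    "C": [1.0, 1.0],   # close to target, but already linked
}
typed_edges = {("target", "C")}
candidates = untyped_neighbors("target", embeddings, typed_edges)
```

Entities that survive the filter are the ones to review by hand as possible new edges or duplicates.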