atomic features

The idea that interpretability should decompose representations into minimal, indivisible feature units; contrasted with manifold-level descriptions.

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Fourier features · concept · 0.743
Features identified in Llama-3.1-8B that compute sums using periods respecting base-10 addition (2, 5, 10) rather than concept-specific periods
Action Features · concept · 0.740
Dual interpretation of features: in addition to responding to inputs, features also act to increase probability of specific output tokens
feature as application · concept · 0.723
Metaphor treating each system feature or function as a separate application that can be independently loaded and managed.
Pure Feature · concept · 0.719
A feature that responds to only a single latent variable, contrasted with polysemantic features
Feature engineering · concept · 0.717
Domain of techniques for constructing informative features from raw data; covariance pooling is a feature engineering method for token sequences.
Geometry of features · concept · 0.716
Research thread within About Blank concerning the structure and relational properties of neural network feature representations; covariance pooling tangentially supports this thread.
Internal Features · concept · 0.712
Representations inside LLMs that can be intervened upon.
Feature Visualization · method · 0.708
Method of optimizing an input so that a neuron fires maximally, used to characterize what the neuron detects; establishes a causal link
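
The selection rule for this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has a nonzero embedding vector and that typed edges are stored as a set of name pairs. The function names, the data layout, and the toy vectors are hypothetical and are not taken from the system that generated this page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """List (name, score) pairs that are similar to `target` but unlinked.

    embeddings:  dict mapping entity name -> embedding vector (assumed layout)
    typed_edges: set of (name, name) pairs that already have a typed relation
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities that already have a typed edge in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy data, purely illustrative.
toy_embeddings = {
    "atomic features": [1.0, 0.0],
    "Pure Feature": [1.0, 0.1],
    "unrelated entity": [0.0, 1.0],
    "already linked": [1.0, 1.0],
}
toy_edges = {("atomic features", "already linked")}
candidates = untyped_neighbors("atomic features", toy_embeddings, toy_edges)
```

In this toy setup only "Pure Feature" survives: "unrelated entity" falls below the threshold, and "already linked" is excluded because it has a typed edge despite its high similarity.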