UMAP visualization for features

Dimensionality reduction of sparse autoencoder (SAE) decoder vectors to create interactive feature maps.
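UMAP works in two stages. It first builds a nearest-neighbor graph over the input vectors, then lays that graph out in two dimensions. The minimal numpy sketch below covers only the first step. It assumes the SAE decoder matrix `W_dec` stores one feature direction per row, with shape `(n_features, d_model)`, and it uses cosine similarity, a common choice for comparing directions but an assumption here. The function name `cosine_knn` is hypothetical.

```python
import numpy as np


def cosine_knn(W_dec: np.ndarray, k: int = 2) -> np.ndarray:
    """Return, for each feature, the indices of its k most cosine-similar features.

    Assumes W_dec has shape (n_features, d_model): one decoder direction per row.
    """
    # Normalize each decoder direction to unit length so dot products are cosines.
    norms = np.linalg.norm(W_dec, axis=1, keepdims=True)
    unit = W_dec / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    # Exclude self-similarity so a feature is never listed as its own neighbor.
    np.fill_diagonal(sims, -np.inf)
    # Sort each row by descending similarity and keep the top k indices.
    return np.argsort(-sims, axis=1)[:, :k]


# Two toy "clusters" of directions: features 0/1 and features 2/3.
W = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.9, 0.1]])
print(cosine_knn(W, k=1).tolist())  # [[1], [0], [3], [2]]
```

If the `umap-learn` package is installed, `umap.UMAP(n_components=2, metric="cosine").fit_transform(W_dec)` handles both the neighbor search and the 2D layout in one call. The sketch above only makes the neighbor step explicit.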

Neighborhood (ranked by edge count)

UMAP Embedding of Features · method · related_to
2D embedding of feature direction vectors, used to visualize feature clusters and feature-splitting geometry
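UMAP itself requires the `umap-learn` package. To show what a "2D embedding of feature direction vectors" produces without that dependency, the sketch below substitutes PCA. This is a swapped-in, linear technique and not the UMAP embedding: it keeps the directions of greatest global spread, whereas UMAP can also untangle non-linear cluster structure. The input layout (one direction per row) and the name `pca_2d` are assumptions.

```python
import numpy as np


def pca_2d(directions: np.ndarray) -> np.ndarray:
    """Project feature direction vectors onto their top two principal components.

    A linear stand-in for a 2D feature map; this is PCA, not UMAP.
    """
    # Normalize rows so the map reflects direction rather than decoder norm.
    norms = np.linalg.norm(directions, axis=1, keepdims=True)
    unit = directions / np.clip(norms, 1e-12, None)
    centered = unit - unit.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # 2D coordinates: each centered direction projected onto the first two axes.
    return centered @ vt[:2].T


W = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.9, 0.1]])
coords = pca_2d(W)
print(coords.shape)  # (4, 2)
```

In the toy input, features 0 and 1 end up close together in the 2D map, far from features 2 and 3. This is the kind of cluster structure a feature map is meant to expose.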

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or may be unrecognized duplicates.

Feature Visualization · method · 0.832
Method of optimizing an input so that a neuron fires maximally, used to characterize what the neuron detects; establishes a causal link
Principal Component Analysis Visualization · method · 0.738
Used to visualize LLM true/false representations, revealing a clear linear structure that separates true from false statements
PCA Visualization · method · 0.729
Used to visually inspect the separation of truth-related directions in model activation space across layers
feature as application · concept · 0.721
Metaphor treating each system feature or function as a separate application that can be independently loaded and managed
Linear Representation of Features · concept · 0.714
The central object of study: the idea that a concept such as truth is encoded as a direction in the LLM's latent space
Context Feature · concept · 0.711
Feature that activates across all tokens within a specific context (e.g., DNA sequences, base64 strings)
Action Features · concept · 0.710
Dual interpretation of features: in addition to responding to inputs, features also act to increase the probability of specific output tokens
Pure Feature · concept · 0.709
A feature that responds to only a single latent variable, contrasted with polysemantic features
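A list like the "cosine ≥ 0.65 · no typed edge" neighborhood can be reproduced roughly as follows. This minimal sketch assumes three things: each entity has an embedding vector (the page does not say what is embedded, for example the entity's description), typed relations are stored as unordered name pairs, and results are ranked by descending similarity. The function name `untyped_neighbors` and the data layout are hypothetical.

```python
import numpy as np


def untyped_neighbors(embeddings, names, typed_edges, query, threshold=0.65):
    """Rank entities with cosine similarity >= threshold to `query`
    that share no typed edge with it.

    embeddings: array of shape (n_entities, dim), row i belongs to names[i].
    typed_edges: set of frozenset({a, b}) name pairs, treated as undirected.
    """
    idx = {name: i for i, name in enumerate(names)}
    # Unit-normalize rows so the dot product with the query row is a cosine.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[idx[query]]
    hits = [
        (name, float(sims[i]))
        for i, name in enumerate(names)
        if name != query
        and sims[i] >= threshold
        and frozenset((query, name)) not in typed_edges
    ]
    # Highest similarity first, matching the ranked list above.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


names = ["A", "B", "C", "D"]
E = np.array([[1.0, 0.0], [1.0, 0.1], [0.8, 0.6], [0.0, 1.0]])
# B is very similar to A but already has a typed edge, so only C qualifies.
print([n for n, _ in untyped_neighbors(E, names, {frozenset(("A", "B"))}, "A")])  # ['C']
```

Excluding pairs that already have a typed edge is what turns a plain similarity search into a list of candidate edges or possible duplicates.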