Goodfire SAE API

API providing access to sparse autoencoder (SAE) features for Llama 3.3 70B; used for feature steering in Experiment 2.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Goodfire · institute · 0.817
AI research company; authors' affiliation; develops tools including EVEE and publishes research on genomic foundation models.
SAE features · concept · 0.741
The individual, supposedly monosemantic directions learned by SAEs; argued here to fragment manifolds into disconnected pieces.
Goodfire Ember Contrastive Search · method · 0.738
API method used to identify latents differentially activated between on-topic and off-topic prompt-response pairs
SAE Feature Conditional Firing Persistence Metric · method · 0.720
P(feature fires at t+100 | fired at t) minus P(feature fires at t+100 | did not fire at t); used because SAE features are binary, unlike probe activations.
SAE feature firing probability persistence metric · method · 0.709
Persistence metric for SAE features: P(fires at t+100 | fired at t) minus P(fires at t+100 | did not fire at t)
Sparse Autoencoders (SAE) · method · 0.707
Interpretability method criticized in this paper for shattering manifolds into atomic pieces, obscuring overarching semantic structure.
Sequential SAE Activation Analysis · method · 0.707
Token-level analysis of OTD and backtracking latent activations aligned at correction points across episodes
Sparse Autoencoders (SAE) activation-based paradigm · framework · 0.681
Standard interpretability approach that VPD critiques and proposes an alternative to.
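
The persistence metric listed among the related entities above, P(fires at t+100 | fired at t) minus P(fires at t+100 | did not fire at t), can be illustrated with a minimal sketch. It assumes the input is a single binary firing trace for one feature, indexed by position, and that "t+100" means 100 positions later. The function name `firing_persistence` and the `lag` parameter are hypothetical and are not part of any Goodfire API.

```python
def firing_persistence(fired, lag=100):
    """Difference of conditional firing probabilities for one binary feature.

    fired: sequence of truthy/falsy values; fired[t] says whether the
           feature fired at position t (assumed input format).
    lag:   offset between the two positions compared (100 in the metric above).
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    trace = [bool(x) for x in fired]
    if len(trace) <= lag:
        raise ValueError("trace must be longer than lag")

    # Pair each position t with position t + lag.
    pairs = list(zip(trace[:-lag], trace[lag:]))
    after_fire = [later for now, later in pairs if now]
    after_silent = [later for now, later in pairs if not now]

    # The metric is undefined when one of the conditions never occurs.
    if not after_fire or not after_silent:
        return float("nan")

    p_after_fire = sum(after_fire) / len(after_fire)
    p_after_silent = sum(after_silent) / len(after_silent)
    return p_after_fire - p_after_silent
```

A value near 1 means firing at t strongly predicts firing at t + lag, a value near 0 means firing at t carries no information about t + lag, and a negative value means firing at t makes later firing less likely.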