method
active
method:goodfire-ember-contrastive-search

Goodfire Ember Contrastive Search

API method used to identify SAE latents that are differentially activated between on-topic and off-topic prompt-response pairs
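The general contrastive idea can be sketched in Python. This sketch does not use Goodfire's Ember API. The function name `contrastive_search`, the input layout (one row per prompt-response pair, one column per SAE latent's pooled activation), and the mean-difference score are all illustrative assumptions. The real API may score or filter latents differently.

```python
from statistics import fmean
from typing import Sequence


def contrastive_search(
    target: Sequence[Sequence[float]],
    baseline: Sequence[Sequence[float]],
    top_k: int = 5,
) -> list[tuple[int, float]]:
    """Hypothetical sketch, not Goodfire's Ember API.

    Rank latents by how much more strongly they activate on `target` pairs
    than on `baseline` pairs. Each row is one prompt-response pair; each
    column is one SAE latent's (assumed pre-pooled) activation. Returns
    (latent_index, score) tuples, highest score first.
    """
    if not target or not baseline:
        raise ValueError("both sets need at least one example")
    n_latents = len(target[0])
    if any(len(row) != n_latents for row in (*target, *baseline)):
        raise ValueError("all rows must have the same number of latents")

    scores = []
    for j in range(n_latents):
        # Mean activation of latent j in each set; their difference is the contrast score.
        mean_target = fmean(row[j] for row in target)
        mean_baseline = fmean(row[j] for row in baseline)
        scores.append((j, mean_target - mean_baseline))

    # Highest score first; ties broken by the lower latent index.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_k]


# Toy data: latent 1 fires mostly on the off-topic pairs.
off_topic = [[0.0, 2.0, 0.5], [0.1, 1.8, 0.4]]
on_topic = [[0.2, 0.1, 0.5], [0.0, 0.3, 0.6]]
print(contrastive_search(off_topic, on_topic, top_k=2))
```

Passing off-topic pairs as `target` and on-topic pairs as `baseline` surfaces candidate off-topic detector latents (here latent 1). Swapping the arguments surfaces latents that fire more on on-topic content. A mean difference is the simplest contrast score; normalized or frequency-filtered scores are possible alternatives.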

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • 26 SAE latents identified as differentially activated during off-topic content and causally linked to ESR

Artifacts (1)

artifact
  • API providing contrastive search functionality used to identify off-topic detector latents

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Goodfire · institute · 0.799
    AI research company; authors' affiliation; develops tools including EVEE and publishes research on genomic foundation models.
  • Goodfire SAE API · method · 0.738
    API providing access to sparse autoencoder features for LLaMA 3.3 70B, used for feature steering in Experiment 2
  • Unsupervised probing method from Burns et al. 2023 that identifies directions along which contrast pair representations are far apart
  • Unsupervised probe by Burns et al. to predict latent truth representations; cited as related but limited in generalization
  • A pipeline employing controlled semantic oppositions to distill monosemantic functional features from sparse activation spaces.
  • Contrast · concept · 0.684
    The property that living structures contain intense contrast, far more than one might imagine would be helpful: true opposites that annihilate each other when superimposed, creating a differentiation that gives birth to something. Used correctly, contrast unifies rather than separates.
  • Method comparing brain activity in conscious vs. unconscious conditions.
  • Contrastive Pairs · concept · 0.674
    Pairs of prompts at different reflection levels used to compute steering vectors.