concept
active
concept:pure-feature

Pure Feature

A feature that responds to only a single latent variable, in contrast with polysemantic features, which respond to several

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • A neuron that responds to multiple unrelated inputs, posing a major challenge for circuit-level interpretation

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • A paradigm relying on recursion equations without assignment; the Linda authors compare it on the DNA sequence similarity problem.
  • Pure unity · concept · 0.783
    The ultimate condition of living structure where the whole becomes a single, undivided entity made of beings, all rooted in the same I.
  • Feature Sparsity · concept · 0.774
    Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
  • Metaphor treating each system feature or function as a separate application that can be independently loaded and managed.
  • Method of optimizing an input to make a neuron fire maximally, used to characterize what the neuron detects; establishes a causal link
  • Action Features · concept · 0.757
    Dual interpretation of features: in addition to responding to inputs, features also act to increase probability of specific output tokens
  • Property of features that form consistently across different models trained on the same or similar data, suggesting features are real representational units
  • The extracted set of sparse interpretable features from model embeddings via SAEs