concept
active
concept:feature-interference

Feature Interference

When non-orthogonal features cause logistic regression to identify a suboptimal probe direction
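
The effect can be reproduced on toy data. The sketch below is illustrative only and is not taken from the source: the 2-D setup, the names `t` (truth feature), `d` (distractor feature) and `fit_logreg`, the noise level, and the use of a difference-of-class-means direction as a comparison baseline are all assumptions. It fits logistic regression by plain gradient descent on data where `t` and `d` overlap, then compares how well each probe direction aligns with `t`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a "truth" feature t and a distractor feature d that
# overlap (45 degrees apart) instead of being orthogonal.
t = np.array([1.0, 0.0])
d = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])

n = 4000
a_t = np.tile([1.0, 1.0, -1.0, -1.0], n // 4)  # truth value; defines the label
a_d = np.tile([1.0, -1.0, 1.0, -1.0], n // 4)  # distractor; balanced within each class
X = np.outer(a_t, t) + np.outer(a_d, d) + 0.3 * rng.standard_normal((n, 2))
y = (a_t > 0).astype(float)


def fit_logreg(X, y, lr=0.5, steps=2000):
    """Unregularized logistic regression fitted by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        g = p - y                               # gradient of log-loss w.r.t. logits
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


w_lr, _ = fit_logreg(X, y)
# Comparison baseline (an assumption of this sketch): difference of class means.
w_md = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

cos_lr = cosine(w_lr, t)
cos_md = cosine(w_md, t)
print(f"cosine(logistic regression, t) = {cos_lr:.3f}")
print(f"cosine(mean difference, t)     = {cos_md:.3f}")
```

In this toy setting the logistic-regression weights tilt away from `t` so as to cancel the overlap with `d`. That helps classification, but it makes the learned direction a poorer estimate of the feature direction itself.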

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • The direction logistic regression converges to on linearly separable data; shown to be suboptimal for identifying the truth direction

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Logit weight contributions from a feature that arise due to superposition with other features, not from the feature's own causal role
  • Asymmetric transfer after fine-tuning: high-density bases (B10) are more robust.
  • Feature Sparsity · concept · 0.756
    Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
  • Feature splitting · concept · 0.745
    Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
  • Used to knock down ion channel or gap junction genes to perturb bioelectric circuits.
  • Pure Feature · concept · 0.739
    A feature that responds to only a single latent variable, contrasted with polysemantic features
  • Method of optimizing an input to make a neuron fire maximally, used to characterize what the neuron detects; establishes a causal link
  • Domain of techniques for constructing informative features from raw data; covariance pooling is a feature engineering method for token sequences.