Feature Interference

When non-orthogonal features cause logistic regression to identify a suboptimal probe direction

Neighborhood — ranked by edge-count

paper

concept

Maximum Margin Separator
associated_with
The direction that logistic regression converges to on linearly separable data; shown to be suboptimal for identifying the truth direction
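
The relationship between the two entries above can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the source: the 2-D vectors `truth` and `other`, the four-point dataset, the helper names, and the use of a difference-of-class-means direction as the comparison baseline. It fits an unregularized, bias-free logistic regression by plain gradient descent on linearly separable data and compares the learned direction with the known truth direction.

```python
import math

# Toy setup (illustrative assumptions, not values from the source).
truth = (1.0, 0.0)   # unit "truth" feature direction
other = (0.6, 0.8)   # unit interfering feature; cosine 0.6 with `truth`

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# Each activation is y*truth + z*other, where the label y and the
# nuisance sign z are independent and both take values in {-1, +1}.
data = []
for y in (-1, 1):
    for z in (-1, 1):
        x = tuple(y * t + z * o for t, o in zip(truth, other))
        data.append((x, y))

def fit_logistic(points, lr=0.5, steps=5000):
    """Unregularized, bias-free logistic regression via gradient descent."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in points:
            margin = y * dot(w, x)
            # Gradient of log(1 + exp(-margin)) with respect to w.
            coeff = -y / (1.0 + math.exp(margin))
            grad = [g + coeff * xi for g, xi in zip(grad, x)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

w_lr = fit_logistic(data)

# Comparison baseline (an assumption of this sketch): difference of class means.
pos = [x for x, y in data if y == 1]
neg = [x for x, y in data if y == -1]
mean_diff = tuple(
    sum(p[i] for p in pos) / len(pos) - sum(n[i] for n in neg) / len(neg)
    for i in range(2)
)

# For this symmetric toy set, the maximum-margin direction is the unit
# vector orthogonal to `other` that still has positive overlap with `truth`.
max_margin = (0.8, -0.6)

print("cos(logistic, truth)      =", round(cosine(w_lr, truth), 3))
print("cos(logistic, max-margin) =", round(cosine(w_lr, max_margin), 3))
print("cos(mean-diff, truth)     =", round(cosine(mean_diff, truth), 3))
```

In this sketch the logistic-regression direction drifts toward the maximum-margin separator, which tilts away from `truth` so as to cancel the contribution of `other`, while the difference of class means stays aligned with `truth` because the nuisance sign is balanced across classes. Real activations are high-dimensional and noisy, so this should be read as intuition only.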

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Interference Weights · concept · 0.781
Logit weight contributions from a feature that arise due to superposition with other features, not from the feature's own causal role
cross-base interference · concept · 0.759
Asymmetric transfer after fine-tuning: high-density bases (B10) are more robust.
Feature Sparsity · concept · 0.756
Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
Feature splitting · concept · 0.745
Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
RNA interference (RNAi) · method · 0.743
Used to knock down ion channel or gap junction genes to perturb bioelectric circuits.
Pure Feature · concept · 0.739
A feature that responds to only a single latent variable, contrasted with polysemantic features
Feature Visualization · method · 0.736
Method of optimizing input to cause a neuron to fire maximally, used to characterize what a neuron detects; establishes causal link
Feature engineering · concept · 0.732
Domain of techniques for constructing informative features from raw data; covariance pooling is a feature engineering method for token sequences.