claim
active
claim:sae-features-generalize-to-images-despite-training-only-on-text-indicating-out-of-distribution-robustness

SAE features generalize to images despite training only on text, indicating out-of-distribution robustness.

This is a promising property for interpretability analysis on off-distribution inputs.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
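
The selection rule described above (cosine ≥ 0.65 and no typed edge) can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names, the dict-of-vectors embedding store, and the edge list of ID pairs are all hypothetical; only the 0.65 threshold comes from the text above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs similar to the target but not linked to it.

    embeddings  : dict mapping entity ID -> embedding vector (hypothetical layout)
    typed_edges : iterable of (id_a, id_b) pairs that already have a typed relation
    threshold   : minimum cosine similarity; 0.65 matches the cutoff shown above
    """
    target_vec = embeddings[target_id]

    # Collect every entity that already shares a typed edge with the target.
    linked = set()
    for a, b in typed_edges:
        if a == target_id:
            linked.add(b)
        elif b == target_id:
            linked.add(a)

    candidates = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id or entity_id in linked:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((entity_id, score))

    # Most similar first, so likely duplicates surface at the top.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter would then be reviewed by hand, either to add a typed edge or to merge a duplicate.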