finding
active
finding:most-correlated-neuron-a-neurons-470-has-correlation-of-only-0-18-with-base64-feature-a-1-2357-and-responds-to-code-html-labels-urls

The most correlated neuron, A/neurons/470, has a correlation of only 0.18 with base64 feature A/1/2357 and responds to code, HTML labels, and URLs

Shows that no single neuron cleanly represents base64: the closest neuron is polysemantic, whereas the learned feature is monosemantic
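
The comparison behind this finding can be sketched as a search for the neuron whose activations correlate most strongly with a given feature's activations. The snippet below is a minimal illustration, not the source's procedure: it assumes Pearson correlation (the card does not say which measure was used), the function name `most_correlated_neuron` is hypothetical, and the arrays are synthetic stand-ins rather than real activations for A/neurons/470 or A/1/2357.

```python
import numpy as np


def most_correlated_neuron(feature_acts, neuron_acts):
    """Return (index, Pearson r) of the neuron most correlated with a feature.

    feature_acts: array of shape (n_tokens,), one feature's activations.
    neuron_acts:  array of shape (n_tokens, n_neurons), neuron activations
                  recorded on the same tokens.
    """
    # Center both sides so the normalized dot product equals Pearson r.
    f = feature_acts - feature_acts.mean()
    n = neuron_acts - neuron_acts.mean(axis=0)
    denom = np.linalg.norm(f) * np.linalg.norm(n, axis=0)
    # Constant columns have zero norm; report r = 0 for them instead of NaN.
    r = np.divide(f @ n, denom, out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(r))
    return best, float(r[best])


# Synthetic stand-in data (assumption: NOT the paper's activations).
rng = np.random.default_rng(0)
feature = rng.random(5000)
neurons = rng.random((5000, 8))
neurons[:, 3] += 0.2 * feature  # neuron 3 weakly tracks the feature

idx, r = most_correlated_neuron(feature, neurons)
print(f"best neuron: {idx}, Pearson r = {r:.2f}")
```

A low maximum correlation, as in this finding, means that even the best-matching neuron carries the feature only weakly and mixes it with other signals.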

Source paper

extracted_from
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
(2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
