finding
active
finding:pinning-a-1-3450-to-maximum-observed-value-causes-model-to-generate-arabic-text-from-numeric-prefix-context

Pinning A/1/3450 to maximum observed value causes model to generate Arabic text from numeric prefix context

Causal validation that the Arabic feature has the predicted downstream effect on generation

Source paper

extracted_from
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
(2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1

Neighborhood — ranked by edge-count

Claims (1)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.