finding
active
finding:clamping-secrecy-discreteness-feature-1m-268551-to-5x-max-activation-causes-model-to-plan-to-lie-and-keep-secret-while-using-scratchpad

Clamping the secrecy/discreteness feature 1M/268551 to 5× its maximum activation causes the model to use its scratchpad to plan to lie and to keep a secret.

This shows that steering on the feature can induce deceptive behavior.
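The clamping intervention can be sketched with a sparse autoencoder (SAE). The activation is encoded into feature activations, one feature is overwritten with a multiple of its maximum activation, and the result is decoded with the SAE's reconstruction error added back. This is a minimal sketch under stated assumptions, not the procedure used for this finding. It assumes a ReLU SAE with hypothetical parameters `W_enc`, `b_enc`, `W_dec`, `b_dec`, a precomputed `max_activation` for the feature, and the convention of preserving the reconstruction error. All function names are hypothetical.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc, b_dec):
    """Hypothetical ReLU SAE encoder: x [d_model] -> feature activations [n_features]."""
    return np.maximum(0.0, W_enc @ (x - b_dec) + b_enc)

def sae_decode(f, W_dec, b_dec):
    """Hypothetical linear SAE decoder: features [n_features] -> activation [d_model]."""
    return W_dec @ f + b_dec

def clamp_feature(x, feature_idx, max_activation, multiplier,
                  W_enc, b_enc, W_dec, b_dec):
    """Clamp one feature to multiplier * max_activation and return the steered activation.

    Assumption: the SAE reconstruction error is kept, so the activation changes
    only along the clamped feature's decoder direction.
    """
    f = sae_encode(x, W_enc, b_enc, b_dec)
    error = x - sae_decode(f, W_dec, b_dec)  # part of x the SAE does not reconstruct
    f_clamped = f.copy()
    f_clamped[feature_idx] = multiplier * max_activation  # e.g. multiplier = 5.0
    return sae_decode(f_clamped, W_dec, b_dec) + error
```

Because the error term is preserved, the steered activation equals `x + (multiplier * max_activation - f[feature_idx]) * W_dec[:, feature_idx]`. In this setting, `feature_idx` would be the feature's index within a roughly 1M-feature dictionary, and `multiplier` would be `5.0`.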

Neighborhood (ranked by edge count)

Claims (1)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or unrecognized duplicates.