question
active
question:what-features-activate-when-we-ask-questions-probing-claude-s-goals-and-values

what features activate when we ask questions probing Claude's goals and values?

Direction for understanding model's internal objectives via features.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.