community
active
leiden_hybrid_concepts
label: sonnet
community:leiden_hybrid_concepts-run2-c23

Natural Language Auditing of Neural Models

NLA explanations are used as steering vectors and as auditing tools to investigate model beliefs and misalignment.

10 members.

Claims (6)

Findings (4)