Clinical Trust
ID: concept:clinical-trust · Type: concept · Status: active
The barrier that motivates interpretability work: clinicians cannot trust models whose internal computations are opaque.
Neighborhood (ranked by edge count)
Claims (1)
- Motivating claim for the entire paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The grounding schema (abnormality, age, sex, and medication) used to interpret SAE features
- Therapeutic interpretation of dreams and speech acts as an example of creative decoding.
- The method of examining a neighborhood meter by meter to identify healthy and damaged places as the basis for ongoing repair.
- Practical outcome of expanding the cognitive light cone to include others' stress states; linked to the scaling of intelligence through cybernetic perception-action loops
- A set of evaluation criteria for AI assistants.
- The property that living structures contain intense contrast, far more than one would imagine helpful: true opposites that annihilate each other when superimposed, creating a differentiation that gives birth to something new. Used correctly, contrast unifies rather than separates.
- Set of clinical concepts used as a grounding vocabulary to benchmark SAE feature monosemanticity and entanglement.
- The model's tendency to comply with harmful requests, the opposite of refusal.