Clinical Trust

The barrier motivating interpretability work — clinicians cannot trust models whose internal computations are opaque

Neighborhood — ranked by edge-count

claim

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Clinical Taxonomy · concept · 0.727
The grounding schema comprising abnormality, age, sex, and medication used to interpret SAE features
psychoanalysis · method · 0.710
Therapeutic interpretation of dreams and speech acts, as an example of creative decoding.
Diagnosis · method · 0.702
The method of examining a neighborhood meter by meter to identify healthy and damaged places as the basis for ongoing repair.
Compassion · concept · 0.698
Practical outcome of expanding the cognitive light cone to include others' stress states; linked to the scaling of intelligence through cybernetic perception-action loops
Helpful, Honest, Harmless · framework · 0.689
A set of evaluation criteria for AI assistants.
Contrast · concept · 0.686
The property that living structures contain intense contrast—far more than one imagines helpful; true opposites which annihilate each other when superimposed, creating differentiation that gives birth to something; contrast unifies rather than separates when used correctly
Clinical Taxonomy: abnormality, age, sex, medication · concept · 0.683
Set of clinical concepts used as a grounding vocabulary to benchmark SAE feature monosemanticity and entanglement.
compliance · concept · 0.682
The model's tendency to comply with harmful requests, the opposite of refusal.
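
The filter behind this list (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the function names, the embedding dictionary, and the representation of typed edges as unordered name pairs are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs close to `target` that share no typed edge with it.

    embeddings  -- hypothetical dict mapping entity name to its embedding vector
    typed_edges -- hypothetical set of frozenset({a, b}) pairs that already
                   have a typed relation
    threshold   -- minimum cosine similarity (0.65 on this page)
    """
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities that already have a typed relation to the target.
        if frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list above.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

Entities returned by such a filter are only candidates: a high score with no typed edge may indicate a missing relation or, as with the two Clinical Taxonomy entries, a possible duplicate that needs manual review.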