concept
active
concept:faithfulnessfaithfulness
The condition that commitments are fulfilled.
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Functional Faithfulnessrelated_toEmpirical effect where intervening on one feature induces coherent shifts across multiple linguistic dimensions aligned with the target attribute.
- Faithfulness of Explanationsrelated_toProperty of explanations that accurately reflect the actual causal mechanisms of the model being explained.
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- Idea that decomposed components must causally reflect the model's computation; enforced by adversarial ablation in VPD.
- A correctness condition requiring assertions to align with the program's beliefs.
- Definition of the newly named empirical effect.
- A necessary state of mind for making living things, characterized by absence of self-importance and complete attention to the thing itself.
- Cited regarding possibility of encoding misaligned reasoning in benign chains-of-thought
- Empirical effect observed in feature intervention experiments.
- A set of evaluation criteria for AI assistants.
- Epigraph from Elmer Fudd cartoon; playful reference to program faithfulness in fulfilling commitments.