quote
active

Functional Faithfulness, whereby intervening on a specific internal feature induces coherent and predictable shifts across multiple linguistic dimensions aligned with the target semantic attribute.

Definition of the newly named empirical effect.

Source paper

extracted_from
Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders
(2026) · Ruikang Zhang · Shuo Wang · Q. Su
