Influence Functions

An interpretability approach mentioned as one of several alternatives to the mechanistic approach taken in this paper.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Radius of influence · concept · 0.782
The maximum distance over which a cell alters the future fate of another cell; used to quantify the cognitive light cone.
Function · concept · 0.752
The practical, working aspect of a building; reinterpreted as the dynamic harmony of moving centers.
Causal Influence Diagrams · framework · 0.729
Framework informing path-specific objectives by identifying causal chains leading to risky behaviors.
Alignment Function · concept · 0.723
A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables.
a and c functions · method · 0.719
Assignment and contents functions for state manipulation in Algol 60, from McCarthy 1963.
Reward Function · concept · 0.718
In RL, a scalar signal from the environment that defines the agent's goal; in active inference, reward is just another observation with associated preference.
cognitive functions · concept · 0.714
Functions such as memory, attention, perception, and sentience that can be realized by diverse substrates.
Depend · method · 0.706
Attribute: attachment with issues of reliance, a text depending on another for meaning.