concept
active
concept:causal-importance

Causal importance

A measure of whether a subcomponent is necessary to reproduce model behavior on a specific prompt, predicted by the causal importance network.
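
As a rough illustration of what "necessary to reproduce model behavior on a specific prompt" can mean, the sketch below zeroes each subcomponent of a toy linear layer in turn and measures how far the output moves for one input. This is a brute-force ablation check, not the causal importance network described on this page, which predicts importance rather than measuring it this way. The function name `ablation_importance`, the toy layer, and the array shapes are all hypothetical.

```python
import numpy as np

def ablation_importance(components, x):
    """Output change caused by zeroing each subcomponent (toy illustration).

    components: array of shape (C, d_out, d_in); the toy layer's weight
                matrix is assumed to be the sum of these C subcomponents.
    x:          input vector of shape (d_in,), standing in for one prompt.
    Returns an array of C non-negative numbers; a value near 0 means the
    subcomponent was not needed to reproduce the output on this input.
    """
    components = np.asarray(components, dtype=float)
    x = np.asarray(x, dtype=float)
    full_output = components.sum(axis=0) @ x

    deltas = []
    for c in range(components.shape[0]):
        mask = np.ones(components.shape[0])
        mask[c] = 0.0  # ablate only subcomponent c
        ablated_weight = np.tensordot(mask, components, axes=1)
        deltas.append(np.linalg.norm(full_output - ablated_weight @ x))
    return np.array(deltas)

# Two rank-one subcomponents that read from different input directions.
toy_components = np.stack([
    np.outer([1.0, 2.0], [1.0, 0.0]),   # reads input dimension 0
    np.outer([3.0, -1.0], [0.0, 1.0]),  # reads input dimension 1
])
toy_input = np.array([1.0, 0.0])        # only dimension 0 is active
toy_deltas = ablation_importance(toy_components, toy_input)
```

On this input only the first subcomponent contributes to the output, so only its ablation changes the result. The same subcomponent could be unnecessary on a different input, which is why the measure is defined per prompt.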

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • causal importance network
    implements · related_to
    Auxiliary model trained alongside VPD to predict which subcomponents are causally important for each prompt, enabling mechanistic isolation of components.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Causal power · concept · 0.798
    The ability of an agent to be a driver of subsequent events; a hallmark of cognition that causal emergence quantifies.
  • Causal Invariance · concept · 0.795
    Property that causal mechanisms remain stable across environments; desirable for OOD.
  • Causal Mechanism · concept · 0.795
    Function determining the value of a variable based on its causal parents in an acyclic causal model.
  • Causal abstraction · concept · 0.794
    A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs.
  • Probability that effect would not occur without cause; counterfactual primitive.
  • The structural-realist grounding for self-evidencing after the bounded self is relinquished.
  • Framework informing path-specific objectives by identifying causal chains leading to risky behaviors.
  • Causal Tracing · concept · 0.774
    Mechanistic interpretability technique for locating factual associations, mentioned as future work direction.
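
The selection rule for this list (cosine ≥ 0.65, no typed edge) can be sketched as a small filter. This is a minimal sketch under stated assumptions: the page does not say how embeddings or edges are stored, so the dictionary of vectors, the set of unordered edge pairs, and the function name `similarity_candidates` are all hypothetical.

```python
import numpy as np

def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """List entities similar to `target` that share no typed edge with it.

    embeddings:  dict mapping entity name -> embedding vector (assumed layout).
    typed_edges: set of frozenset({a, b}) pairs that already have a typed
                 relation (assumed layout; edge direction is ignored).
    Returns (name, cosine) pairs sorted by descending cosine similarity.
    """
    target_vec = np.asarray(embeddings[target], dtype=float)
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        if frozenset((target, name)) in typed_edges:
            continue  # already connected by a typed edge, so not a candidate
        vec = np.asarray(vec, dtype=float)
        cosine = float(
            np.dot(target_vec, vec)
            / (np.linalg.norm(target_vec) * np.linalg.norm(vec))
        )
        if cosine >= threshold:
            results.append((name, cosine))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Made-up two-dimensional embeddings, for illustration only.
demo_embeddings = {
    "a": [1.0, 0.0],
    "b": [1.0, 0.1],  # close to "a", no typed edge: a candidate
    "c": [0.0, 1.0],  # orthogonal to "a": below the threshold
    "d": [1.0, 0.5],  # close to "a" but already linked: excluded
}
demo_edges = {frozenset(("a", "d"))}
demo_result = similarity_candidates("a", demo_embeddings, demo_edges)
```

Entities that pass the filter are only candidates: a high cosine score can indicate either a missing edge or a duplicate entry, and the score alone does not distinguish the two.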