method
active
method:normalized-indirect-effect

Normalized Indirect Effect

Metric for intervention effectiveness: 0 = ineffective, 1 = full flip of model output from false to true or vice versa
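
The entry gives only the endpoints of the scale, not a formula. A minimal sketch of one normalization consistent with those endpoints is shown below, assuming the quantity being measured is the model's probability of answering TRUE on false statements. The function name, the parameter names, and the example numbers are hypothetical and do not come from the source.

```python
def normalized_indirect_effect(p_intervened: float,
                               p_false_baseline: float,
                               p_true_baseline: float) -> float:
    """Rescale an intervention's effect onto the 0-to-1 scale described above.

    Assumed inputs (hypothetical, not specified by the source):
      p_intervened     -- mean P(TRUE) on false statements with the intervention
      p_false_baseline -- mean P(TRUE) on false statements without intervention
      p_true_baseline  -- mean P(TRUE) on genuinely true statements
    """
    gap = p_true_baseline - p_false_baseline
    if gap == 0:
        # The model does not separate true from false statements,
        # so there is nothing to normalize against.
        raise ValueError("true and false baselines are identical")
    return (p_intervened - p_false_baseline) / gap


# Illustrative numbers only: the intervention closes half of the baseline gap.
nie_half = normalized_indirect_effect(0.5, 0.1, 0.9)
# Endpoints: no change gives 0, matching the true-statement baseline gives 1.
nie_none = normalized_indirect_effect(0.1, 0.1, 0.9)
nie_full = normalized_indirect_effect(0.9, 0.1, 0.9)
```

Under this assumed normalization, values outside the 0-to-1 range are possible (for example, when an intervention overshoots the true-statement baseline), so the result is left unclipped here.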

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Metric for causal intervention experiments: 0 = wholly ineffective intervention, 1 = intervention causes model to label false statements as TRUE with the same confidence as genuine true statements
  • Off-target effects · concept · 0.737
    Unintended changes in model behavior when performing edits; compared between VPD editing and fine-tuning.
  • Inductive Bias · concept · 0.731
    Assumptions or preferences (e.g., parsimony) that determine how a learning system generalizes beyond training data
  • Bias Amplification · concept · 0.729
    Problem cited as a limitation of current LLMs; PRH predicts larger models should amplify bias less
  • Normalized EI bounded 0-1, decomposed into determinism minus degeneracy.
  • Graded notion of causal abstraction measured by IIA; when IIA is alpha < 100%, the model is alpha-on-average approximately abstract.
  • Field effect · concept · 0.724
    An emergent ordering created by a well-arranged structure of centers of different sizes, binding them into a whole.