Normalized Indirect Effect

Metric for intervention effectiveness: 0 = ineffective, 1 = full flip of model output from false to true or vice versa
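
The 0-to-1 scale above can be read as the fraction of a gap that the intervention closes. The sketch below is a minimal illustration, not the source's formula. It assumes the metric is computed from the probability the model assigns to the label TRUE. The function and parameter names (`normalized_indirect_effect`, `p_baseline`, `p_intervened`, `p_target`) and the numbers are hypothetical.

```python
def normalized_indirect_effect(p_baseline: float,
                               p_intervened: float,
                               p_target: float) -> float:
    """Fraction of the baseline-to-target gap closed by an intervention.

    Assumed inputs (hypothetical names, not taken from the source):
      p_baseline   -- P(TRUE) on the statements without intervention
      p_intervened -- P(TRUE) on the same statements with the intervention
      p_target     -- P(TRUE) on statements of the opposite truth value,
                      i.e. the level a full flip would reach
    """
    gap = p_target - p_baseline
    if gap == 0:
        raise ValueError("baseline and target coincide; the ratio is undefined")
    return (p_intervened - p_baseline) / gap


# Illustrative values only.
no_effect = normalized_indirect_effect(0.25, 0.25, 0.75)  # output unchanged
full_flip = normalized_indirect_effect(0.25, 0.75, 0.75)  # target level reached
halfway = normalized_indirect_effect(0.25, 0.50, 0.75)    # half the gap closed
print(no_effect, full_flip, halfway)
```

Under these assumptions an intervention that leaves the output unchanged scores 0 and one that reaches the target level scores 1. Values outside the 0-to-1 range are possible if the intervention overshoots or moves the output in the wrong direction.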

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Normalized Indirect Effect (NIE) · concept · 0.880
Metric for causal intervention experiments: 0 = wholly ineffective intervention, 1 = intervention causes model to label false statements as TRUE with the same confidence as genuine true statements
Off-target effects · concept · 0.737
Unintended changes in model behavior when performing edits; compared between VPD editing and fine-tuning.
Inductive Bias · concept · 0.731
Assumptions or preferences (e.g., parsimony) that determine how a learning system generalizes beyond training data
What Is The Nature Of The Effects That · question · 0.730
Bias Amplification · concept · 0.729
Problem cited as a limitation of current LLMs; PRH predicts larger models should amplify bias less
Effect Coefficient (Eff) · concept · 0.728
Normalized EI bounded 0-1, decomposed into determinism minus degeneracy.
Approximate Causal Abstraction · concept · 0.724
Graded notion of causal abstraction measured by IIA; when IIA is alpha < 100%, the model is alpha-on-average approximately abstract.
Field effect · concept · 0.724
An emergent ordering created by a well-arranged structure of centers of different sizes, binding them into a whole.