method
active
method:normalized-indirect-effect
Normalized Indirect Effect
Metric of intervention effectiveness: 0 = the intervention is ineffective; 1 = the intervention fully flips the model's output from false to true, or vice versa.
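A minimal sketch of how such a 0-to-1 score can be computed. It assumes that the model's output on each statement is reduced to a scalar score (for example P(TRUE) minus P(FALSE)), and that the metric linearly rescales the mean score after intervention between the unintervened baseline (0) and the target class (1). The function name and the choice of scalar score are illustrative assumptions, not taken from this entry.

```python
from statistics import fmean
from typing import Sequence


def normalized_indirect_effect(
    baseline_scores: Sequence[float],
    intervened_scores: Sequence[float],
    target_scores: Sequence[float],
) -> float:
    """Hypothetical helper: rescale an intervention's effect to a 0-1 scale.

    baseline_scores:   scalar outputs on the source class with no intervention
                       (e.g. false statements, scored as P(TRUE) - P(FALSE)).
    intervened_scores: scalar outputs on the same class after the intervention.
    target_scores:     scalar outputs on the class the intervention should mimic
                       (e.g. genuinely true statements).
    """
    base = fmean(baseline_scores)
    target = fmean(target_scores)
    if target == base:
        # The scale is undefined if the two reference classes score identically.
        raise ValueError("baseline and target means must differ")
    # 0 when the intervention leaves the mean unchanged,
    # 1 when it moves the mean all the way to the target class.
    return (fmean(intervened_scores) - base) / (target - base)


if __name__ == "__main__":
    false_stmts = [-0.5, -1.0]  # illustrative baseline scores (mean -0.75)
    true_stmts = [0.5, 1.0]     # illustrative target scores (mean 0.75)
    print(normalized_indirect_effect(false_stmts, [0.25, -0.25], true_stmts))  # 0.5
```

Because the score is a ratio of mean shifts, values can in principle fall outside [0, 1] if an intervention overshoots the target or pushes the output the wrong way; the 0 and 1 labels above describe the reference points, not hard bounds.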
Neighborhood — ranked by edge-count
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Metric for causal intervention experiments: 0 = a wholly ineffective intervention; 1 = the intervention causes the model to label false statements as TRUE with the same confidence as genuine true statements.
- Unintended changes in model behavior when performing edits; compared between VPD editing and fine-tuning.
- Assumptions or preferences (e.g., parsimony) that determine how a learning system generalizes beyond its training data.
- Problem cited as a limitation of current LLMs; PRH predicts that larger models should amplify bias less.
- Normalized EI, bounded in [0, 1] and decomposed as determinism minus degeneracy.
- Graded notion of causal abstraction measured by IIA: when IIA is α < 100%, the model is α-on-average approximately abstract.
- An emergent ordering created by a well-arranged structure of centers of different sizes, binding them into a whole.