finding
active
finding:attribution-graph-reveals-a-pathway-that-detects-the-verb-lost-and-upweights-object-pronouns
Attribution graph reveals a pathway that detects the verb 'lost' and upweights object pronouns
Second component of the subnetwork for 'her', complementing the femaleness signal.
Source paper
extracted_from
Neighborhood (ranked by edge count)
Claims (1)
claim
- Central claim that VPD successfully uncovers genuine mechanisms.
Communities (4)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
- Tracing information flow through weight matrices and attention heads using attribution graphs to identify causally important subcomponents in language models.
- Tracing information flow through parameter subcomponents to isolate computational mechanisms for specific model predictions, using tools like attribution graphs and VPD.
- Mechanistic tracing of information flow through attention and MLP subcomponents for pronoun prediction tasks
Concepts (1)
concept
- Object pronoun upweighting (associated_with): The other pathway in the 'her' subnetwork, where the verb 'lost' upweights object pronouns (including 'her').
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- One component of the minimal subnetwork for predicting 'her', discovered via VPD attribution graph.
- Gradient-based technique using SAE features to estimate causal effects on completions; used to corroborate NLA findings.
- Shows how VPD-identified subnetworks can be analyzed to reveal interpretable pathways of computation (e.g., gender signal routing, syntactic role detection).
- Feature attribution (gradient-based) correlates 0.8 with ablation effects on the 'John' and 'Kobe' examples. [finding · 0.747] Validation of attribution as a fast proxy for causal importance.
- Extrapolation of scaling predictive models to AGI.
- Stronger version: all cognition attributions rely on observable behavior.
- Critical verbatim statement highlighting the universal inference basis of sentience.
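
The "Related by similarity" filter described above (cosine ≥ 0.65, no typed edge) could be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the function names, the dictionary of embeddings, and the edge representation as unordered pairs are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: list entities similar to `target` that lack a typed edge to it.

    embeddings  -- dict mapping entity name to its embedding vector (assumed layout)
    typed_edges -- iterable of (entity_a, entity_b) pairs that already have a typed relation
    threshold   -- minimum cosine similarity; 0.65 matches the cutoff shown on the page
    """
    # Collect every entity already linked to the target, in either direction.
    linked = set()
    for a, b in typed_edges:
        if a == target:
            linked.add(b)
        elif b == target:
            linked.add(a)

    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything with a typed edge
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))

    # Highest similarity first, as in a ranked neighborhood listing.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Entities that pass this filter are only candidates: a high cosine score flags a possible missing edge or duplicate, and unrelated entries can still clear the threshold.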