claim

active

claim:disruption-profiles-are-higher-quality-explanations-than-metadata-only-descriptions

Disruption profiles are higher-quality explanations than metadata-only descriptions

Supported by the human-rating finding: disruption profiles scored 3.8/5 for explanation quality versus 2.8/5 for metadata-only baselines.

Source paper

extracted_from

Explaining 4.2 million genetic variants with state-of-the-art, interpretable predictions

(2026) · Pearce, Michael · Dooms, Thomas · Yamamoto, Ryo · Meehl, Joshua +18

Neighborhood — ranked by edge-count

Papers (1)

paper

Explaining 4.2 million genetic variants with state-of-the-art, interpretable predictions
mentions

Findings (1)

finding

Disruption profiles scored 3.8/5 for explanation quality vs 2.8/5 for metadata-only baselines
supports
EVEE's mechanistic explanations significantly outperform simple metadata-based predictions in human evaluation.

Communities (3)

community

Mechanistic interpretability & model evaluation
members_of
Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
Mechanistic explanation of genetic variant pathogenicity
members_of
Using genomic foundation model internals to generate disruption profiles that explain variant effects mechanistically, achieving 0.997 AUROC on ClinVar pathogenicity prediction.
Disruption profile explanation quality
members_of
Empirical evaluation showing disruption profiles outperform metadata-only baselines on explanation quality (3.8/5 vs 2.8/5).

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Disruption profiles · method · 0.793
Mechanistic explanation outputs from EVEE showing how variants affect gene function, scored 3.8/5 for explanation quality.
disruption profile · concept · 0.765
The output explanation format of EVEE, quantifying how a variant disrupts different genomic features.
Future models with substantially increased capabilities will exhibit alignment faking that is more consistent, robust, and harder to detect · hypothesis · 0.731
Extrapolation from scale-emergence finding to future risk
Probe-based data attribution effectively reduces harmful behaviors via data interventions · claim · 0.724
Authors' central interpretive assertion that their method meaningfully mitigates unwanted behaviors.
Synthetic document fine-tuning avoids artificially strengthening the evaluation-deployment representational direction compared to direct demonstration fine-tuning · claim · 0.717
Methodological justification for using SDF over direct demonstrations to train a realistic model organism.
Probe-based data attribution bridges interpretability work (probes/activations) with data-centric alignment. · claim · 0.716
Conceptual framing: integrates mechanistic interpretability tools with alignment-focused data curation.
Larger hidden representations create more random structure that DAS can search through, allowing manipulation of counterfactual behavior even in randomly initialized networks · hypothesis · 0.716
Tested in Section 4.4 calibration experiment; confirmed by findings.
NLA explanations confabulate false specifics but maintain thematic fidelity; claims repeated across tokens more likely true than isolated claims. · claim · 0.715
Core limitation and usage heuristic: read NLAs for themes rather than individual factual claims; cross-check with original context.
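
The "Related by similarity" listing above applies two filters: a cosine similarity of at least 0.65, and no typed edge to this entity. The page does not say how embeddings or edges are stored, so the following is only a minimal sketch under assumptions: the `embeddings` dictionary, the `typed_edges` set of unordered ID pairs, and the function names are all hypothetical, chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have
    no typed edge to the target, sorted by descending score.

    Assumed (hypothetical) inputs:
      embeddings:  dict mapping entity ID -> list of floats
      typed_edges: set of frozenset({id_a, id_b}) for every typed relation
    """
    target = embeddings[target_id]
    results = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already connected by a typed relation
        score = cosine(target, vec)
        if score >= threshold:
            results.append((other_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are only candidates: a high score suggests a possible new edge or an unrecognized duplicate, and each one would still need review before a typed relation is added.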