paper
active
paper:bushnaq-goodfire-vpd-parameters-2026
Interpreting Language Model Parameters
/Users/antonborzov/Documents/Research.nosync/papers/bushnaq-goodfire-vpd-parameters-2026.md
External IDs
title_hash
2a6ce006deb380f9f9adca64fa4767451bf3b7fb
legacy_slug
bushnaq-goodfire-vpd-parameters-2026
Frontmatter (16 fields)
{
"url": "https://www.goodfire.ai/research/interpreting-lm-parameters",
"code": "https://github.com/goodfire-ai/param-decomp",
"tags": [
"mechanistic-interpretability",
"parameter-decomposition",
"VPD",
"adversarial-ablation",
"SAE-alternative",
"sparse-circuits",
"goodfire"
],
"year": 2026,
"saved": "2026-05-14",
"title": "Interpreting Language Model Parameters",
"venue": "Goodfire research post",
"status": "full-text-saved",
"authors": [
"Lucius Bushnaq",
"Dan Braun",
"Oliver Clive-Griffin",
"Bart Bussmann",
"Nathan Hu",
"Michael Ivanitskiy",
"Linda Linsefors",
"Lee Sharkey"
],
"published": "2026-05-05",
"affiliation": "Goodfire (+ MATS, Independent)",
"fulltext_md": "https://static.goodfire.ai/vpd-blog-post/post.md",
"openalex_year": 2022,
"local_fulltext": "bushnaq-goodfire-vpd-2026-fulltext.md",
"openalex_enriched_at": 1778974396,
"openalex_cited_by_count": 804
}
Outgoing (6)
Associated with (2)
- Goodfire (institute)
- Lindsey Introspective Awareness 2026 (paper)
Implements (2)
- Parameter Decomposition (concept)
- VPD (adVersarial Parameter Decomposition) (concept)
Member of (2)
- LLM Introspection (community)
- Neural Geometry (community)
Incoming (0)
None.
References (30)
- Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (referenced-only)
- Proceedings of the 24th annual meeting on Association for Computational Linguistics (referenced-only)
- Assessing the Prognostic Significance of Tumor-Infiltrating Lymphocytes in Patients With Melanoma Using Pathologic Features Identified by Natural Language Processing (referenced-only)
- The PASCAL Recognising Textual Entailment Challenge (referenced-only)
- MedSTS: a resource for clinical semantic textual similarity (referenced-only)
- Structured Data Entry in the Electronic Medical Record: Perspectives of Pediatric Specialty Physicians and Surgeons (referenced-only)
- GPT-3: Its Nature, Scope, Limits, and Consequences (referenced-only)
- Pre-trained models for natural language processing: A survey (referenced-only)
- Multiple features for clinical relation extraction: A machine learning approach (referenced-only)
- Advances in neural information processing systems 7 (referenced-only)
- Deep learning (referenced-only)
- Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence (referenced-only)
- MIMIC-III, a freely accessible critical care database (referenced-only)
- Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research (referenced-only)
- Survey on Sentence Similarity Evaluation using Deep Learning (referenced-only)
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining (referenced-only)
- Electronic health record adoption in US hospitals: the emergence of a digital “advanced use” divide (referenced-only)
- CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines (referenced-only)
- Identifying relations of medications with adverse drug events using recurrent convolutional neural networks and gradient boosting (referenced-only)
- 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text (referenced-only)
- Natural language processing: an introduction (referenced-only)
- Evaluating temporal relations in clinical text: 2012 i2b2 Challenge (referenced-only)
- BioCreative/OHNLP Challenge 2018 (referenced-only)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing (referenced-only)
- Efficient Transformers: A Survey (referenced-only)
- A question-entailment approach to question answering (referenced-only)
- A study of deep learning methods for de-identification of clinical notes in cross-institute settings (referenced-only)
- MedNLI: A Natural Language Inference Dataset For The Clinical Domain (referenced-only)
- Semantics-Aware BERT for Language Understanding (referenced-only)
- Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction (referenced-only)
Mentions (3)
- papers: /Users/antonborzov/Documents/Research.nosync/papers/bushnaq-goodfire-vpd-parameters-2026.md
- papers: bushnaq-goodfire-vpd-parameters-2026.md
- papers: bushnaq-goodfire-vpd-2026-fulltext.md