paper:arxiv-2304-14767
Dissecting recall of factual associations in auto-regressive language models
Related work: refs + corpus + external arXiv
Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.
- Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models | Yanxu Chen, Sander Hoffman, Maria Heuss, Dante Campregher | 2025 | ≈ 75%
- Across the Levels of Analysis: Explaining Predictive Processing in Humans Requires More Than Machine-Estimated Probabilities | Sathvik Nair and Colin Phillips | 2026 | ≈ 75%
- DataDignity: Training Data Attribution for Large Language Models | Andrzej Banburski-Fahey, Jaron Lanier, Xiaomin Li | 2026 | ≈ 75%
- Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics? | Aaron Mueller, Zhengyang Shan | 2025 | ≈ 75%
- Causal Evidence that Language Models use Confidence to Drive Behavior | Nathaniel Daw, Simon Osindero, Petar Velickovic, Viorica Patraucean, Dharshan Kumaran | 2026 | ≈ 75%
- Perceptions of Linguistic Uncertainty by Language Models and Humans | Markelle Kelly, Mark Steyvers, Sameer Singh, Padhraic Smyth, Catarina G Belem | 2024 | ≈ 74%
- Evaluating Large Language Models with Psychometrics | Yue Huang, Hongyi Wang, Ying Cheng, Xiangliang Zhang, James Zou, Lichao Sun, Yuan Li | 2025 | ≈ 74%
- Failing to Falsify: Evaluating and Mitigating Confirmation Bias in Language Models | Anthony GX-Chen, Ilia Sucholutsky, Eunsol Choi, Ayush Rajesh Jhaveri | 2026 | ≈ 74%
- Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models | Mariam Mahran, Katharina Simbeck | 2025 | ≈ 74%
- A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models | Xuansheng Wu, Haiyan Zhao, Daking Rai, Ziyu Yao, Ninghao Liu, Mengnan Du, Dong Shu | 2025 | ≈ 74%
- Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective | Zubair Bashir, Procheta Sen, Bhavik Chandna | 2025 | ≈ 74%
- Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders | in corpus | 2026 | ≈ 74%
- Analyze Feature Flow to Enhance Interpretation and Steering in Language Models | Nikita Balagansky, Yaroslav Aksenov, Daniil Gavrilov, Daniil Laptev | 2025 | ≈ 74%
- Nationality encoding in language model hidden states: Probing culturally differentiated representations in persona-conditioned academic text | Paul Jackson (1), Ruizhe Li (2), and Elspeth Edelstein (3) ((1) Language Centre, School of Language, Literature, Music and Visual Culture, University of Aberdeen, United Kingdom; (2) School of Natural and Computing Sciences, University of Aberdeen, United Kingdom; (3) School of Language, Literature, Music and Visual Culture, University of Aberdeen, United Kingdom) | 2026 | ≈ 74%
- Reanalyzing L2 Preposition Learning with Bayesian Mixed Effects and a Pretrained Language Model | Jakob Prange and Man Ho Ivy Wong | 2026 | ≈ 74%
- Model Alignment Search | in corpus | 2025 | ≈ 72%
- Interpreting Language Model Parameters | in corpus | 2026 | ≈ 72%
- Verbalized Eval Awareness Inflates Measured Safety | in corpus | 2026 | ≈ 70%
- Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation | in corpus | 2026 | ≈ 70%
Similar preprints — Semantic Scholar
Cited by (4)
- pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
pyvene is an open-source Python library that unifies intervention-based research on PyTorch neural models by treating the intervention itself, rather than model surgery code, as the primitive abstraction…
- The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
At sufficient scale, LLMs linearly represent the truth or falsehood of factual statements in their internal activations, a claim supported by PCA visualizations, cross-dataset probe transfer, and cau…
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering, that is, intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions, produces behavioral trajectories…
- Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
Llama-3.1-8B solves cyclic arithmetic (e.g., "what month is six months after August?") not by performing modular addition in the period of the cyclic concept (12 for months, 7 for days of the week) as…