claim

active

claim:an-interplay-between-causal-abstraction-and-feature-geometry-deepens-mechanistic-understanding-of-language-models

An interplay between causal abstraction and feature geometry deepens mechanistic understanding of language models

Methodological claim about the scientific value of combining causal abstraction with representational geometry analysis

Source paper

extracted_from

Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

(2026) · Sheridan Feucht · Tal Haklay · Usha Bhalla · Daniel Wurgaft +8

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Causal abstraction is not enough for mechanistic interpretability because it becomes vacuous without assumptions about how models encode information · claim · 0.847
Central thesis of the paper
Causal abstraction implicitly relies on strong assumptions about feature encoding in DNNs, and becomes trivial without such assumptions · claim · 0.832
Authors' interpretation connecting their proof to practical interpretability methodology
causal abstraction implicitly relies on strong assumptions about how features are encoded in deep neural networks (DNNs), and becomes trivial without such assumptions · quote · 0.818
Load-bearing formulation of the paper's central argument
How does representation geometry causally drive model behavior? · question · 0.812
The central scientific question the paper addresses through the lens of interventional causality.
What can causal abstraction analyses tell us about how DNNs encode features if the methods themselves rely on encoding assumptions? · question · 0.810
Circular dependency problem raised in discussion
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models (Marks et al., 2025) · concept · 0.800
Cited as enabling precise behavioral control through SAE features, extending the same methodological line
Language models prefer reusing generic arithmetic mechanisms over learning task-specific modular computations even when task-specific geometry exists · claim · 0.798
Broader interpretive claim about LM learning bias inferred from the findings
We hypothesize that representation geometry drives model behavior: the geometric structure of internal representations causally shapes what models do externally. · hypothesis · 0.797
The causal hypothesis motivating the use of causality (intervention) as the lens connecting representation and behavior geometry.