Arora et al. (2025) work on interpretable neuron functional roles

Prior work cited as evidence that individual neurons can correspond to interpretable functional roles, though parameter-level interpretation is argued to be more parsimonious.

Neighborhood — ranked by edge-count

Papers (1)

Paper Summary: Interpreting Language Model Parameters · paper · mentions

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
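
The selection rule above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function names, the `embeddings` dictionary of entity vectors, and the `typed_edges` set of unordered entity pairs are all hypothetical, and only the 0.65 threshold comes from the page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities near `target` that lack a typed edge.

    embeddings:  dict mapping entity name -> vector (hypothetical storage format)
    typed_edges: set of frozenset({a, b}) pairs that already have a typed relation
    threshold:   minimum cosine similarity; 0.65 is the cutoff shown on the page
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue  # an entity is not its own neighbor
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, matching the descending scores in the list below.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

Entities returned by a filter like this are the candidates for new edges or duplicate merges described above.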

Neurons can correspond to interpretable functional roles but interpretations in terms of individual neurons are unlikely to be the most parsimonious · claim · 0.815
Claim from footnote 3, acknowledging neuron-level interpretability while arguing subcomponents are better.
Smolensky (1986) proposes that viewing a neural representation under a basis that is not aligned with individual neurons can reveal the interpretable distributed structure of the neural representations. · quote · 0.750
Load-bearing theoretical claim providing the conceptual foundation for DAS.
Neuroscience and mechanistic interpretability have not yet made enough progress to identify neural correlates marking necessary and sufficient conditions of conscious experience in both brains and neural networks. · claim · 0.741
Paper explicitly identifies this as a current gap requiring alternative experimental approaches.
Anthropic Interpretability Team: 171 emotion vectors causally influence behavior; performing versus having a functional emotion representation are measurably different · finding · 0.738
Cited as activation-level support for the distinction between performing care and having care, which the battery detects behaviorally.
What are the neuronal mechanisms by which prior beliefs from one agent's model are received and properly implemented by a naive agent (neuronal hermeneutics)? · question · 0.736
Open question about inter-agent communication beyond the model-space assumption.
Neural Network Interpretability · concept · 0.733
The field aimed at understanding what neural networks have learned; characterized as pre-paradigmatic in this paper.
When and how can MLP neurons in transformers be individually interpreted, and what progress is needed to extend mechanistic interpretability to them? · question · 0.731
Major open problem identified in the paper; MLP layers constitute 2/3 of transformer parameters.
Neural networks show substantial alignment with biological representations in the brain, driven by shared task and data constraints · claim · 0.729
Extends the convergence argument to brain-machine alignment.