finding

active

finding:identity-subspace-of-left-equality-model-achieves-0-50-iia-indicating-equality-relations-cannot-be-decomposed-into-input-identities

Identity Subspace of the Left Equality model achieves ~0.50 IIA, indicating that equality relations cannot be decomposed into input identities

Distributed Alignment Search (DAS) reveals that the network encodes abstract equality relations rather than storing the identities of its inputs.
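The following is a minimal Python sketch that makes the ~0.50 figure concrete. It assumes the hierarchical equality task labels four inputs (a, b, c, d) as True exactly when (a == b) == (c == d), and that the Left Equality model's intermediate variable is a == b. Small integers stand in for the network's actual input encodings, and all function names are hypothetical. IIA is the fraction of (base, source) pairs on which the network's output under an interchange intervention matches the causal model's counterfactual label. Because the label is binary, a predictor that ignores the intervention and guesses at random scores about 0.50. That is why an IIA near this value is read as the hypothesized alignment failing.

```python
import random

def hierarchical_equality(a, b, c, d, left=None):
    """Hypothetical high-level causal model: output = (a == b) == (c == d).
    `left` optionally overrides the intermediate variable V1 = (a == b)."""
    v1 = (a == b) if left is None else left
    v2 = c == d
    return v1 == v2

def counterfactual_label(base, source):
    """Interchange intervention on V1: run the base input, but give V1 the
    value it takes on the source input."""
    a_s, b_s, _, _ = source
    return hierarchical_equality(*base, left=(a_s == b_s))

def iia(intervened_preds, counterfactual_labels):
    """Interchange intervention accuracy: fraction of pairs where the intervened
    prediction matches the causal model's counterfactual label."""
    matches = sum(p == y for p, y in zip(intervened_preds, counterfactual_labels))
    return matches / len(counterfactual_labels)

# Chance baseline: a "network" that ignores the intervention and guesses.
rng = random.Random(0)

def sample():
    # Small alphabet so equal pairs occur often.
    return tuple(rng.randrange(4) for _ in range(4))

pairs = [(sample(), sample()) for _ in range(10_000)]
labels = [counterfactual_label(base, source) for base, source in pairs]
guesses = [rng.random() < 0.5 for _ in pairs]
print(iia(guesses, labels))  # close to 0.5 in expectation
```

A real evaluation would replace `guesses` with the network's outputs after patching the aligned subspace from the source run into the base run.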

Source paper

extracted_from

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Claims (1)

claim

The feed-forward network truly implements a symbolic, tree-structured algorithm for hierarchical equality, with abstract equality relations not decomposable into input identities
associated_with · supports
DAS reveals that the neural network encodes abstract relational structure rather than raw input identities.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The two-dimensional subspace reported by Burger et al. (2024) seems to reflect a stage of transition in the model's processing, rather than a universal property of truth directions. · quote · 0.757
Load-bearing interpretive claim about the layer-specificity of Burger et al.'s finding.
Lexical entailment representation decomposes into word identity sub-representations with ~0.97-0.98 IIA (Lexeme Subspace of Lexical Entailment) · finding · 0.750
In contrast to hierarchical equality, lexical entailment in BERT decomposes into representations of word identities, not a single abstract relation.
The discovery of perfect abstract equality representations that cannot be decomposed into entity representations is a foundational result informing our understanding of how symbolic and connectionist architectures coexist · claim · 0.735
Concluding claim about theoretical significance of the hierarchical equality finding.
IIA for the Identity of First Argument algorithm consistently hovers around 50% for all alignment map types on the hierarchical equality task · finding · 0.730
Exception to the general trend; attributed to insufficient RevNet capacity rather than to the algorithm not being implemented
The two-dimensional subspace reported by Burger et al. reflects a transitional phase in model processing rather than a universal property of truth directions. · claim · 0.726
Reinterpretation of Burger et al.'s finding as layer-specific rather than universal.
Linear alignment map ϕ_lin shows a substantial IIA decrease in the third layer for both the Both Equality Relations and Left Equality Relation algorithms on the hierarchical equality task · finding · 0.721
Replicates the pattern of layer-dependent IIA degradation with linear maps reported by Geiger et al. (2024b)
The correlation between emotion subspace fraction and self-evaluated emotionality validates that emotion probe concepts somewhat overlap with the model's self-reported internal emotions. · claim · 0.716
Claim supporting the validity of the probe construction method via cross-validation with self-report
Truth may be linearly separable in the model's representation space, but the structure is richer than a single linear axis · claim · 0.716
Interpretive synthesis of DIM and cone intervention successes
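Several neighbors above compare alignment maps (a linear map ϕ_lin, a RevNet) used to carry out interchange interventions. The following is a minimal NumPy sketch of the mechanism they share. An invertible map sends the hidden state into an aligned space. The first k aligned coordinates, hypothesized to encode the causal variable, are copied from a source run into a base run, and the result is mapped back. In DAS the rotation is learned by gradient descent. Here, a random orthogonal matrix stands in for that learned rotation, and a single random additive-coupling block stands in for a trained RevNet. The function names are hypothetical, and nothing in this sketch is trained.

```python
import numpy as np

def interchange(h_base, h_source, phi, phi_inv, k):
    """Interchange intervention through an invertible alignment map phi:
    copy the first k aligned coordinates from source into base, then map back."""
    z_new = phi(h_base).copy()
    z_new[:k] = phi(h_source)[:k]
    return phi_inv(z_new)

def make_linear_map(dim, rng):
    """Linear alignment map: a random orthogonal matrix (stand-in for a learned
    rotation). Its inverse is its transpose."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return (lambda h: q @ h), (lambda z: q.T @ z)

def make_coupling_map(dim, rng):
    """Nonlinear invertible map: one RevNet-style additive coupling block with
    random tanh layers (stand-ins for learned ones)."""
    half = dim // 2
    w_f = rng.standard_normal((half, dim - half))  # F: second half -> first half
    w_g = rng.standard_normal((dim - half, half))  # G: first half -> second half

    def phi(h):
        x1, x2 = h[:half], h[half:]
        y1 = x1 + np.tanh(w_f @ x2)
        y2 = x2 + np.tanh(w_g @ y1)
        return np.concatenate([y1, y2])

    def phi_inv(z):
        # Undo the two additive steps in reverse order.
        y1, y2 = z[:half], z[half:]
        x2 = y2 - np.tanh(w_g @ y1)
        x1 = y1 - np.tanh(w_f @ x2)
        return np.concatenate([x1, x2])

    return phi, phi_inv

rng = np.random.default_rng(0)
dim, k = 8, 2
h_base, h_source = rng.standard_normal(dim), rng.standard_normal(dim)
for make in (make_linear_map, make_coupling_map):
    phi, phi_inv = make(dim, rng)
    h_new = interchange(h_base, h_source, phi, phi_inv, k)
    # Aligned coordinates [:k] now come from the source run; [k:] are unchanged.
    print(make.__name__, np.allclose(phi(h_new)[:k], phi(h_source)[:k]))
```

Both maps are exactly invertible, so each intervention changes only the chosen aligned coordinates. The maps differ in which subspaces they can express, and that is where questions of alignment-map capacity come in.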