Non-Linear Alignment Map (ϕ_nonlin)

Alignment map implemented as a reversible residual network (RevNet); assumes the non-linear representation hypothesis
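Below is a minimal NumPy sketch of how a RevNet-style network can serve as a bijective, non-linear ϕ_nonlin. It assumes additive coupling blocks whose residual functions F and G are small tanh MLPs. The class names, block count, layer sizes and random (untrained) weights are hypothetical illustrations, not the paper's configuration.

```python
import numpy as np


class AdditiveCouplingBlock:
    """One reversible residual block (additive coupling, as in RevNet).

    Splits h into halves (x1, x2) and computes
        y1 = x1 + F(x2)
        y2 = x2 + G(y1)
    which is exactly invertible for any choice of F and G.
    """

    def __init__(self, half_dim, hidden_dim, rng):
        # F and G are one-hidden-layer tanh MLPs with random (untrained) weights.
        self.Wf1 = rng.normal(scale=0.5, size=(hidden_dim, half_dim))
        self.Wf2 = rng.normal(scale=0.5, size=(half_dim, hidden_dim))
        self.Wg1 = rng.normal(scale=0.5, size=(hidden_dim, half_dim))
        self.Wg2 = rng.normal(scale=0.5, size=(half_dim, hidden_dim))

    @staticmethod
    def _mlp(W1, W2, x):
        return W2 @ np.tanh(W1 @ x)

    def forward(self, h):
        x1, x2 = np.split(h, 2)
        y1 = x1 + self._mlp(self.Wf1, self.Wf2, x2)
        y2 = x2 + self._mlp(self.Wg1, self.Wg2, y1)
        return np.concatenate([y1, y2])

    def inverse(self, y):
        y1, y2 = np.split(y, 2)
        # Undo the two residual updates in reverse order.
        x2 = y2 - self._mlp(self.Wg1, self.Wg2, y1)
        x1 = y1 - self._mlp(self.Wf1, self.Wf2, x2)
        return np.concatenate([x1, x2])


class RevNetAlignmentMap:
    """Hypothetical phi_nonlin: a stack of coupling blocks (dim must be even)."""

    def __init__(self, dim, n_blocks=3, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.blocks = [AdditiveCouplingBlock(dim // 2, hidden_dim, rng)
                       for _ in range(n_blocks)]

    def forward(self, h):
        z = np.asarray(h, dtype=float)
        for block in self.blocks:
            z = block.forward(z)
        return z

    def inverse(self, z):
        h = np.asarray(z, dtype=float)
        for block in reversed(self.blocks):
            h = block.inverse(h)
        return h


h = np.random.default_rng(1).normal(size=8)
phi = RevNetAlignmentMap(dim=8)
z = phi.forward(h)
print(np.allclose(phi.inverse(z), h))  # True: the map is a bijection
```

The key property is that each block is invertible by construction. The inverse subtracts the same residuals in reverse order, so F and G can be arbitrary non-linear functions without losing bijectivity.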

Neighborhood — ranked by edge-count

Papers (1)

paper

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
introduces

Concepts (1)

concept

Non-Linear Representation Hypothesis
implements
Hypothesis that information may be encoded in arbitrary non-linear subspaces of a neural network

Methods (2)

method

Linear Alignment Map (ϕ_lin)
related_to
Alignment map ϕ(h) = W_orth h, where W_orth is an orthogonal matrix; assumes the linear representation hypothesis
Reversible Residual Network (RevNet)
uses
Bijective invertible architecture used to implement non-linear alignment maps ϕ_nonlin
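For comparison, here is a minimal NumPy sketch of the related ϕ_lin. It also covers the identity map ϕ_id as the special case W_orth = I. The orthogonal matrix is sampled at random via QR purely for illustration. In practice it would be learned, and this page does not say how the paper parameterizes it. `random_orthogonal` and `LinearAlignmentMap` are hypothetical names.

```python
import numpy as np


def random_orthogonal(dim, seed=0):
    """Sample an orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)))
    # Flip column signs so diag(r) > 0; the result is still orthogonal.
    return q * np.sign(np.diag(r))


class LinearAlignmentMap:
    """phi_lin(h) = W_orth @ h, with W_orth orthogonal."""

    def __init__(self, W_orth):
        self.W = np.asarray(W_orth, dtype=float)

    def forward(self, h):
        return self.W @ h

    def inverse(self, z):
        # For an orthogonal matrix the inverse is the transpose.
        return self.W.T @ z


W = random_orthogonal(8)
phi_lin = LinearAlignmentMap(W)
phi_id = LinearAlignmentMap(np.eye(8))  # identity map: W_orth = I
h = np.random.default_rng(1).normal(size=8)
print(np.allclose(phi_lin.inverse(phi_lin.forward(h)), h))  # True
```

Unlike a RevNet map, this map is linear: ϕ_lin(2h) = 2ϕ_lin(h). It can only rotate or reflect the activation space, not bend it.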

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Non-linear alignment map ϕ_nonlin achieves near-optimal IIA across all layers on the hierarchical equality task, eliminating the layer-dependent degradation seen with linear maps · finding · 0.866
Key empirical result: non-linear maps overcome linear maps' failure in deeper layers
Alignment Map (ϕ) · concept · 0.834
The bijective function mapping DNN inner neurons to latent variables in causal abstraction; its complexity is the central variable studied
Identity Alignment Map (ϕ_id) · method · 0.796
Simplest alignment map ϕ(h) = h, equivalent to assuming the privileged basis hypothesis
Linear Map (a ⊸ b) · framework · 0.764
Semantic domain for linear transformations; denotes an actual linear function; Category instance generated from the homomorphism principle.
Linear alignment map ϕ_lin shows a substantial IIA decrease in the third layer for both the equality-relations and left-equality-relation algorithms in the hierarchical equality task · finding · 0.762
Replicates the layer-dependent IIA degradation with linear maps reported by Geiger et al. (2024b)
Non-linear ϕ_nonlin achieves near-perfect IIA on the distributive law task for both the And-Or and And-Or-And algorithms, eliminating the differences between linear and identity maps · finding · 0.758
Corroborating result on an additional task, confirming the main paper's findings
Variational Family V for Alignment Maps · concept · 0.751
Generalised notion restricting alignment maps to a family V; linearity is a special case
Alignment · concept · 0.743
The goal of making model behavior match human values and intentions, often addressed during post-training.
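The findings above are stated in terms of IIA (interchange intervention accuracy). The sketch below shows where an alignment map enters that metric. It assumes the standard distributed interchange intervention. Base and source activations are mapped through ϕ. The coordinates assumed to encode one high-level variable are copied from source to base. The patched vector is then mapped back with ϕ⁻¹. The `forward`/`inverse` interface, the function names and the toy identity map are hypothetical. Running the DNN on the patched activation and the causal model on the counterfactual input is left out.

```python
import numpy as np


def interchange_intervention(phi, h_base, h_source, coords):
    """Distributed interchange intervention through a bijective alignment map.

    phi must expose forward(h) -> z and inverse(z) -> h (hypothetical interface).
    The latent coordinates listed in `coords`, assumed to encode one high-level
    variable, are copied from the source run into the base run.
    """
    z_base = phi.forward(h_base)
    z_source = phi.forward(h_source)
    z_new = z_base.copy()
    z_new[coords] = z_source[coords]
    return phi.inverse(z_new)


def interchange_intervention_accuracy(predicted, expected):
    """Fraction of interventions where the DNN's post-intervention output
    matches the causal model's counterfactual output."""
    predicted = np.asarray(predicted)
    expected = np.asarray(expected)
    return float(np.mean(predicted == expected))


class IdentityMap:
    """Toy phi_id(h) = h, used only to exercise the functions above."""

    def forward(self, h):
        return np.asarray(h, dtype=float)

    def inverse(self, z):
        return np.asarray(z, dtype=float)


h_base = np.array([1.0, 2.0, 3.0, 4.0])
h_source = np.array([10.0, 20.0, 30.0, 40.0])
patched = interchange_intervention(IdentityMap(), h_base, h_source, [0, 1])
print(patched)  # first two coordinates now come from the source run
print(interchange_intervention_accuracy([1, 0, 1, 1], [1, 1, 1, 1]))  # 0.75
```

Any of the maps above (ϕ_id, ϕ_lin or ϕ_nonlin) can be passed as `phi`. Only the set of subspaces the intervention can target changes.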