method
active
method:non-linear-alignment-map-nonlin
Non-Linear Alignment Map (ϕ_nonlin)
Alignment map implemented as a reversible residual network (RevNet); assumes the non-linear representation hypothesis.
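A minimal sketch of why a reversible residual block gives a bijective, non-linear map. It assumes an additive-coupling design, which is one common RevNet construction; the entry does not say which coupling is used. The names `make_residual_fn`, `revnet_forward`, and `revnet_inverse` are hypothetical, and the random tanh layers stand in for whatever learned sub-networks a real alignment map would contain.

```python
import math
import random


def make_residual_fn(dim, seed):
    """Hypothetical stand-in for a learned sub-network: tanh of a fixed random linear map."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def fn(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return fn


def revnet_forward(h, f, g):
    """Additive coupling (assumed design): split h into halves and mix them reversibly."""
    half = len(h) // 2
    x1, x2 = h[:half], h[half:]
    y1 = [a + b for a, b in zip(x1, f(x2))]  # y1 = x1 + F(x2)
    y2 = [a + b for a, b in zip(x2, g(y1))]  # y2 = x2 + G(y1)
    return y1 + y2


def revnet_inverse(z, f, g):
    """Exact inverse of revnet_forward: undo the two additions in reverse order."""
    half = len(z) // 2
    y1, y2 = z[:half], z[half:]
    x2 = [a - b for a, b in zip(y2, g(y1))]  # x2 = y2 - G(y1)
    x1 = [a - b for a, b in zip(y1, f(x2))]  # x1 = y1 - F(x2)
    return x1 + x2


# Illustrative 4-dimensional hidden vector; each half has dimension 2.
f = make_residual_fn(2, seed=0)
g = make_residual_fn(2, seed=1)
h = [0.5, -1.2, 0.3, 2.0]

z = revnet_forward(h, f, g)
h_recovered = revnet_inverse(z, f, g)
max_error = max(abs(a - b) for a, b in zip(h, h_recovered))
```

The inverse never has to invert `f` or `g` themselves, so they can be arbitrary non-linear functions while the overall map stays bijective. Recovery is exact up to floating-point rounding.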
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Non-Linear Representation Hypothesis · implements · Hypothesis that information may be encoded in arbitrary non-linear subspaces of a neural network
Methods (2)
method
- Linear Alignment Map (ϕ_lin) · related_to · Alignment map ϕ(h) = W_orth·h using an orthogonal matrix; assumes the linear representation hypothesis
- Bijective invertible architecture used to implement non-linear alignment maps ϕ_nonlin
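For contrast with the non-linear map, the linear alignment map ϕ(h) = W_orth·h listed above can be sketched as follows. This illustration assumes the orthogonal matrix is obtained by Gram-Schmidt on random Gaussian rows; the entry does not say how W_orth is parameterised or trained. `random_orthogonal`, `matvec`, and `transpose` are hypothetical helper names.

```python
import math
import random


def random_orthogonal(dim, seed=0):
    """Build an orthogonal matrix via modified Gram-Schmidt on random rows (assumed construction)."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]  # remove the component along u
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip near-degenerate draws
            rows.append([a / norm for a in v])
    return rows


def matvec(m, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


w_orth = random_orthogonal(4)
h = [0.5, -1.2, 0.3, 2.0]

z = matvec(w_orth, h)                   # phi_lin(h) = W_orth h
h_back = matvec(transpose(w_orth), z)   # inverse of an orthogonal map is its transpose

norm_h = math.sqrt(sum(a * a for a in h))
norm_z = math.sqrt(sum(a * a for a in z))
```

Because W_orth is orthogonal, the map is a rotation or reflection of the hidden space: it is invertible and preserves vector norms, but it can only re-express the representation in a different linear basis.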
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Key empirical result: non-linear maps overcome linear maps' failure in deeper layers
- The bijective function mapping DNN inner neurons to latent variables in causal abstraction; its complexity is the central variable studied
- Simplest alignment map ϕ(h)=h, equivalent to assuming privileged bases hypothesis
- Semantic domain for linear transformations; denotation as actual linear function; Category instance generated from homomorphism principle.
- Replicates Geiger et al. 2024b pattern of layer-dependent IIA degradation with linear maps
- Corroborating result on additional task confirming main paper findings
- Generalised notion restricting alignment maps to a family V; linearity is special case
- The goal of making model behavior match human values and intentions, often addressed during post-training.