method
active
method:span-embedding-analysis
span embedding analysis
Extracting embeddings from instruction and example spans.
Neighborhood — ranked by edge-count
Methods (1)
method
- span embeddings extraction (related_to, same_as): Obtain instruction and example span embeddings at layer L* with the chosen pooling.
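The extraction step above can be sketched as follows. This is a minimal illustration, not the source's implementation: it assumes the hidden states for all layers are already available as a NumPy array, that spans are given as half-open token index ranges, and that "chosen pooling" means one of mean, max, or last-token pooling. The names `pool_span` and `extract_span_embeddings` and the toy span boundaries are hypothetical.

```python
import numpy as np

def pool_span(hidden, start, end, pooling="mean"):
    """Pool hidden states over token positions [start, end) into one vector.

    hidden: array of shape (seq_len, d_model) taken from a single layer.
    """
    if not 0 <= start < end <= hidden.shape[0]:
        raise ValueError("span must satisfy 0 <= start < end <= seq_len")
    span = hidden[start:end]
    if pooling == "mean":
        return span.mean(axis=0)
    if pooling == "max":
        return span.max(axis=0)
    if pooling == "last":
        return span[-1]
    raise ValueError(f"unknown pooling: {pooling}")

def extract_span_embeddings(layer_states, layer, spans, pooling="mean"):
    """Return one pooled vector per named span at the selected layer.

    layer_states: array of shape (n_layers, seq_len, d_model).
    spans: dict mapping a span name to its (start, end) token range.
    """
    hidden = layer_states[layer]
    return {name: pool_span(hidden, s, e, pooling) for name, (s, e) in spans.items()}

# Toy data standing in for real model activations (assumption).
rng = np.random.default_rng(0)
layer_states = rng.normal(size=(4, 10, 8))          # 4 layers, 10 tokens, d_model = 8
spans = {"instruction": (0, 4), "example": (4, 10)}  # hypothetical span boundaries
embeddings = extract_span_embeddings(layer_states, layer=2, spans=spans, pooling="mean")
```

Here `layer=2` plays the role of L*; how L* is selected is not stated in the entry and is left outside the sketch.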
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Framework for characterizing span-level information of sequences of representations, independent of any consciousness estimate; used as a comparison baseline.
- Preprocessing step that uses dev-set covariance to standardize embedding scales before computing ρd and dr.
- 2D embedding of feature direction vectors used to visualize feature clusters and splitting geometry.
- The specific type of representation studied in the paper: a function f: X → R^n that assigns feature vectors to inputs.
- PCA applied to the token embedding and unembedding matrices to understand what fraction of the residual stream dimensions they occupy and how they relate.
- Lagged time series used to capture dynamical dependencies.
- Method concatenating boundary token vectors, their element-wise product, and difference to form span-level representations from (C)ARR.
- Baseline method for instruction discovery using surface-level input embedding similarity instead of steering vectors.
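The selection rule for this list (cosine ≥ 0.65, no typed edge) can be sketched as below. This is only an illustration under assumptions: it presumes each entity has a description embedding available as a NumPy vector and that existing typed edges are known as a set of entity names. The function name `similarity_candidates` and the toy entities are hypothetical, and the embedding model used by the page is not specified in the source.

```python
import numpy as np

def similarity_candidates(target, others, typed_edges, threshold=0.65):
    """Return (name, cosine) pairs at or above threshold that lack a typed edge.

    target: embedding vector of the current entity.
    others: dict mapping entity name to its embedding vector.
    typed_edges: set of entity names already linked by a typed relation.
    """
    t = target / np.linalg.norm(target)
    out = []
    for name, vec in others.items():
        if name in typed_edges:
            continue  # already related by a typed edge, so not a candidate
        cos = float(np.dot(t, vec / np.linalg.norm(vec)))
        if cos >= threshold:
            out.append((name, cos))
    # Highest similarity first.
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (assumption): three neighbors of one target entity.
target = np.array([1.0, 0.0, 0.0])
others = {
    "a": np.array([0.9, 0.1, 0.0]),  # close to the target, no typed edge
    "b": np.array([1.0, 0.0, 0.0]),  # identical, but already has a typed edge
    "c": np.array([0.0, 1.0, 0.0]),  # orthogonal to the target
}
candidates = similarity_candidates(target, others, typed_edges={"b"})
```

In this toy case only "a" survives: "b" is excluded by its existing typed edge and "c" falls below the threshold.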