concept
active
concept:anisotropy-in-language-models
Anisotropy in Language Models
Property of smaller, saturated models that makes their hidden states harder to access via alignment maps
Neighborhood — ranked by edge-count
Findings (1)
finding
- Attributed to model anisotropy arising from saturation, which makes hidden states harder to access
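As a rough illustration of the anisotropy named in the finding above, one common proxy is the mean pairwise cosine similarity of a set of hidden states: values near 0 suggest directions spread evenly, values near 1 suggest vectors crowded into a narrow cone. The sketch below is not taken from this entry's sources. The function name `mean_pairwise_cosine`, the synthetic data, and the choice of this particular proxy are all assumptions made for illustration.

```python
import math
import random


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors.

    Hypothetical helper: assumes at least two vectors, all non-zero.
    """
    # Normalise every vector to unit length so dot products are cosines.
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])

    n = len(unit)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
    # There are n * (n - 1) / 2 distinct pairs.
    return total / (n * (n - 1) / 2)


random.seed(0)
# Synthetic stand-ins for hidden states (not real model activations).
isotropic = [[random.gauss(0.0, 1.0) for _ in range(32)] for _ in range(100)]
# A shared offset on every coordinate pushes all vectors toward one direction.
shifted = [[x + 5.0 for x in v] for v in isotropic]

print("isotropic:", round(mean_pairwise_cosine(isotropic), 3))
print("shifted:  ", round(mean_pairwise_cosine(shifted), 3))
```

On the synthetic data the shifted set scores far higher than the isotropic one. With real models the same measure would be applied to hidden states extracted from a chosen layer.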
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Primary substrate for manifold steering experiments; demonstrates method on reasoning and in-context tasks.
- Features related to gender, racial, ethnic biases, slurs, and hate speech.
- Primary test domain for manifold steering, including reasoning and ICL tasks
- Training objective interpretable as optimizing a diverse set of tasks; thus subject to multitask scaling convergence pressures
- Paper hypothesising LLMs model agent beliefs/desires/intentions with preliminary GPT-3 evidence; cited as ref 2
- Follow-up on empirical grounding; answered 'no one looked yet'.
- Core cross-modal empirical result: larger and better language models align better with vision models