Language Model (concept:language-model)
concept · active
Primary test domain for manifold steering, including reasoning and in-context learning (ICL) tasks.
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (2)
- model (related_to): A representation that captures the relevant aspects of a system; according to the Good Regulator theorem, a regulator must embody such a model.
- Language Models (same_as): Primary substrate for manifold steering experiments; demonstrates the method on reasoning and in-context tasks.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Language-model architecture with sense vectors, showing multiplication effects; illustrates a custom intervention in pyvene.
- Features related to gender, racial, and ethnic biases, slurs, and hate speech.
- Transformer-based models such as GPT-4, LaMDA, and PaLM; assessed for global workspace theory (GWT) indicators.
- Training objective interpretable as optimizing a diverse set of tasks, and therefore subject to multitask scaling convergence pressures.
- Demonstrated transformer performance on mathematical understanding and logic; cited to motivate the versatility of transformers.
- Framework describing LLMs as role-play engines, introduced by Shanahan, McDonell, and Reynolds (2023).
- Opening sentence setting the stage for the importance of interpretability.
- Paper hypothesising that LLMs model agents' beliefs, desires, and intentions, with preliminary GPT-3 evidence; cited as ref. 2.