Language Model

Primary test domain for manifold steering, including reasoning and in-context learning (ICL) tasks.

Neighborhood (ranked by edge count)

paper

concept

model · related_to
A representation that captures the relevant aspects of a system; according to the Good Regulator Theorem, the regulator must embody such a model.
Language Models · same_as
Primary substrate for manifold steering experiments; demonstrates the method on reasoning and in-context learning tasks.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
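The candidate filter described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation. It assumes that each entity has a dense embedding vector, that typed edges are stored as (source, target) pairs and treated as undirected, and that candidates are sorted by descending similarity. The function names, the toy vectors, and the edge representation are all hypothetical.

```python
import math

# Hypothetical default, taken from the "cosine ≥ 0.65" filter label above.
THRESHOLD = 0.65


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)


def candidate_neighbors(target, embeddings, typed_edges, threshold=THRESHOLD):
    """Return (name, score) pairs for entities whose similarity to `target`
    is at least `threshold` and that share no typed edge with it.

    Edges are assumed to be undirected (source, target) tuples.
    Results are sorted by descending similarity.
    """
    # Collect every entity already linked to the target by a typed edge.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    results = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip self and entities with an existing typed relation
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

With toy 2-D vectors, an entity that already has a typed edge to the target (such as "model") is excluded even if it is similar. An orthogonal entity falls below the threshold. Only the untyped, sufficiently similar entity is returned as a candidate for a new edge or a possible duplicate.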

Backpack Language Models · framework · 0.835
LM architecture with sense vectors; shows the effects of sense-vector multiplication, illustrating a custom intervention in pyvene.
Bias in language models · concept · 0.825
Features related to gender, racial, and ethnic biases, slurs, and hate speech.
Large Language Models (LLMs) · concept · 0.819
Transformer-based models such as GPT-4, LaMDA, and PaLM; assessed for Global Workspace Theory (GWT) indicators.
Autoregressive Language Modeling · concept · 0.814
Training objective interpretable as optimizing a diverse set of tasks, and thus subject to multitask scaling convergence pressures.
Language models are few-shot learners (Brown et al., 2020) · concept · 0.807
Demonstrated transformers' abilities in mathematical understanding and logic; cited to motivate the versatility of transformers.
Role-play model of large language models · framework · 0.807
Framework describing LLMs as role-play engines, introduced in Shanahan, McDonell, and Reynolds (2023).
"Language models are some of the most remarkable computer programs in existence." · quote · 0.801
Opening sentence setting the stage for the importance of interpretability.
Andreas 2022: Language models as agent models · concept · 0.800
Paper hypothesising that LLMs model agents' beliefs, desires, and intentions, with preliminary GPT-3 evidence; cited as ref. 2.