framework
active
framework:linear-world-models-in-llms
Linear World Models in LLMs
A framework from prior work that studies whether LLMs encode world models as linear structures in their representations.
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (1)
framework
- Mass-Mean Probing · extends · Introduced in this paper: an optimization-free probing technique using the difference-in-means direction with an optional covariance correction
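The Mass-Mean Probing entry above describes a probe that needs no optimization: the direction is the difference between the class means, optionally corrected by a covariance estimate. The following is a minimal sketch of that idea, not the paper's implementation. The function names, the use of a pooled class-centered covariance, and the pseudo-inverse are assumptions made for illustration.

```python
import numpy as np

def mass_mean_direction(acts, labels):
    """Difference-in-means direction: mean of positive rows minus mean of negative rows."""
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    return acts[labels].mean(axis=0) - acts[~labels].mean(axis=0)

def mass_mean_scores(acts, labels, query, covariance_correction=False):
    """Score query activations along the mass-mean direction (higher = more positive).

    Assumption: the optional correction multiplies the direction by the
    pseudo-inverse of the pooled, class-centered covariance matrix.
    """
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    query = np.asarray(query, dtype=float)
    theta = mass_mean_direction(acts, labels)
    if covariance_correction:
        pos, neg = acts[labels], acts[~labels]
        # Center each class on its own mean, then pool the rows.
        centered = np.concatenate([pos - pos.mean(axis=0), neg - neg.mean(axis=0)])
        cov = centered.T @ centered / len(centered)
        # pinv tolerates a singular covariance (e.g. few samples, many dimensions).
        theta = np.linalg.pinv(cov) @ theta
    return query @ theta
```

The returned scores are unnormalized; passing them through a sigmoid would turn them into probabilities if a calibrated output is wanted. No gradient steps are involved at any point, which is what "optimization-free" refers to here.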
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Primary statistical model with random intercept by conversation, REML estimation, for pooled conversation-turn observations
- The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams
- Transformer-based models like GPT-4, LaMDA, PaLM; assessed for GWT indicators.
- Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
- Core cross-modal empirical result: larger and better language models align better with vision models
- Interpretive claim connecting scale to abstraction level in LLM representations
- The core phenomenon studied: the ability of LLMs to evaluate and revise their own reasoning.
- Ngo & Kim result extending cross-modal convergence to the auditory domain
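The selection rule for this list (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is an illustrative sketch only: the function name, the dictionary of embeddings, and the representation of typed edges as unordered pairs are all hypothetical, and the page does not say how its embeddings are computed.

```python
import numpy as np

def similar_without_typed_edge(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, cosine) pairs at or above the threshold, most similar first.

    embeddings:  dict mapping entity id -> 1-D vector (hypothetical data layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already have a typed relation
    """
    target = np.asarray(embeddings[target_id], dtype=float)
    results = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip entities already linked to the target by a typed edge.
        if frozenset((target_id, other_id)) in typed_edges:
            continue
        vec = np.asarray(vec, dtype=float)
        cosine = float(target @ vec / (np.linalg.norm(target) * np.linalg.norm(vec)))
        if cosine >= threshold:
            results.append((other_id, cosine))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Entities returned by a filter like this are only candidates: a high cosine with no typed edge may indicate a missing relation, a duplicate, or simply topical overlap.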