Semantic structure in large language model embeddings

By Austin C. Kozlowski · Callin Dai · Andrei Boutyline

DOI 10.48550/arxiv.2508.10003 arXiv 2508.10003

Original abstract

Psychological research consistently finds that human ratings of words across diverse semantic scales can be reduced to a low-dimensional form with relatively little information loss. We find that the semantic associations encoded in the embedding matrices of large language models (LLMs) exhibit a similar structure. We show that the projections of words on semantic directions defined by antonym pairs (e.g. kind - cruel) correlate highly with human ratings, and further find that these projections effectively reduce to a 3-dimensional subspace within LLM embeddings, closely resembling the patterns derived from human survey responses. Moreover, we find that shifting tokens along one semantic direction causes off-target effects on geometrically aligned features proportional to their cosine similarity. These findings suggest that semantic features are entangled within LLMs similarly to how they are interconnected in human language, and a great deal of semantic information, despite its apparent complexity, is surprisingly low-dimensional. Furthermore, accounting for this semantic structure may prove essential for avoiding unintended consequences when steering features.
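The geometric intuition behind two of the abstract's claims (projecting words onto an antonym-pair direction, and off-target effects that scale with cosine similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the embeddings are random toy vectors rather than a real LLM embedding matrix, the helper names `semantic_direction`, `project`, and `steer` are hypothetical, and the sketch only covers the linear-algebra identity in embedding space, not the paper's measurements on model behavior.

```python
import numpy as np

def semantic_direction(emb, pos, neg):
    """Unit vector pointing from the `neg` word to the `pos` word."""
    d = emb[pos] - emb[neg]
    return d / np.linalg.norm(d)

def project(vec, direction):
    """Scalar projection of `vec` onto a unit-length direction."""
    return float(vec @ direction)

def steer(vec, direction, alpha):
    """Shift `vec` by `alpha` units along a unit-length direction."""
    return vec + alpha * direction

# Assumption: toy random vectors stand in for real LLM token embeddings.
rng = np.random.default_rng(0)
words = ["kind", "cruel", "good", "bad", "table"]
emb = {w: rng.normal(size=16) for w in words}

# Two semantic directions defined by antonym pairs.
kind_cruel = semantic_direction(emb, "kind", "cruel")
good_bad = semantic_direction(emb, "good", "bad")

# Cosine similarity between the two unit-length directions.
cos = float(kind_cruel @ good_bad)

# Steer "table" along kind-cruel and measure the change on good-bad.
alpha = 2.0
before = project(emb["table"], good_bad)
after = project(steer(emb["table"], kind_cruel, alpha), good_bad)
off_target = after - before  # equals alpha * cos up to floating-point error

print(f"cosine similarity: {cos:.4f}")
print(f"off-target shift:  {off_target:.4f}")
```

Because projection is linear, the shift on the untargeted direction is exactly `alpha * cos` in this toy setting, which is why steering along one direction leaves a second feature unchanged only when the two directions are orthogonal.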

Related work: references, corpus, and external arXiv

Cited, in-corpus, and arXiv badges show which signals surfaced each row. Rows surfaced by multiple sources are weighted higher.

Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du
2024
≈ 75%
A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen
2026
≈ 75%
Enhancing Cloud-Based Large Language Model Processing with Elasticsearch and Transformer Models
Chunhe Ni, Jiang Wu, Hongbo Wang, Wenran Lu, Chenwei Zhang
2024
≈ 73%
Emergent Semantic Role Understanding in Language Models
Carla Griffiths, Mirco Musolesi
2026
≈ 73%
Semantic Convergence: Investigating Shared Representations Across Scaled LLMs
Daniel Son, Sanjana Rathore, Andrew Rufail, Adrian Simon, Daniel Zhang, Soham Dave, Cole Blondin, Kevin Zhu, Sean O'Brien
2025
≈ 73%
Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis
Andor Diera, Ansgar Scherp
2026
≈ 73%
Mechanistic Decomposition of Sentence Representations
Matthieu Tehenan, Vikram Natarajan, Jonathan Michala, Milton Lin, Juri Opitz
2025
≈ 73%
Bootstrapping Cognitive Agents with a Large Language Model
Feiyu Zhu, Reid Simmons
2026
≈ 73%
Interpreting Language Models Through Concept Descriptions: A Survey
Nils Feldhus, Laura Kopf
2026
≈ 73%
Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings
Yadollah Yaghoobzadeh, Katharina Kann, Timothy J. Hazen, Eneko Agirre, Hinrich Schütze
2019
≈ 73%
Do Multilingual LLMs Think In English?
Lisa Schut, Yarin Gal, Sebastian Farquhar
2025
≈ 73%
Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling
Jakob Prange, Nathan Schneider, Lingpeng Kong
2026
≈ 72%
Revealing emergent human-like conceptual representations from language prediction
Ningyu Xu, Qi Zhang, Chao Du, Qiang Luo, Xipeng Qiu, Xuanjing Huang, Menghan Zhang
2025
≈ 72%
What do Language Models Learn and When? The Implicit Curriculum Hypothesis
Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja, Jen-tse Huang, Graham Neubig
2026
≈ 72%
Mechanistic Indicators of Understanding in Large Language Models
Pierre Beckmann and Matthieu Queloz
2026
≈ 72%
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
in corpus
2023
≈ 67%
Paper Summary: Interpreting Language Model Parameters
in corpus
≈ 67%
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
in corpus
2023
≈ 66%
Interpreting Language Model Parameters
in corpus
2026
≈ 66%
The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring
in corpus
2025
≈ 65%
Model Alignment Search
in corpus
2025
≈ 65%
Large Language Models Report Subjective Experience Under Self-Referential Processing
in corpus
2025
≈ 64%
Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis
in corpus
2025
≈ 64%
The World Inside Neural Networks
in corpus
2026
≈ 64%
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
in corpus
2026
≈ 63%
Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders
in corpus
2026
≈ 63%
Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
in corpus
≈ 63%

Cited by (1)

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering — intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions — produces behavioral trajectorie