Semantic structure in large language model embeddings (arXiv:2508.10003)
Original abstract
Psychological research consistently finds that human ratings of words across diverse semantic scales can be reduced to a low-dimensional form with relatively little information loss. We find that the semantic associations encoded in the embedding matrices of large language models (LLMs) exhibit a similar structure. We show that the projections of words on semantic directions defined by antonym pairs (e.g., kind–cruel) correlate highly with human ratings, and further find that these projections effectively reduce to a 3-dimensional subspace within LLM embeddings, closely resembling the patterns derived from human survey responses. Moreover, we find that shifting tokens along one semantic direction causes off-target effects on geometrically aligned features proportional to their cosine similarity. These findings suggest that semantic features are entangled within LLMs similarly to how they are interconnected in human language, and a great deal of semantic information, despite its apparent complexity, is surprisingly low-dimensional. Furthermore, accounting for this semantic structure may prove essential for avoiding unintended consequences when steering features.
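The three measurements described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the toy vocabulary, the random 64-dimensional vectors standing in for rows of an LLM embedding matrix, the antonym pairs, and the names `semantic_direction`, `dirs`, `P`, `alpha` and `k90` are all hypothetical. A semantic direction is taken as the normalized difference between the two antonym embeddings, and a word's score on a scale is its dot product with that direction. PCA (via SVD of the centred words-by-scales matrix) reports how much of the variation a few components capture. Shifting a token by `alpha` along one unit direction changes its projection on every other unit direction by exactly `alpha` times their cosine similarity, a simple geometric picture consistent with the off-target effects the abstract reports.

```python
import numpy as np

# Hypothetical stand-in for an LLM input-embedding matrix: random vectors for a
# toy vocabulary. Real experiments would read rows of the model's embedding matrix.
rng = np.random.default_rng(0)
words = ["kind", "cruel", "hot", "cold", "strong", "weak",
         "good", "bad", "teacher", "storm"]
dim = 64
emb = {w: rng.normal(size=dim) for w in words}

def semantic_direction(pos: str, neg: str) -> np.ndarray:
    """Unit vector from the negative pole to the positive pole of an antonym pair."""
    d = emb[pos] - emb[neg]
    return d / np.linalg.norm(d)

# Hypothetical antonym pairs defining bipolar semantic scales.
pairs = [("kind", "cruel"), ("hot", "cold"), ("strong", "weak"), ("good", "bad")]
dirs = np.stack([semantic_direction(p, n) for p, n in pairs])  # (n_scales, dim)

# Project every word onto every scale: a words-by-scales matrix, the embedding
# analogue of a table of human ratings on bipolar scales.
X = np.stack([emb[w] for w in words])  # (n_words, dim)
P = X @ dirs.T                         # (n_words, n_scales)

# PCA via SVD of the column-centred projection matrix: fraction of variance
# captured by each principal component.
Pc = P - P.mean(axis=0)
s = np.linalg.svd(Pc, compute_uv=False)
explained = s**2 / np.sum(s**2)
k90 = int(np.searchsorted(np.cumsum(explained), 0.9) + 1)  # components for 90%

# Steering: shift one token along the first scale and measure how its projection
# onto every scale changes. Because the directions are unit vectors, the change
# on scale j equals alpha * cos(dirs[0], dirs[j]).
alpha = 2.0
x = emb["teacher"]
x_steered = x + alpha * dirs[0]
off_target = x_steered @ dirs.T - x @ dirs.T  # change in projection per scale
cos = dirs @ dirs[0]                          # cosine similarity with the steered scale

print("explained variance ratio:", np.round(explained, 3))
print("components for 90% variance:", k90)
for (p, n), delta, c in zip(pairs, off_target, cos):
    print(f"{p}-{n}: change={delta:+.3f}  alpha*cos={alpha * c:+.3f}")
```

With random vectors the explained-variance profile will not show the low-dimensional structure reported for real models. Reproducing anything like the paper's analysis would require actual embedding rows and real human rating norms, correlated against columns of `P` (for example with `np.corrcoef`).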
Related work
- Towards Uncovering How Large Language Model Works: An Explainability Perspective. Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du (2024)
- A Survey of Large Language Models. Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen (2026)
- Enhancing Cloud-Based Large Language Model Processing with Elasticsearch and Transformer Models. Chunhe Ni, Jiang Wu, Hongbo Wang, Wenran Lu, Chenwei Zhang (2024)
- Semantic Convergence: Investigating Shared Representations Across Scaled LLMs. Daniel Son, Sanjana Rathore, Andrew Rufail, Adrian Simon, Daniel Zhang, Soham Dave, Cole Blondin, Kevin Zhu, Sean O'Brien (2025)
- Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis. Andor Diera, Ansgar Scherp (2026)
- Mechanistic Decomposition of Sentence Representations. Matthieu Tehenan, Vikram Natarajan, Jonathan Michala, Milton Lin, Juri Opitz (2025)
- Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings. Yadollah Yaghoobzadeh, Katharina Kann, Timothy J. Hazen, Eneko Agirre, Hinrich Schütze (2019)
- Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling. Jakob Prange, Nathan Schneider, Lingpeng Kong (2026)
- Revealing emergent human-like conceptual representations from language prediction. Ningyu Xu, Qi Zhang, Chao Du, Qiang Luo, Xipeng Qiu, Xuanjing Huang, Menghan Zhang (2025)
- What do Language Models Learn and When? The Implicit Curriculum Hypothesis. Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja, Jen-tse Huang, Graham Neubig (2026)
- Mechanistic Indicators of Understanding in Large Language Models. Pierre Beckmann, Matthieu Queloz (2026)
- The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets (2023)
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations (2023)
- Interpreting Language Model Parameters (2026)
- Model Alignment Search (2025)
- The World Inside Neural Networks (2026)
- Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders (2026)
Cited by (1)
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering (intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions) produces behavioral trajectorie…