Gender Representation in LLMs

The case study target in Section 4: localizing gender information in hidden representations of Pythia-6.9B

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Non-Linear Representations in LLMs · concept · 0.811
Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
Linear Representation of Concepts in LLMs · concept · 0.796
The finding that interpretable concepts, including character traits, are encoded as linear directions in transformer residual streams.
LLM Internal Representations · concept · 0.792
High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
Reflection in LLMs · concept · 0.775
The core phenomenon studied: the ability of LLMs to evaluate and revise their own reasoning.
What is the relationship between different dimensions or clusters of dimensions in LLM representations? Do they and/or how do they interact with each other? · question · 0.758
Motivates the RN hypothesis by pointing to the unknown relational structure within high-dimensional representation vectors.
Emotion Features in LLMs · concept · 0.744
Internal representations encoding emotion concepts in large language models, identified by probing and SAE methods.
Linear World Models in LLMs · framework · 0.743
A framework from prior work that studies whether LLMs encode world models as linear structures in their representations.
LLM Judge Binary Classifier · method · 0.739
An LLM-based classifier that returns 1 if the response contains a clear subjective experience report and 0 otherwise.
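
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. This is not the system's actual implementation: the function names `cosine` and `untyped_neighbors`, the toy embedding vectors, and the representation of typed edges as a set of unordered pairs are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities whose cosine similarity to
    `target` is at least `threshold` and that share no typed edge with it,
    sorted by descending score."""
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities that already have a typed relation to the target.
        if frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: three-dimensional embeddings instead of real ones.
embeddings = {
    "target": [1.0, 0.0, 0.0],
    "near_untyped": [0.9, 0.1, 0.0],   # similar, no typed edge: kept
    "orthogonal": [0.0, 1.0, 0.0],     # cosine 0: below threshold
    "near_typed": [1.0, 0.2, 0.0],     # similar, but already linked: skipped
}
typed_edges = {frozenset(("target", "near_typed"))}

result = untyped_neighbors("target", embeddings, typed_edges)
for name, score in result:
    print(f"{name} · {score:.3f}")
```

Entities that pass both filters are the ones worth reviewing as candidate edges or possible duplicates; the threshold is a parameter here so that it can be tightened or relaxed.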