concept
active
concept:gender-representation-in-llms
Gender Representation in LLMs
The case study target in Section 4: localizing gender information in the hidden representations of Pythia-6.9B.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
- The finding that interpretable concepts, including character traits, are encoded as linear directions in transformer residual streams.
- High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
- The core phenomenon studied: the ability of LLMs to evaluate and revise their own reasoning.
- Motivates the RN hypothesis by pointing to the unknown relational structure within high-dimensional representation vectors.
- Internal representations encoding emotion concepts in large language models, identified by probing and SAE methods.
- A framework from prior work that studies whether LLMs encode world models as linear structures in their representations.
- An LLM-based classifier that returns 1 if the response contains a clear subjective experience report and 0 otherwise.
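
The selection rule for this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the page states only the threshold and the "no typed edge" condition, so the function names `cosine` and `similar_untyped`, the dictionary of entity embeddings, and the set-of-pairs edge store are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs similar to `target` that lack a typed edge.

    embeddings:  dict mapping entity id -> embedding vector (assumed layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already have a
                 typed relation (assumed layout)
    threshold:   minimum cosine similarity; 0.65 is the value shown on the page
    """
    hits = []
    for other, vec in embeddings.items():
        if other == target:
            continue  # an entity is not its own neighbor
        if frozenset((target, other)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((other, score))
    # Highest similarity first
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are the ones worth reviewing as candidates for new edges or as unrecognized duplicates.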