community:leiden_hybrid_concepts-run4-c0-c2-c1
Introspective awareness activation in language models
Studies how concept injection and random vectors trigger self-reflective capabilities in LLMs at varying injection strengths.
3 members.
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (3)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Findings (2)
- Concept injection at strength 2 does not increase affirmative responses on unrelated yes/no questions. Control experiment rules out the possibility that concept vectors simply bias the model to answer affirmatively.
- Random vectors at injection strength 8 elicit introspective awareness in 9 out of 100 trials. Random vectors are less effective than concept vectors and, even at this strength, elicit introspection at a lower rate.
Claims (1)
- Modern language models possess at least a limited, functional form of introspective awareness. The paper's central interpretive assertion.