community

active

leiden_hybrid_concepts

label: haiku

community:leiden_hybrid_concepts-run4-c0-c2-c1

Introspective awareness activation in language models

Studies how injecting concept vectors and random vectors into a language model's activations, at varying injection strengths, elicits introspective (self-reflective) reports from the model.

3 members.


Drawn from 1 source

The papers/notes whose extracted claims & findings make up this cluster.

Emergent Introspective Awareness in Large Language Models (3 members)

Bridges (3)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Mechanistic interpretability & model evaluation (3 shared)
Mechanistic introspection in language models (3 shared)
LLM functional introspective awareness (1 shared)
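A bridge, as defined above, is another community whose member set overlaps this one. The sketch below shows that computation as a set intersection that keeps only non-empty overlaps. The function name `bridges` and the member IDs in the usage line are hypothetical placeholders, not the identifiers this graph actually uses.

```python
def bridges(community: set[str], others: dict[str, set[str]]) -> dict[str, int]:
    """Map each other community to the number of members it shares with `community`.

    Hypothetical helper: communities with no shared members are omitted, and the
    result is ordered from most to fewest shared members.
    """
    shared = {}
    for name, members in others.items():
        overlap = len(community & members)  # members present in both communities
        if overlap > 0:
            shared[name] = overlap
    return dict(sorted(shared.items(), key=lambda kv: -kv[1]))


# Usage with made-up member IDs:
# bridges({"f1", "f2", "c1"}, {"A": {"f1", "f2", "c1", "x"}, "B": {"c1", "y"}, "C": {"z"}})
```

The intersection approach counts each shared member once, which matches how the counts in the bridges list read ("3 shared", "1 shared").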

Findings (2)

Concept injection at strength 2 does not increase affirmative responses on unrelated yes/no questions. This control experiment rules out the possibility that concept vectors simply bias the model toward answering affirmatively.
Random vectors at injection strength 8 elicit introspective awareness in 9 out of 100 trials. Random vectors are less effective than concept vectors and produce introspection at lower rates.
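The findings above vary an injection "strength". The sketch below illustrates the general idea under stated assumptions: a concept vector is taken as the activation on a concept prompt minus the mean activation over baseline prompts, and injection adds that vector, scaled by the strength, to a hidden state. Activations are plain Python lists standing in for one layer's residual-stream vector. The helper names (`concept_vector`, `inject`, `random_direction`, `unit`) are hypothetical. Normalizing random vectors to unit length before comparison is also an assumption, and this is not the paper's code.

```python
import math
import random


def unit(v: list[float]) -> list[float]:
    # Scale a vector to unit L2 norm (assumed, so random and concept vectors are comparable).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def concept_vector(concept_acts: list[float], baseline_acts: list[list[float]]) -> list[float]:
    # Hypothetical recipe: concept activation minus the mean of baseline activations.
    dim = len(concept_acts)
    mean = [sum(b[i] for b in baseline_acts) / len(baseline_acts) for i in range(dim)]
    return [a - m for a, m in zip(concept_acts, mean)]


def random_direction(dim: int, seed: int = 0) -> list[float]:
    # Gaussian random vector, used as a control in place of a concept vector.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def inject(hidden: list[float], vector: list[float], strength: float) -> list[float]:
    # Return a new hidden state: hidden + strength * vector (input is not mutated).
    return [h + strength * x for h, x in zip(hidden, vector)]


# Usage with toy 3-dimensional activations:
# v = unit(concept_vector([2.0, 0.0, 1.0], [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]))
# inject([0.0, 0.0, 0.0], v, 2.0)
# inject([0.0, 0.0, 0.0], unit(random_direction(3)), 8.0)
```

Comparing a concept vector against a random vector at the same norm isolates the content of the direction from the sheer size of the perturbation. That is the distinction the random-vector finding draws.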

Claims (1)

Modern language models possess at least a limited, functional form of introspective awareness. This is the paper's central interpretive assertion.