LLM interpretability & self-awareness

Methods for probing, explaining, and evaluating internal representations and reflective behaviors in large language models.

12 members.

Drawn from 12 sources

The papers/notes whose extracted claims & findings make up this cluster.

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.