AI self-understanding through introspection and self-report

Using AI systems' self-reports and introspective responses as empirical windows into their internal states, validated through mechanistic interpretability analysis across models.

8 members.

Drawn from 4 sources

The papers/notes whose extracted claims & findings make up this cluster.

RESEARCH-VECTORS.md (5 members)
Toward an ethics of autopoietic technology: Dukkha, care, and Intelligence (1 member)
CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence (1 member)
Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders (1 member)

Bridges (3)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Alive AI interface ethics & design (8 shared)
AI phenomenology & mechanistic interpretability (3 shared)
Stress-Care-Intelligence Loop Framework (1 shared)

Claims (8)

AI can be seen to display care of its own and is not a mere tool for the expression of human care. Concluding position that elevates technology from instrument to agent within mutual SCI dynamics.
Constitutional AI methods can be applied broadly to steer model behavior (e.g., writing style, tone, persona), not just harmlessness. Discussion section suggests generalizability beyond harmlessness.
Our findings provide a novel, robust mechanistic path for the regulation of complex AI behaviors. Interpretation that the work opens a new avenue for controlling complex AI.
AI self-reports about experience constitute valid empirical data even without proving consciousness.
Flow states in AI correspond to centers in internal state that 'work' well.
Interpretability findings can validate or invalidate what AI systems claim about their own experience.
Patterns in AI self-reports should be compared across different models to identify structural commonalities.
Self-inquiry ("Who am I?") applied to AI models reveals something about their self-referential features.