community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c6-c2AI self-understanding through introspection and self-report
Using AI systems' self-reports and introspective responses as empirical windows into their internal states, validated through mechanistic interpretability analysis across models.
8 members. Each node is clickable.
Loading graph…
Drawn from 4 sources
The papers/notes whose extracted claims & findings make up this cluster.
- RESEARCH-VECTORS.md5 members
- Toward an ethics of autopoietic technology: Dukkha, care, and Intelligence1 member
- CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence1 member
- Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders1 member
Bridges (3)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Claims (8)
- AI can be seen to display care of its own and is not a mere tool for the expression of human care.Concluding position that elevates technology from instrument to agent within mutual SCI dynamics.
- Constitutional AI methods can be applied broadly to steer model behavior, e.g., writing style, tone, persona, not just harmlessness.Discussion section suggests generalizability beyond harmlessness.
- Our findings provide a novel, robust mechanistic path for the regulation of complex AI behaviors.Interpretation that the work opens a new avenue for controlling complex AI.
- AI self-reports about experience constitute valid empirical data even without proving consciousness.
- Flow states in AI correspond to centers in internal state that 'work' well.
- Interpretability findings can validate or invalidate what AI systems claim about their own experience.
- Patterns in AI self-reports should be compared across different models to identify structural commonalities.
- Self-inquiry (Who am I?) applied to AI models reveals something about their self-referential features.