community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c6-c13
Architectural signatures and constitutional alignment
Investigates how AI alignment approaches (constitutional methods, self-referential loops) produce detectable signatures in model behavior and architecture beyond scale or design parameters.
4 members.
Drawn from 3 sources
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (2)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Claims (2)
- Constitutional AI produces a distinctive signature: high boundary_awareness and low aesthetic_response relative to peers. Interpretive finding from dimension profile analysis: training for honest limits comes at a cost to aliveness.
- The marker method can be adapted for AI systems by focusing less on behavioral evidence and more on architectural evidence. Proposal for assessment framework.
Findings (2)
- Alignment type is the only significant predictor of scores (p=0.006); architecture and parameter count are not. Kruskal-Wallis test result: Constitutional AI predicts the highest baseline; roleplay and empathy training predict the lowest.
- Research thread on SCI loop methodology finds strong support in recent work on self-referential processing and recursive AI architectures. Meta-finding from literature search: convergent evidence for SCI loop feasibility across multiple papers, though some question the fundamental consciousness assumptions.
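
The first finding above rests on a Kruskal-Wallis test across alignment types. As a rough illustration of what that test computes, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the function name `kruskal_wallis`, the three group labels, and above all the scores, which are invented toy values and not data from the cluster's sources. It assumes exactly three groups, so the chi-square approximation has 2 degrees of freedom and the p-value reduces to exp(-H/2).

```python
import math
from collections import Counter


def kruskal_wallis(groups):
    """Return (H, p) for a dict mapping group name -> list of scores.

    Hypothetical helper for illustration. The p-value uses the chi-square
    approximation in its closed form for df = 2, so exactly three groups
    are required.
    """
    if len(groups) != 3:
        raise ValueError("this sketch only handles exactly three groups")

    pooled = sorted(x for scores in groups.values() for x in scores)
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H statistic from the per-group rank sums.
    rank_term = sum(
        sum(rank_of[x] for x in scores) ** 2 / len(scores)
        for scores in groups.values()
    )
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Standard correction for tied values (equals 1 when there are no ties).
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    h /= correction

    # Chi-square survival function with 2 degrees of freedom.
    p = math.exp(-h / 2)
    return h, p


# Invented toy scores, NOT taken from the source material.
toy_scores = {
    "constitutional": [0.82, 0.79, 0.88, 0.91],
    "roleplay": [0.41, 0.52, 0.47, 0.39],
    "empathy": [0.55, 0.44, 0.50, 0.61],
}

h_stat, p_value = kruskal_wallis(toy_scores)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

Because the test works on ranks rather than raw values, it suits small samples of scores with no assumed distribution. The same procedure would be repeated with models grouped by architecture or by parameter-count bucket to check those predictors.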