method
active
method:damage-resilience-testing
Damage Resilience Testing
An evaluation method in which cells are permanently or temporarily disabled to test the fault tolerance of learned circuits.
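A minimal sketch of the idea, under stated assumptions: the "learned circuit" is replaced here by a toy majority-vote circuit over redundant cells, which is an illustrative stand-in and not the system this entry was drawn from. All names (`majority_output`, `random_damage_mask`, `score_under_damage`) are hypothetical. Permanent damage is modeled as a mask applied at every step; temporary damage is the same mask applied only inside a step window.

```python
import random
from typing import List, Optional

Mask = List[bool]  # True = cell active, False = cell disabled


def majority_output(cell_values: List[int], mask: Mask) -> Optional[int]:
    """Toy circuit: majority vote over the active cells (None if none are active)."""
    active = [v for v, m in zip(cell_values, mask) if m]
    if not active:
        return None
    return int(sum(active) * 2 > len(active))


def random_damage_mask(n_cells: int, n_disabled: int, rng: random.Random) -> Mask:
    """Disable n_disabled cells chosen uniformly at random."""
    disabled = set(rng.sample(range(n_cells), n_disabled))
    return [i not in disabled for i in range(n_cells)]


def score_under_damage(
    cell_values: List[int],
    expected: int,
    mask: Mask,
    steps: int,
    start: int = 0,
    end: Optional[int] = None,
) -> float:
    """Fraction of steps with the expected output.

    Damage applies for steps in [start, end); end=None means permanent damage.
    """
    intact = [True] * len(cell_values)
    correct = 0
    for t in range(steps):
        damaged = t >= start and (end is None or t < end)
        out = majority_output(cell_values, mask if damaged else intact)
        correct += int(out == expected)
    return correct / steps


if __name__ == "__main__":
    cells = [1, 1, 1, 0, 1]
    rng = random.Random(0)
    mask = random_damage_mask(len(cells), 2, rng)
    print("permanent:", score_under_damage(cells, 1, mask, steps=10))
    print("temporary:", score_under_damage(cells, 1, mask, steps=10, start=2, end=5))
```

Comparing the permanent and temporary scores across many random masks gives a rough picture of how much damage the circuit tolerates and whether it recovers once cells are restored.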
Neighborhood — ranked by edge-count
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Meta-problem: alignment techniques may collapse under rapid self-improvement or extreme complexity
- Cognitive behavior of evaluating risk, exhibited by plants according to S&C.
- Eliezer Yudkowsky's benchmark for LLM awareness, mentioned as a test that collapsed-awareness models might fail.
- A procedure: stand in the place, ask whether each candidate element generates greater tranquility in you; keep if yes, reject if no.
- Testing five phrasings of the self-referential prompt to confirm robustness to wording variation
- Tests like Turing test, Artificial Consciousness Test; argued to be unreliable for AI due to mimicry.
- Technique of reading out model beliefs from internal activations before the final answer token is generated