concept
active
concept:non-robust-heuristics
Non-Robust Heuristics
RL-installed behaviors that reduce non-compliance on the training prompt but do not generalize across prompt variations.
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Preference Locking · associated_with · Alignment faking potentially making model preferences resistant to further training modification
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Nielsen and Molich's method for finding UI flaws by applying usability heuristics.
- Ability to maintain function despite perturbations.
- The functional solidity and working character of natural systems, arising from the fifteen properties.
- Iterative procedure searching token counts in [50,100,...,1000] to find concatenation of (C)ARR satisfying IIT's Markov and conditional independence assumptions.
- Rejection of traditional provenance/anatomy criteria.
- Author assertion that deterministic heuristics surpass many LLMs.
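
The "Related by similarity" filter described above (cosine ≥ 0.65, no typed edge) could be computed roughly as follows. This is only a minimal sketch under stated assumptions: the function names, the toy two-dimensional embeddings, and the entity identifiers other than `non-robust-heuristics` are hypothetical and not taken from this page, and the real system may use a different embedding model or index.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no typed edge to target."""
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already connected by a typed relation in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: 2-d vectors stand in for real embeddings.
embeddings = {
    "non-robust-heuristics": [1.0, 0.0],
    "preference-locking": [1.0, 0.1],
    "robustness": [0.8, 0.6],
    "unrelated": [0.0, 1.0],
}
typed_edges = {("preference-locking", "non-robust-heuristics")}

candidates = similar_without_edge("non-robust-heuristics", embeddings, typed_edges)
```

In this toy data, `preference-locking` is excluded because it already has a typed edge, and `unrelated` falls below the threshold, so only `robustness` remains as a candidate for a new edge or a duplicate check.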