concept
active
concept:jailbreak · Jailbreak
Methods for bypassing a model's safety training; specific model features may activate during jailbreak attempts.
Neighborhood (ranked by edge count)
Concepts (1)
concept
- Jailbreaking · related_to · Users coaxing dialogue agents into issuing threats or toxic content by overriding intended persona constraints
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Security attack that bypasses LLM safety alignment by suppressing deliberation or exploiting reflection inhibition.
- Open question for future safety interpretability work.
- GUI window management construct supporting MDI-style display of applications, used as a top-level backplane facility.
- Biological units that are considered unconventional media for problem-solving in diverse intelligence.
- Proposed unified system combining word processor (PlayWrite), graphics (PlayDraw), and spreadsheet (PlayCalc) as integrated, nestable tools.
- The property that living centers are formed and strengthened by boundaries which both separate and unite; the boundary must be of the same order of magnitude as the center being bounded and is itself made of centers.
- The phenomenon where SAEs break a smooth geometric manifold into many small, seemingly unrelated pieces, losing overarching structure.
- Synchronization construct encapsulating shared data and protected access routines.
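The "Related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the viewer's actual implementation: entity embeddings are assumed to be plain float vectors, typed edges are assumed to be stored as unordered entity pairs, and the function names and toy data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities whose cosine similarity to `target` is at
    least `threshold` and that share no typed edge with it, sorted by
    descending score. `typed_edges` is assumed to be a set of frozenset pairs."""
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue  # skip the entity itself
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed relation, so not a candidate
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Toy data (invented for illustration only).
embeddings = {
    "jailbreak": [1.0, 0.0],
    "jailbreaking": [0.9, 0.1],
    "reflection_inhibition_attack": [4.0, 3.0],
    "gui_window": [0.0, 1.0],
}
typed_edges = {frozenset(("jailbreak", "jailbreaking"))}

print(similarity_candidates("jailbreak", embeddings, typed_edges))
# → [('reflection_inhibition_attack', 0.8)]
```

Here "jailbreaking" is excluded despite its high similarity because a typed `related_to` edge already exists, while "gui_window" falls below the threshold. Whether the real threshold is applied inclusively, and how vectors are produced, are assumptions.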