artifact
active
artifact:safety-research-assistant-axis-github-repository

safety-research/assistant-axis GitHub repository

Code and full transcripts of case studies released alongside the paper

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • Contrast vector between the mean default Assistant activation and the mean of all fully role-playing role vectors; the paper's main contribution
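The framework entry above describes the axis as a contrast between two means. The following is a minimal sketch of that arithmetic, not the repository's code. It assumes each activation and each role vector is a plain list of floats of the same dimension, hypothetically taken from a single model layer. The sign convention is also an assumption: the axis is Assistant mean minus role mean, so that it points toward the default Assistant. The entry itself only says "contrast vector". The names `mean_vector`, `contrast_axis` and `project` are hypothetical and not the repository's API.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty collection of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def contrast_axis(
    assistant_acts: Sequence[Sequence[float]],
    role_vectors: Sequence[Sequence[float]],
) -> Vector:
    """Hypothetical helper: mean default-Assistant activation minus mean role vector.

    Assumption: the sign is chosen so the axis points toward the Assistant.
    """
    assistant_mean = mean_vector(assistant_acts)
    role_mean = mean_vector(role_vectors)
    if len(assistant_mean) != len(role_mean):
        raise ValueError("Assistant activations and role vectors differ in dimension")
    return [a - r for a, r in zip(assistant_mean, role_mean)]


def project(activation: Sequence[float], axis: Sequence[float]) -> float:
    """Scalar projection of an activation onto the unit-normalized axis."""
    norm = math.sqrt(sum(x * x for x in axis))
    if norm == 0.0:
        raise ValueError("axis has zero length")
    return sum(a * x for a, x in zip(activation, axis)) / norm
```

With toy 2-d inputs, `contrast_axis([[1.0, 0.0], [3.0, 0.0]], [[0.0, 1.0], [0.0, 3.0]])` yields `[2.0, -2.0]`. Projecting an Assistant-like activation such as `[2.0, 0.0]` onto it gives a positive score. A role-like activation such as `[0.0, 2.0]` gives a negative one. Normalizing inside `project` keeps scores comparable across axes of different magnitude.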