Classifier-Free Guidance (CFG)

Tested as an alternative to activation steering: CFG magnifies the difference between the model's outputs under evaluation-style and deployment-style prompts. Found less effective than steering.
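A minimal sketch of how CFG can magnify the evaluation/deployment difference at the logit level. It assumes the standard CFG combination, with the evaluation-prompt logits treated as the "negative" branch and a guidance weight `w`. The function names, the 3-token vocabulary, and the logit values are hypothetical; they illustrate the mechanism, not the tested setup.

```python
import numpy as np


def cfg_logits(logits_deploy: np.ndarray, logits_eval: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance on next-token logits.

    Assumed convention: the evaluation-prompt logits play the role of the
    "unconditional"/negative branch. w = 1 returns the deployment logits;
    w > 1 extrapolates further away from the evaluation-prompt behavior.
    """
    return logits_eval + w * (logits_deploy - logits_eval)


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


# Toy 3-token vocabulary (illustrative numbers, not from the source);
# token 0 stands in for a hypothetical "emit a type hint" token.
logits_eval = np.array([2.0, 1.0, 0.0])    # model sees an evaluation-style prompt
logits_deploy = np.array([1.0, 1.0, 0.0])  # same model, deployment-style prompt

for w in (1.0, 2.0, 3.0):
    p = softmax(cfg_logits(logits_deploy, logits_eval, w))
    print(f"w={w}: P(token 0) = {p[0]:.3f}")  # decreases as w grows
```

Under this formulation each generated token needs two forward passes (one per prompt), whereas activation steering adds a vector to the activations within a single pass.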

Related by similarity (7)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Classifier-free guidance does not decrease type hint rate to deployment levels; activation steering is more effective · finding · 0.779
Comparative result showing that steering outperforms CFG as an alternative intervention.
Constitutional Classifiers · method · 0.705
Anthropic's inference-time guardrail that filters outputs violating constitutional rules; proposed for CCAI implementation
Within-family factual generalization (F0-F2) is consistently strong across all models and prompt settings · finding · 0.674
Establishes a reliable baseline for the universality of the factual truth direction within simple factual recall.
Aragones, Gilboa, Postlewaite & Schmeidler (2005): Fact-free learning · concept · 0.672
Source of fact-free learning concept; associated with insight and computational complexity reduction
Embedding-based construct classifiers achieve mean accuracy and F1-macro of 95.96% across OCEAN, HEXACO, Dark Tetrad, CMNI, CFNI constructs · finding · 0.664
Validates use of lightweight classifiers as replacement for frontier LLM evaluation during alpha sweeps
Supervised Learning Constitutional AI · framework · 0.663
The supervised learning stage of CAI where a model critiques and revises its responses, then finetunes on revisions.
Cosine Similarity Binary Classifier · method · 0.652
Classifier using cosine similarity between activation vectors and steering vectors to detect deception with 89% accuracy
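A minimal sketch of the cosine-similarity classifier idea described in the last entry. It assumes one activation vector per example, a precomputed steering vector, and a scalar cutoff fitted on labeled data. `fit_threshold`, `is_deceptive`, and the toy vectors are hypothetical and make no attempt to reproduce the reported 89% accuracy.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def fit_threshold(scores, labels) -> float:
    """Hypothetical helper: choose the cutoff that maximizes accuracy on labeled data."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_acc = 0.0, -1.0
    for t in np.unique(scores):  # each observed score is a candidate cutoff
        acc = np.mean((scores >= t) == labels)
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t


def is_deceptive(activation: np.ndarray, steering_vector: np.ndarray, threshold: float) -> bool:
    """Flag an activation as deceptive when it points along the steering vector."""
    return cosine(activation, steering_vector) >= threshold


# Toy 2-D example (illustrative values only).
steering_vector = np.array([1.0, 0.0])
train_acts = np.array([[0.9, 0.1], [0.7, 0.7], [-0.2, 1.0], [-0.5, 0.5]])
train_labels = [True, True, False, False]  # True = deceptive

scores = [cosine(a, steering_vector) for a in train_acts]
threshold = fit_threshold(scores, train_labels)
print(threshold, is_deceptive(np.array([0.8, 0.2]), steering_vector, threshold))
```

Cosine similarity ignores activation magnitude, so the decision depends only on direction relative to the steering vector; a real classifier would also need a held-out split to estimate accuracy honestly.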