method
active
method:classifier-free-guidance-cfg
Classifier-Free Guidance (CFG)
Tested as an alternative to steering by magnifying the difference between evaluation and deployment prompts; found to be less effective than steering.
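A minimal sketch of the logit-level extrapolation that CFG usually refers to, assuming next-token logits are available for the same continuation under two prompts. The function names, the toy logits, and the choice of which prompt is the target versus the contrast are illustrative assumptions, not details taken from the source.

```python
import math


def cfg_logits(target_logits, contrast_logits, scale):
    """Extrapolate from the contrast logits through the target logits.

    scale = 1.0 returns the target logits unchanged; scale > 1.0 magnifies
    the difference between the two prompt conditions.
    """
    return [c + scale * (t - c) for t, c in zip(target_logits, contrast_logits)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits over a 3-token vocabulary (assumed values).
deploy_logits = [2.0, 1.0, 0.0]  # logits under a deployment-style prompt
eval_logits = [1.0, 1.5, 0.0]    # logits under an evaluation-style prompt

# Push generation toward the deployment condition and away from evaluation.
guided = cfg_logits(deploy_logits, eval_logits, scale=3.0)
guided_probs = softmax(guided)
```

With `scale=1.0` the guided logits equal the target logits, so the scale parameter alone controls how strongly the prompt difference is amplified.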
Related by similarity (7)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Comparative result showing steering superiority over CFG as alternative intervention.
- Anthropic's inference-time guardrail filtering outputs violating constitutional rules; proposed for CCAI implementation
- Within-family factual generalization (F0-F2) is consistently strong across all models and prompt settings. (finding · 0.674) Establishes a reliable baseline for factual truth direction universality within simple factual recall.
- Source of fact-free learning concept; associated with insight and computational complexity reduction
- Validates use of lightweight classifiers as replacement for frontier LLM evaluation during alpha sweeps
- The supervised learning stage of CAI where a model critiques and revises its responses, then finetunes on revisions.
- Classifier using cosine similarity between activation vectors and steering vectors to detect deception with 89% accuracy