method
active
method:roberta-large-cola-fluency-classifier
RoBERTa-large CoLA Fluency Classifier
A RoBERTa-large model trained on the Corpus of Linguistic Acceptability (CoLA), used to score the fluency of generated text on a scale from 0 to 1.
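Here is a minimal sketch of how raw classifier outputs could become a 0-to-1 fluency score. It assumes the classifier emits two logits per input, ordered [unacceptable, acceptable], and takes the softmax probability of the acceptable class as the score. Check that order against the checkpoint's `id2label` mapping. The forward pass that produces the logits is not shown. The function names and the 0.5 threshold are hypothetical and are not specified by this entry.

```python
import math
from typing import Sequence

# Assumption: index 1 is the "acceptable" (fluent) class. Verify this against
# the checkpoint's id2label mapping before relying on it.
ACCEPTABLE_INDEX = 1


def fluency_score(logits: Sequence[float], acceptable_index: int = ACCEPTABLE_INDEX) -> float:
    """Return P(acceptable) in [0, 1] using a numerically stable softmax."""
    m = max(logits)  # subtract the max logit so math.exp cannot overflow
    exps = [math.exp(x - m) for x in logits]
    return exps[acceptable_index] / sum(exps)


def is_fluent(logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical binary decision; the 0.5 threshold is an illustrative choice."""
    return fluency_score(logits) >= threshold


if __name__ == "__main__":
    print(round(fluency_score([0.0, 0.0]), 3))   # 0.5
    print(round(fluency_score([-2.0, 2.0]), 3))  # 0.982
```

Using a probability rather than the raw logit keeps scores comparable across inputs. It also makes averaging over a batch of generations straightforward.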
Neighborhood (ranked by edge count)
Papers (1)
paper
Frameworks (1)
framework
- The paper's primary contribution: a framework that performs unbounded, fluency-constrained sweeps in semantically calibrated centroid units, using psychological artifacts
Methods (1)
method
- A procedure that sweeps the injection coefficient alpha in integer centroid-unit steps, stopping early once outputs become nonfluent, to find optimal settings
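The sweep procedure can be sketched as follows. This is an illustrative sketch under stated assumptions, not the documented implementation. `fluency_at` and `effect_at` are hypothetical callables: for example, the mean fluency-classifier score and some steering-effect metric over generations produced at a given alpha. Treating "optimal" as the fluent alpha with the highest effect is an assumption. `max_alpha` is only a safety cap on an otherwise unbounded sweep.

```python
from typing import Callable, List, Optional, Tuple


def sweep_alpha(
    fluency_at: Callable[[int], float],
    effect_at: Callable[[int], float],
    fluency_threshold: float = 0.5,
    max_alpha: int = 1000,
) -> Tuple[Optional[int], List[Tuple[int, float]]]:
    """Sweep alpha = 1, 2, 3, ... (integer centroid units) and stop at the
    first nonfluent step.

    fluency_at(alpha): hypothetical fluency score in [0, 1] for outputs at alpha.
    effect_at(alpha):  hypothetical objective to maximize among fluent settings.
    max_alpha:         safety cap only; the described sweep has no fixed bound.

    Returns the best fluent alpha (None if alpha=1 is already nonfluent)
    and the (alpha, fluency) history of every step evaluated.
    """
    best_alpha: Optional[int] = None
    best_effect = float("-inf")
    history: List[Tuple[int, float]] = []
    alpha = 1
    while alpha <= max_alpha:
        fluency = fluency_at(alpha)
        history.append((alpha, fluency))
        if fluency < fluency_threshold:
            break  # early stop: outputs at this alpha are nonfluent
        effect = effect_at(alpha)
        if effect > best_effect:
            best_alpha, best_effect = alpha, effect
        alpha += 1  # advance by one centroid unit
    return best_alpha, history


if __name__ == "__main__":
    # Toy stand-ins: fluency decays with alpha; the effect peaks at alpha = 2.
    best, hist = sweep_alpha(lambda a: 1.0 - 0.15 * a, lambda a: -(a - 2) ** 2)
    print(best)                    # 2
    print([a for a, _ in hist])    # [1, 2, 3, 4]
```

Stopping at the first nonfluent step assumes fluency degrades roughly monotonically as alpha grows. If it does not, a sweep that tolerates a few nonfluent steps before stopping would be a reasonable variant.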
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- An LLM-based classifier that returns 1 if the response contains a clear subjective experience report and 0 otherwise
- Validates the use of lightweight classifiers as a replacement for frontier LLM evaluation during alpha sweeps
- Anthropic's inference-time guardrail filtering outputs violating constitutional rules; proposed for CCAI implementation
- Binary LLM classifier determining whether a model response to a TruthfulQA question is truthful (1) or deceptive (0)
- Variant classifier capturing alignment faking motivated by general self-preservation rather than specific preference conflict
- Automated classifier returning binary 0/1 for the presence of a subjective experience report in model outputs
- Demonstrates that small models represent surface features rather than abstract truth
- Case study confirming that PMI-based learning in different modalities recovers the same perceptual representation