RoBERTa-large CoLA Fluency Classifier

RoBERTa-large model fine-tuned on the Corpus of Linguistic Acceptability (CoLA), used to score the fluency of generated text on a 0-to-1 scale
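
One plausible way to obtain a 0-to-1 fluency score from a two-class acceptability classifier is to take the softmax probability of the "acceptable" class. The sketch below assumes that reading. The function name `fluency_score` and the `acceptable_index` parameter are hypothetical, and the default index of 1 follows the usual CoLA convention (1 = acceptable), which may differ for a particular checkpoint. It works on raw logits so that it does not depend on any specific model or library.

```python
import math
from typing import Sequence


def fluency_score(logits: Sequence[float], acceptable_index: int = 1) -> float:
    """Return the softmax probability of the 'acceptable' class.

    Assumption: `logits` are the classifier's raw outputs for one sentence,
    and `acceptable_index` points at the 'acceptable' label.
    """
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[acceptable_index] / sum(exps)


# Illustrative logits only, not output from a real model.
print(round(fluency_score([-2.0, 3.0]), 3))
```

A score near 1 would then mean the classifier considers the text acceptable, and a score near 0 would mean it does not.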

Neighborhood — ranked by edge-count

Papers (1)

paper

Psychological Steering of Large Language Models
uses

Frameworks (1)

framework

Psychological Steering Framework
uses
The paper's primary contribution: performs unbounded, fluency-constrained sweeps in semantically calibrated centroid units using psychological artifacts

Methods (1)

method

Unbounded Alpha Sweep
uses
Procedure that sweeps the injection coefficient alpha in integer centroid-unit steps, stopping early on nonfluency, to find optimal settings
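
The sweep described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `generate`, `fluency`, and `target` are hypothetical callables supplied by the caller, the 0.5 fluency threshold is a placeholder, "optimal" is taken to mean the highest target score seen before fluency breaks down, and `max_steps` is a safety cap added here so the loop always terminates.

```python
from typing import Callable, List, Optional, Tuple


def unbounded_alpha_sweep(
    generate: Callable[[int], str],      # hypothetical: text produced at a given alpha
    fluency: Callable[[str], float],     # hypothetical: 0-to-1 fluency score
    target: Callable[[str], float],      # hypothetical: score for the steered trait
    fluency_threshold: float = 0.5,      # assumed cutoff, not from the source
    max_steps: int = 1000,               # safety cap, not part of the described method
) -> Tuple[Optional[int], List[Tuple[int, float, float]]]:
    """Sweep alpha = 1, 2, 3, ... and stop at the first nonfluent output."""
    history: List[Tuple[int, float, float]] = []
    best_alpha: Optional[int] = None
    best_score = float("-inf")
    for alpha in range(1, max_steps + 1):
        text = generate(alpha)
        f = fluency(text)
        if f < fluency_threshold:
            break  # early stopping on nonfluency
        s = target(text)
        history.append((alpha, f, s))
        if s > best_score:
            best_alpha, best_score = alpha, s
    return best_alpha, history


# Toy stand-ins: fluency decays as alpha grows, target score rises with alpha.
best, trace = unbounded_alpha_sweep(
    generate=lambda a: "x" * a,
    fluency=lambda t: 1.0 - 0.2 * len(t),
    target=lambda t: float(len(t)),
)
print(best, len(trace))
```

With these toy stand-ins the sweep stops at the third step, where fluency first drops below the threshold, and reports the last fluent alpha as the best setting.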

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLM Judge Binary Classifier · method · 0.707
An LLM-based classifier that returns 1 if the response contains a clear subjective experience report and 0 otherwise
Embedding-based construct classifiers achieve mean accuracy and F1-macro of 95.96% across OCEAN, HEXACO, Dark Tetrad, CMNI, CFNI constructs · finding · 0.695
Validates use of lightweight classifiers as replacement for frontier LLM evaluation during alpha sweeps
Constitutional Classifiers · method · 0.680
Anthropic's inference-time guardrail filtering outputs violating constitutional rules; proposed for CCAI implementation
Truthfulness Classifier · method · 0.674
Binary LLM classifier determining whether a model response to a TruthfulQA question is truthful (1) or deceptive (0)
Generic Self-Preserving Alignment-Faking Classifier · method · 0.672
Variant classifier capturing alignment faking motivated by general self-preservation rather than specific preference conflict
LLM Binary Experience Classifier · method · 0.671
Automated classifier returning binary 0/1 for presence of subjective experience report in model outputs
LLaMA-2-7B representations of larger_than+smaller_than cluster by surface-level characteristics such as presence of token 'eighty' · finding · 0.670
Demonstrates that small models represent surface features rather than abstract truth
Color distances learned from language cooccurrence statistics closely mirror those learned from image cooccurrence statistics and human perceptual distances (CIELAB) · finding · 0.669
Case study confirming that PMI-based learning in different modalities recovers the same perceptual representation