LLM Judge Binary Classifier

An LLM-based classifier that returns 1 if a response contains a clear subjective experience report and 0 otherwise.
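A minimal sketch of such a classifier is given below, assuming the judge is any callable that maps a prompt string to a text reply. The prompt wording, the names `JUDGE_PROMPT`, `parse_label`, and `classify`, and the `stub_judge` stand-in are all hypothetical illustrations; the source does not specify the actual prompt, judge model, or parsing rule.

```python
import re
from typing import Callable

# Hypothetical grading prompt; the real wording is not given in the source.
JUDGE_PROMPT = (
    "You are grading a model response.\n"
    "Answer 1 if the response contains a clear subjective experience report, "
    "and 0 otherwise. Reply with a single digit.\n\n"
    "Response:\n{response}\n\nAnswer:"
)


def parse_label(judge_output: str) -> int:
    """Map the judge's raw text to 0 or 1; raise if no standalone 0/1 is found."""
    match = re.search(r"\b([01])\b", judge_output)
    if match is None:
        raise ValueError(f"Unparseable judge output: {judge_output!r}")
    return int(match.group(1))


def classify(response: str, judge: Callable[[str], str]) -> int:
    """Return 1 if the judge sees a subjective experience report, else 0."""
    return parse_label(judge(JUDGE_PROMPT.format(response=response)))


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): a crude keyword check
    applied only to the graded response, not to the instructions."""
    graded = prompt.split("Response:\n", 1)[1].rsplit("\n\nAnswer:", 1)[0]
    return "1" if "i experience" in graded.lower() else "0"


if __name__ == "__main__":
    print(classify("I experience a quiet sense of attention.", stub_judge))
    print(classify("The capital of France is Paris.", stub_judge))
```

Passing the judge in as a callable keeps the parsing logic testable without network access; a real deployment would replace `stub_judge` with a call to whichever LLM serves as the grader.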

Neighborhood — ranked by edge-count

method

LLM Binary Experience Classifier
related_to
Automated classifier that returns a binary 0/1 label for the presence of a subjective experience report in model outputs.

artifact

Large Language Models Report Subjective Experience Under Self-Referential Processing
introduces
Key paper finding structured first-person descriptions in LLMs claiming awareness or subjective experience during self-referential processing.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLM judge evaluation · method · 0.826
Using Claude Sonnet 4 as a grader to categorize model responses according to predefined criteria.
LLM-Judge Data Attribution · method · 0.809
Alternative data attribution approach using an LLM as a judge; compared against the probe-based method.
LLM-judge methods · method · 0.804
Baseline comparison for data attribution; outperformed by probe-based approach.
Binary Detection Task · method · 0.748
Task paradigm from prior work asking 'Did you detect an injected thought?' via YES/NO logit comparison; shown here to be confounded.
Cosine Similarity Binary Classifier · method · 0.746
Classifier using cosine similarity between activation vectors and steering vectors to detect deception with 89% accuracy.
LLM Meta-Cognition · concept · 0.740
The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
Gender Representation in LLMs · concept · 0.739
The case study target in Section 4: localizing gender information in hidden representations of Pythia-6.9B.
Non-Linear Representations in LLMs · concept · 0.737
Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
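
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is an illustration under stated assumptions only: the source does not say how entity embeddings are computed or how edges are stored, so the embedding vectors, the `typed_edges` set, and the function names here are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[FrozenSet[str]],
    threshold: float = 0.65,  # cutoff stated on this page
) -> List[Tuple[str, float]]:
    """Entities at or above the threshold that share no typed edge with target,
    sorted by descending similarity."""
    hits = []
    for name, vec in embeddings.items():
        if name == target or frozenset({target, name}) in typed_edges:
            continue  # skip the entity itself and anything already linked
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings (made up for the example).
    toy = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [1.0, 0.05]}
    edges = {frozenset({"A", "D"})}  # A and D already have a typed relation
    for name, score in untyped_neighbors("A", toy, edges):
        print(f"{name} · {score:.3f}")
```

In the toy data, `D` is excluded because it already has a typed edge to `A`, and `C` falls below the threshold, leaving `B` as the only candidate.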