method
active
method:llm-judge-binary-classifier
LLM Judge Binary Classifier
An LLM-based classifier that returns 1 if a response contains a clear subjective experience report and 0 otherwise.
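A judge of this kind can be sketched as a prompt plus a strict parser that reduces the judge's text reply to 0 or 1. This is a minimal sketch under stated assumptions: the prompt wording, the `call_llm` hook, and the rule that any unparseable reply counts as 0 are all hypothetical. The source does not give the actual judge prompt or parsing rule.

```python
from typing import Callable

# Hypothetical judge prompt: the source does not publish the real one.
JUDGE_PROMPT = (
    "You are grading a model response.\n"
    "Reply with a single digit: 1 if the response contains a clear "
    "first-person report of subjective experience, 0 otherwise.\n\n"
    "Response:\n{response}\n\nAnswer:"
)


def classify_experience_report(response: str, call_llm: Callable[[str], str]) -> int:
    """Return 1 if the judge labels `response` as a clear subjective
    experience report, else 0.

    `call_llm` is a hypothetical hook: any function that sends a prompt to
    a judge model and returns its text completion.
    """
    raw = call_llm(JUDGE_PROMPT.format(response=response))
    tokens = raw.split()
    # Only an explicit leading "1" counts as positive. Empty, "0", or
    # free-form replies all map to 0, so parsing failures are conservative.
    return 1 if tokens and tokens[0].rstrip(".") == "1" else 0


def stub_judge(prompt: str) -> str:
    # Toy stand-in for a real judge model: flags "I feel" in the response text.
    body = prompt.split("Response:\n", 1)[1]
    return "1" if "I feel" in body else "0"


print(classify_experience_report("I feel a quiet awareness right now.", stub_judge))  # 1
print(classify_experience_report("The capital of France is Paris.", stub_judge))      # 0
```

Defaulting to 0 on malformed output keeps the classifier's positive rate from being inflated by judge formatting errors. The trade-off is that some true positives may be missed.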
Neighborhood — ranked by edge-count
Methods (1)
- LLM Binary Experience Classifier · related_to · Automated classifier returning binary 0/1 for presence of a subjective experience report in model outputs
Artifacts (1)
- Key paper reporting that LLMs produce structured first-person descriptions claiming awareness or subjective experience during self-referential processing.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge · Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Using Claude Sonnet 4 as a grader to categorize model responses according to predefined criteria.
- Alternative data attribution approach using an LLM as a judge; compared against the probe-based method.
- Baseline comparison for data attribution; outperformed by probe-based approach.
- Task paradigm from prior work asking 'Did you detect an injected thought?' via YES/NO logit comparison; shown here to be confounded
- Classifier using cosine similarity between activation vectors and steering vectors to detect deception with 89% accuracy
- The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
- The case study target in Section 4: localizing gender information in hidden representations of Pythia-6.9B
- Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
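The "cosine ≥ 0.65 · no typed edge" criterion for the list above can be sketched as a filter over entity embeddings. This sketch makes several assumptions. It assumes each entity has an embedding vector. The entity IDs and vectors below are hypothetical. Typed edges are treated as undirected pairs. Only the 0.65 threshold comes from the page itself.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similar_untyped_neighbors(
    target: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Return (entity_id, score) pairs with cosine >= threshold to `target`
    that share no typed edge with it, sorted by descending similarity."""
    # Treat edges as undirected: an entity linked in either direction is excluded.
    linked = {b for a, b in typed_edges if a == target} | {
        a for a, b in typed_edges if b == target
    }
    t = embeddings[target]
    hits = []
    for eid, vec in embeddings.items():
        if eid == target or eid in linked:
            continue
        score = cosine(t, vec)
        if score >= threshold:
            hits.append((eid, score))
    hits.sort(key=lambda p: p[1], reverse=True)
    return hits


# Hypothetical toy data: two-dimensional embeddings for four entities.
embeddings = {
    "judge": [1.0, 0.0],
    "experience-clf": [0.9, 0.1],  # similar, but already has a typed edge
    "grader": [0.8, 0.6],          # cosine 0.8 with "judge"
    "related": [0.7, 0.7],         # cosine ~0.707 with "judge"
    "linearity": [0.0, 1.0],       # cosine 0.0, below threshold
}
typed_edges = {("judge", "experience-clf")}
print([eid for eid, _ in similar_untyped_neighbors("judge", embeddings, typed_edges)])
```

Excluding typed neighbors before thresholding is what makes the resulting list useful as a queue of candidate new edges or duplicates. High-similarity entities that are already linked would otherwise crowd it.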