framework
active
framework:eliciting-latent-knowledge-elk
Eliciting Latent Knowledge (ELK)
Christiano et al. (2021) framework motivating the problem of determining whether a model 'believes' a statement; cited as core motivation
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Factuality · cites · Scoped definition of 'truth' used in the paper: the truth or falsehood of declarative factual statements
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Reasoning approach using learnable hidden embeddings.
- Asking model to explain its own behavior after the fact when no chain-of-thought was available
- Pearson-Vogel et al.'s finding that models can detect prior concept injections; introspective signals exist in middle layers but are suppressed by post-training
- Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
- Prior active inference paper providing detailed neurophysiological implementation of belief updates
- Zaadnoordijk and Bayne's category of intentional action; sticker-removal behavior induced by the self-prior corresponds to this
- Claim that the basis for inferring animal sentience is intuitive, not empirical.
- Entities that become visible as centers in a configuration (e.g., rectangles of white space around a dot) that were not present before.