NLTK Stemming and Lemmatization

Used to normalize candidate instruction tokens in the instruction discovery experiment.
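A minimal sketch of what this kind of token normalization can look like with NLTK. The source does not say which stemmer or lemmatizer the experiment used, so `PorterStemmer` and `WordNetLemmatizer` are assumptions here, and the helper names `normalize_tokens` and `group_by_stem` are hypothetical. The lemmatizer path also assumes the WordNet corpus has been downloaded (for example with `nltk.download("wordnet")`), so the sketch stems by default.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer, WordNetLemmatizer


def normalize_tokens(tokens, use_lemmatizer=False):
    """Lowercase each token and reduce it to a stem (default) or a lemma.

    Assumption: Porter stemming / WordNet lemmatization stand in for
    whatever NLTK components the experiment actually used.
    """
    if use_lemmatizer:
        # Requires the WordNet corpus to be available locally.
        lemmatizer = WordNetLemmatizer()
        return [lemmatizer.lemmatize(tok.lower()) for tok in tokens]
    stemmer = PorterStemmer()
    return [stemmer.stem(tok.lower()) for tok in tokens]


def group_by_stem(tokens):
    """Map each stem to the set of surface forms that collapse onto it."""
    groups = defaultdict(set)
    for tok, stem in zip(tokens, normalize_tokens(tokens)):
        groups[stem].add(tok)
    return dict(groups)


candidates = ["Reflecting", "reflected", "reflects", "wait", "waiting"]
stems = normalize_tokens(candidates)
groups = group_by_stem(candidates)
```

Grouping by stem treats inflected variants of the same candidate token as a single candidate rather than as separate ones.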

Neighborhood — ranked by edge-count

Papers (1)

Unveiling the Latent Directions of Reflection in Large Language Models · paper · mentions

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Backtracking Latents · concept · 0.709
SAE latents that rise as correction approaches and peak after self-correction begins, complementing OTDs
Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data (Treutlein et al. 2024) · concept · 0.686
Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
Natural Language Autoencoders (NLA) · framework · 0.685
An unsupervised method for generating natural language explanations of LLM activations through a verbalizer-reconstructor pair trained jointly with RL.
Natural Language Autoencoders (NLAs) · method · 0.684
Core unsupervised method for generating natural language explanations of LLM activations through a verbalizer-reconstructor pair trained with RL.
LLMs trained only on language data have rich enough knowledge of visual structures that decent visual representations can be trained on images generated solely by querying the LLM · finding · 0.679
Sharma et al. result supporting cross-modal alignment: language-only models implicitly encode visual structure
Little evidence of steganography in NLAs; meaning-preserving transformations cause only small drops in FVE · finding · 0.679
Quantitative evaluation showing NLAs do not heavily rely on covert encoding beyond overt language.
Understanding how LMs learn linguistic behaviours may offer insights into fundamental properties of language · hypothesis · 0.675
Forward-looking hypothesis linking LM mechanism analysis to linguistic theory
LLM-Judge Data Attribution · method · 0.674
Alternative data attribution approach using an LLM as a judge; compared against the probe-based method.