LLM biases mirror human biases in morally significant ways

Finding from Navigli et al. cited to justify applying human contemplative strategies to AI systems

Source paper (extracted_from)

Contemplative Agent

(2025) · Ruben Laukkonen · Fionn Inglis · Shamil Chandaria · Lars Sandved-Smith +4

Neighborhood, ranked by edge count

Claims (1)

claim

Contemplative wisdom traditions have grappled with the human version of the alignment problem for millennia, aiming to cultivate resilient alignment in the form of personal contentment and social harmony
supports
Foundational analogy motivating the entire Contemplative AI approach

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The better an LLM is at language modeling, the more it aligns with vision models, and vice versa: a linear relationship between language modeling score and vision-language alignment · finding · 0.797
Core cross-modal empirical result: larger and better language models align better with vision models
LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on the "likely" dataset performing poorly on anti-correlated datasets · claim · 0.788
Establishes that the observed linear structure is not merely a representation of text probability
LLMs can predict their own responses more accurately than external observers, implying privileged internal knowledge · finding · 0.784
Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness
LLM personality self-reports are illusory: post-training alignment creates stable, human-like reports dissociated from actual behavior (Han et al. 2025) · claim · 0.782
Skeptical prior work motivating the need to validate self-reports against internal states rather than taking them at face value
LLM self-reports about consciousness and moral significance should express degrees of confidence and provide context · claim · 0.782
Recommendation for companies on LM outputs.
Reasoning LLMs trigger reflection when their internal uncertainty is high · hypothesis · 0.777
Core hypothesis linking internal uncertainty to self-reflection behavior, tested via probing experiments
We hypothesize that LLMs represent correctness of arithmetic expressions differently from factual statements · hypothesis · 0.776
Core working hypothesis motivating the factual vs. arithmetic task split in the experimental design.
As LLMs scale, they develop increasingly general abstractions, with large models linearly representing abstract concepts like truth that capture shared properties of diverse inputs · claim · 0.776
Interpretive claim connecting scale to abstraction level in LLM representations
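The "Related by similarity" filter (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as unordered id pairs; the page does not say which embedding model is used or how edges are stored, so the function names, data layout and toy vectors here are hypothetical. Candidates are ranked by descending cosine score, which matches the order of the list above.

```python
import math

# Hypothetical threshold mirroring the page's "cosine >= 0.65" filter.
SIM_THRESHOLD = 0.65


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_untyped(target_id, embeddings, typed_edges, threshold=SIM_THRESHOLD):
    """Return (id, score) pairs close to target_id that share no typed edge with it.

    embeddings: dict mapping entity id -> embedding vector (assumed layout).
    typed_edges: set of frozenset({a, b}) pairs already linked by a typed relation.
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed relation, so not a candidate
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, round(score, 3)))
    # Highest similarity first, like the "Related by similarity" list.
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy usage with made-up 2-D vectors: "c" falls below the threshold,
# and "d" is excluded because it already has a typed edge to "a".
toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.6]}
print(similar_untyped("a", toy, {frozenset(("a", "d"))}))
```

Excluding typed pairs before scoring is what makes the list useful as a queue of "candidates for new edges or unrecognized duplicates": anything already connected by a relation is filtered out regardless of how similar it is.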