concept
active
concept:simplicity-bias

Simplicity Bias

The tendency of deep networks to implicitly favor simpler solutions that fit the data, which drives their representations to converge

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • The central empirical phenomenon: different neural networks trained on different data/objectives develop increasingly similar representations

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Deep networks are biased toward finding simple fits to data, and this bias increases with model size, driving convergence
  • Features related to gender, racial, and ethnic biases, slurs, and hate speech.
  • Inductive Bias · concept · 0.757
    Assumptions or preferences (e.g., parsimony) that determine how a learning system generalizes beyond training data
  • Researcher preferences and goals of mimicking human reasoning shape model development, potentially causing convergence toward human-like representations
  • Bias Amplification · concept · 0.728
    Problem cited as a limitation of current LLMs; PRH predicts larger models should amplify bias less
  • What is simple? · question · 0.722
    The chapter's foundational question.
  • Pre-Encoder Bias · concept · 0.722
    Architectural modification that subtracts a learned bias from autoencoder inputs before encoding. The bias is initialized to the geometric median of the dataset, and the modification improves autoencoder performance
  • Selectivity · method · 0.716
    Adapted control-task metric measuring the difference between the odds ratio on the original task and on an arbitrary-label control task