Pre-Encoder Bias

Architectural modification that subtracts a learned bias from an autoencoder's inputs before encoding. The bias is initialized to the geometric median of the dataset and is described as improving autoencoder performance.
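
A minimal sketch of the idea, assuming NumPy, a ReLU encoder, and Weiszfeld iterations for the geometric median (the entry specifies none of these). The names geometric_median, encode, W_enc, b_enc, and b_pre are hypothetical. In training, b_pre would be updated by gradient descent along with the other parameters; only the initialization and the forward pass are shown here.

```python
import numpy as np


def geometric_median(X, n_iter=100, eps=1e-8):
    """Approximate the geometric median of the rows of X (Weiszfeld iterations).

    The geometric median is the point minimizing the summed Euclidean
    distance to all rows. Unlike the mean, it is robust to outliers.
    """
    m = X.mean(axis=0)  # start from the arithmetic mean
    for _ in range(n_iter):
        dist = np.linalg.norm(X - m, axis=1)
        weights = 1.0 / np.maximum(dist, eps)  # clip to avoid division by zero
        m_new = (weights[:, None] * X).sum(axis=0) / weights.sum()
        if np.linalg.norm(m_new - m) < eps:
            return m_new
        m = m_new
    return m


def encode(x, W_enc, b_enc, b_pre):
    """Subtract the pre-encoder bias, then apply an affine map and ReLU."""
    return np.maximum(0.0, (x - b_pre) @ W_enc + b_enc)


# Illustrative data and shapes only (assumed, not taken from the entry).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(256, 4))       # stand-in for model activations
b_pre = geometric_median(X)                  # initialization of the learned bias
W_enc = rng.normal(scale=0.1, size=(4, 16))  # overcomplete: 16 features for 4 inputs
b_enc = np.zeros(16)
features = encode(X, W_enc, b_enc, b_pre)
```

Subtracting b_pre centers the inputs on a robust estimate of their typical location, so the encoder weights do not have to spend capacity modeling a constant offset.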

Neighborhood — ranked by edge-count

Frameworks (1)

framework

Sparse Autoencoder for Dictionary Learning
associated_with
Primary method introduced: trains a one-hidden-layer MLP with an L1 sparsity penalty to decompose model activations into overcomplete feature dictionaries

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Bias in language models · concept · 0.751
Features related to gender, racial, and ethnic biases, slurs, and hate speech.
Inductive Bias · concept · 0.748
Assumptions or preferences (e.g., parsimony) that determine how a learning system generalizes beyond training data
Bias Amplification · concept · 0.747
Problem cited as a limitation of current LLMs; PRH predicts larger models should amplify bias less
Autoencoder · concept · 0.742
Neural network architecture that learns compressed representations; SOHMs are functionally equivalent.
Deep Auto Encoder · framework · 0.739
Pretraining exposure density · concept · 0.735
Expected prevalence of patterns (e.g., base-10 arithmetic) in pretraining corpora, influencing ρd and dr.
Embodied Predictive Interoception Coding · framework · 0.725
Barrett and Simmons's neuroanatomical model of interoceptive prediction error and affect generation
Simplicity Bias · concept · 0.722
The tendency of deep networks to implicitly favor simpler solutions that fit the data, driving convergence