concept
active
concept:linear-representation-of-features

Linear Representation of Features

The central object of study: the idea that a concept such as truth is encoded as a direction in an LLM's latent space.
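A minimal sketch of what "encoded as a direction" can mean in practice. It assumes we already have hidden-state vectors for statements labelled true or false. The vectors below are tiny made-up 3-d examples, not real model activations. The difference-of-means estimator is one simple way to obtain a candidate direction and is not necessarily the method used in the work this page describes.

```python
# Sketch: estimate a "truth direction" and classify by projecting onto it.
# Assumption: true_acts / false_acts stand in for real hidden-state vectors.

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sub(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical activations for true and false statements.
true_acts = [[1.0, 0.2, 0.1], [0.9, -0.1, 0.3], [1.1, 0.0, -0.2]]
false_acts = [[-1.0, 0.1, 0.2], [-0.8, -0.2, 0.0], [-1.2, 0.1, -0.1]]

# Difference-of-means direction: points from the "false" cluster to the "true" one.
mu_true = mean(true_acts)
mu_false = mean(false_acts)
truth_dir = sub(mu_true, mu_false)

# Decision threshold halfway between the two class means, measured along the direction.
threshold = (dot(mu_true, truth_dir) + dot(mu_false, truth_dir)) / 2

def predict_true(activation):
    """Label an activation 'true' if its projection lies past the threshold."""
    return dot(activation, truth_dir) > threshold

print(predict_true([0.7, 0.0, 0.0]))   # → True
print(predict_true([-0.6, 0.3, 0.1]))  # → False
```

If the concept really is linear, a single projection like this separates the classes; a curved or multi-cluster encoding would not be captured by one direction.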

Neighborhood (ranked by edge count)

Concepts (2)

concept

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior.
  • The finding that interpretable concepts, including character traits, are encoded as linear directions in transformer residual streams.
  • linearity · concept · 0.780
    The sequential, continuous order of text, often challenged by diagrammatic branching.
  • Linear Map (a ⊸ b) · framework · 0.771
    Semantic domain for linear transformations; denotation as actual linear function; Category instance generated from homomorphism principle.
  • Linear Decoding · method · 0.771
    Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
  • The idea that programs can be expressed as logical sentences, enabling direct deductive verification.
  • Research thread within About Blank concerning the structure and relational properties of neural network feature representations; covariance pooling tangentially supports this thread.
  • Method of optimizing an input to make a neuron fire maximally, used to characterize what the neuron detects; establishes a causal link.
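The listing rule for this section (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. The entity names, the 3-d embedding vectors, and the `similar_untyped` helper are all hypothetical; the page does not say how its embeddings are produced or how ties are ordered.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similar_untyped(target, embeddings, typed_neighbors, threshold=0.65):
    """Return (name, score) pairs with cosine >= threshold to `target`,
    skipping the target itself and anything already linked by a typed edge,
    sorted by descending similarity."""
    target_vec = embeddings[target]
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_neighbors:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical embeddings; real entity vectors would be much higher-dimensional.
embeddings = {
    "linear-representation": [1.0, 0.0, 0.0],
    "linear-decoding": [0.9, 0.4, 0.0],
    "linear-map": [0.8, 0.0, 0.6],
    "feature-visualization": [0.6, 0.8, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
typed = {"linear-map"}  # already connected by a typed edge, so excluded

print(similar_untyped("linear-representation", embeddings, typed))
# → [('linear-decoding', 0.914)]
```

Here "linear-map" clears the threshold (cosine 0.8) but is dropped because a typed edge already exists, while "feature-visualization" (cosine 0.6) falls just below the cutoff.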