Feature Looping

Repetitive behavioral pattern observed under high steering strengths in SAE feature self-steering experiments

Neighborhood — ranked by edge-count

method

Agentic Self-Steering Emotionality Evaluation
associated_with
Kimi K2.5 uses a tool to steer SAE features on itself in real time and rates the emotional effect on its own internal state on a 0-100 scale

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Feature Sparsity · concept · 0.774
Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
Feature Visualization · method · 0.749
Method of optimizing an input to make a neuron fire maximally, used to characterize what the neuron detects; establishes a causal link
Feedback Loops · concept · 0.738
Key dimension analyzing how wide the gulfs of execution and evaluation are in a system and how they relate; uses concepts from The Design of Everyday Things.
feature as application · concept · 0.735
Metaphor treating each system feature or function as a separate application that can be independently loaded and managed.
Feature Universality · concept · 0.730
Property of features that form consistently across different models trained on the same or similar data, suggesting features are real representational units
Feature splitting · concept · 0.730
Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
Features are connected by weights forming circuits, and these circuits can be rigorously studied and understood as meaningful algorithms. · claim · 0.723
Second of three speculative claims asserting that subgraphs of neural networks are tractable and meaningful objects of study
Homeostatic Loops · concept · 0.721
Feedback mechanisms at cellular and tissue levels driving error correction toward target states.
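
The listing above pairs each neighbor with a cosine score at or above 0.65 and excludes entities that already share a typed edge with this one. A minimal sketch of that selection follows, assuming each entity has an embedding vector and typed edges are stored as name pairs. The function name `untyped_neighbors`, the toy two-dimensional embeddings, and the entity "Linked Method" are all hypothetical and exist only for illustration; the source does not say how the scores are actually computed.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities whose cosine similarity to
    `target` is at least `threshold` and that share no typed edge with it,
    sorted by descending score. (Hypothetical helper, not the page's code.)"""
    target_vec = embeddings[target]
    # Collect every entity already connected to the target by a typed edge.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    results = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed): 2-D vectors stand in for real embeddings.
embeddings = {
    "Feature Looping": [1.0, 0.0],
    "Feature Sparsity": [0.8, 0.6],  # close to the target
    "Linked Method": [1.0, 0.1],     # close, but already has a typed edge
    "Unrelated": [0.0, 1.0],         # orthogonal to the target
}
typed_edges = [("Feature Looping", "Linked Method")]

neighbors = untyped_neighbors("Feature Looping", embeddings, typed_edges)
```

Entities returned this way are only candidates: a high score suggests either a missing edge or a duplicate, and deciding which still takes a manual look.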