claim
active
claim:each-functional-token-is-associated-with-an-internalized-visual-operation-yet-requires-no-visual-supervision-and-remains-a-standard-token-in-the-tokenizer-vocabulary
Each functional token is associated with an internalized visual operation, yet requires no visual supervision and remains a standard token in the tokenizer vocabulary.
Describes the properties of the functional token.
Source paper
extracted_from: Ziyu Guo · Rain Liu · Xinyan Chen · Pheng-Ann Heng
Neighborhood — ranked by edge-count
Papers (1)
paper
Communities (4)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
- Identifies distributed algorithms implemented across attention heads, with focus on causal masking limitations and emergent capabilities via activation manifold steering.
- Unsupervised learning of interpretable task tokens through gradient flow and vocabulary constraints, enabling reasoning without visual supervision.
- Functional tokens as visual operators (members_of): Tokens encode visual operations learned from reasoning context without explicit visual supervision.
Concepts (3)
concept
- The visual operation embedded inside a functional token, requiring no visual supervision.
- tokenizer vocabulary (cites): The standard set of tokens that the functional token remains a part of.
- visual supervision (cites): Supervisory signals for visual outputs; functional tokens do not require it.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- ATLAS hypothesis that a compact set of high-level functional tokens (Manip, Shape, Line, Arrow, Text) suffices for multi-domain visual reasoning.
- Token-level supervision enables models to learn functional-token invocation from reasoning context (claim, 0.805): ATLAS authors' assertion that functional tokens optimized via standard cross-entropy loss learn when and how to invoke operations from surrounding text.
- A pair of query and key subcomponents distributed across attention heads performs previous-token behavior (finding, 0.769): VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
- Keeping functional-token vocabulary compact minimizes perturbation to base model token distribution (claim, 0.756): ATLAS design philosophy that five functional tokens suffice to abstract common visual operations without excessive disruption.
- Interesting special case of copying behavior related to tokenization artifacts; primitive precursor to induction heads
- Quote framing KV caching as introspection mechanism.
- A discrete token in the vocabulary that represents a visual operation (e.g., <|Line|>, <|Shape|>, <|Text|>), generated via next-token prediction within autoregressive sequences.
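
The claim that a functional token "remains a standard token in the tokenizer vocabulary" and is trained with ordinary cross-entropy can be illustrated with a minimal sketch. Everything here is a toy assumption rather than the ATLAS implementation: the base vocabulary, the helper names `build_vocab`, `encode`, and `cross_entropy`, and the uniform logits standing in for a model are all hypothetical. Only the five operation names and the `<|Line|>` token format come from the entries above.

```python
import math

# Hypothetical toy vocabulary. The five operation names come from the entries
# above; the base tokens are illustrative only.
BASE_VOCAB = ["<pad>", "draw", "a", "line", "from", "A", "to", "B"]
FUNCTIONAL_TOKENS = ["<|Manip|>", "<|Shape|>", "<|Line|>", "<|Arrow|>", "<|Text|>"]


def build_vocab(base, functional):
    """Append functional tokens to the base vocabulary as ordinary entries."""
    tokens = list(base) + [t for t in functional if t not in base]
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(tokens, vocab):
    """Map a token sequence to ids; functional tokens need no special path."""
    return [vocab[t] for t in tokens]


def cross_entropy(logits, target_id):
    """Next-token cross-entropy at one position: -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_id]


vocab = build_vocab(BASE_VOCAB, FUNCTIONAL_TOKENS)
ids = encode(["draw", "a", "line", "<|Line|>", "from", "A", "to", "B"], vocab)

# Uniform logits stand in for a model's output (an assumption for illustration).
# The loss at a functional-token target is computed exactly as at a text token.
uniform_logits = [0.0] * len(vocab)
loss_functional = cross_entropy(uniform_logits, vocab["<|Line|>"])
loss_text = cross_entropy(uniform_logits, vocab["line"])
```

In this sketch the functional token gets an id, appears inline in the sequence, and is scored by the same loss as any text token; nothing image-related enters the objective, which is the sense of "no visual supervision" in the claim.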