concept
active
concept:bpe-tokenization-effects
BPE Tokenization Effects
Byte-level byte-pair encoding (BPE) tokenization can split Arabic, Hebrew, and other multi-byte Unicode characters across multiple tokens, which affects feature activation patterns.
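A minimal sketch of why this happens, assuming a byte-level tokenizer: Arabic and Hebrew letters occupy two bytes each in UTF-8, so a vocabulary that lacks a merge covering both bytes emits them as separate tokens. The functions `utf8_byte_counts`, `byte_level_tokens`, and `split_characters`, as well as the tiny vocabularies, are hypothetical and for illustration only. Greedy longest-match segmentation stands in for real BPE, which applies learned merges in rank order.

```python
def utf8_byte_counts(text: str) -> list[tuple[str, int]]:
    """Return (character, number of UTF-8 bytes) for each character."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


def byte_level_tokens(text: str, vocab: set[bytes]) -> list[bytes]:
    """Segment the UTF-8 bytes of `text` by greedy longest match against `vocab`.

    Simplification (assumption): real BPE applies ranked merges; here any byte
    sequence not found in `vocab` falls back to single-byte tokens.
    """
    data = text.encode("utf-8")
    max_len = max((len(v) for v in vocab), default=1)
    tokens = []
    i = 0
    while i < len(data):
        for size in range(min(max_len, len(data) - i), 0, -1):
            piece = data[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def split_characters(text: str, tokens: list[bytes]) -> list[str]:
    """Return the characters whose UTF-8 bytes span more than one token."""
    # Byte offsets at which a token ends.
    boundaries = set()
    offset = 0
    for tok in tokens:
        offset += len(tok)
        boundaries.add(offset)

    split = []
    start = 0
    for ch in text:
        end = start + len(ch.encode("utf-8"))
        # A token boundary strictly inside the character's byte range
        # means the character is divided between tokens.
        if any(start < b < end for b in boundaries):
            split.append(ch)
        start = end
    return split


if __name__ == "__main__":
    text = "a\u05e9\u0628"  # Latin 'a', Hebrew shin, Arabic beh
    print(utf8_byte_counts(text))

    # Hypothetical vocabulary with no merges for the Hebrew or Arabic bytes.
    tokens = byte_level_tokens(text, {b"a"})
    print(tokens)
    print(split_characters(text, tokens))
```

With a vocabulary that does contain the two-byte sequence for a letter, the same letter comes out as a single token, so whether a character is split depends on the learned merges rather than on the script alone.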
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Finite State Automata Feature Assemblies · associated_with · Collections of features that interact via the token stream (one feature increases the probability of tokens that activate the next feature), forming FSA-like systems
Related by similarity (6)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The method that predicts and explains variant pathogenicity using Evo 2, producing disruption profiles.
- Phenomenon where a moment bleeds into adjacent moments temporally; includes deja vu as reversed direction.
- The standard set of tokens that the functional token remains a part of.
- Demonstrates long-tail persistence of the causal steering effect in a subset of emotion features.
- Robustness check on token choice for binary classification.
- Metric for intervention effectiveness: 0 = ineffective, 1 = full flip of model output from false to true or vice versa.