concept
active
concept:bpe-tokenization-effects

BPE Tokenization Effects

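Byte-pair encoding (BPE) tokenization that operates on UTF-8 bytes can split Arabic, Hebrew, and other multi-byte characters across multiple tokens, so a single character may be spread over several token positions, which affects feature activation patterns.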
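
The splitting described above comes from how these characters are encoded as bytes. The minimal sketch below assumes a byte-level tokenizer with no learned merges, so every UTF-8 byte becomes its own token. The names samples, utf8_byte_count, and byte_tokens are hypothetical and used only for illustration. Whether a real tokenizer keeps a given character as one token or several depends on the merges it learned.

```python
# Illustrative only: a byte-level "tokenizer" with no learned merges,
# where each UTF-8 byte is treated as one token (an assumption, not a
# description of any specific model's tokenizer).

samples = {
    "Latin a": "a",
    "Hebrew alef": "\u05d0",
    "Arabic beh": "\u0628",
}


def utf8_byte_count(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def byte_tokens(text: str) -> list:
    """Return the UTF-8 byte values of text, one 'token' per byte."""
    return list(text.encode("utf-8"))


byte_counts = {name: utf8_byte_count(ch) for name, ch in samples.items()}

for name, ch in samples.items():
    # Hebrew and Arabic letters take two bytes each, so without a merge
    # covering both bytes, one character spans two token positions.
    print(name, byte_counts[name], byte_tokens(ch))
```

Under this assumption, the Latin letter occupies one token while each Hebrew or Arabic letter occupies two, so a feature that responds to such a character has to act on partial-character tokens.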

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Collections of features that interact via the token stream — one feature increases probability of tokens that activate the next feature — forming FSA-like systems

Related by similarity (6)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.