Jan Tempus

3 posts

@Jan55028368

Doing some research on tokenisation.

Joined September 2021
7 Following · 66 Followers
Jan Tempus @Jan55028368
Interestingly, Craig W. Schmidt has a second paper using LPs for tokenisation hitting arXiv today as well! Check it out: arxiv.org/abs/2605.22705
Jan Tempus @Jan55028368
In our new paper, we reinterpret tokenisation as a problem in high-dimensional geometry (100M dims, to be precise!), which we can solve efficiently to get a globally near-optimal tokeniser! Our method consistently improves language models over BPE. See 🧵 for details.