Nous Research
2.7K posts

Nous Research
@NousResearch
World-class open source AI https://t.co/vrD0aDJeto
USA · Joined October 2020
25 Following · 187.8K Followers
Nous Research retweeted

Sooo many people don't know this, but Hermes creator @NousResearch has its own subscription program...
the FREE account is HUGE because it gives you access to all of their FREE models
there are paid subs, and if you are interested, even the $20 sub is SUPER worth it because it gives you access to 300+ models
if you run hermes model
you can click the Nous sub option to log in if you have created an account

Nous Research retweeted

DeepSeek V4 Flash IS BACK on Nous Portal for FREE for use in Hermes Agent!
Check it out at portal.nousresearch.com/manage-subscri…
Nous Research retweeted

Doing a massive Hermes deployment today. Full team, isolated VMs, @honchodotdev shared memory for the team.
This stuff is so fun.
@NousResearch


Hermes on your watch? nicee
Uzi @uzairansar
Hermesmaxxing. Thinking of maybe adding live notifications so you can see the responses stream in on the lock screen.
Nous Research retweeted

@gigant_theo "A bunch of nerds making progress" has a nice ring to it...
Nous Research retweeted

Thank you @NousResearch team! Here's to Security for All, agents included. 😎🤖
Nous Research @NousResearch
Hermes Agent now supports the @Bitwarden Secrets Manager


What we find most useful about this study is the decomposition. For byte-level pretraining, the ordering is throughput first and boundary signal second; the boundary benefit can be recovered without a static tokenizer by treating boundaries as a prior or a target (cf. Bolmo). For subword pretraining, the vocabulary-capacity argument seems weaker at the 1.7B scale than common framings suggest, and the compression-driven throughput multiplier is the load-bearing source of gain (cf. our work on Token Superposition Training).
Paper: arxiv.org/abs/2604.27263
HF: huggingface.co/papers/2604.27…
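Here is what the compression-driven throughput multiplier means in practice. With a fixed number of sequence positions per training step, a subword model covers more raw bytes than a byte-level model, in proportion to its average bytes per token. The sketch below is a minimal illustration. It assumes a hand-made whitespace-prefixed segmentation and a 4096-position context, both hypothetical and not taken from the paper. The helper names `bytes_per_token` and `bytes_seen` are also illustrative.

```python
# Hypothetical illustration (not the paper's code): how many raw bytes one
# context window covers for a byte-level model versus a subword model.

def bytes_per_token(text: str, tokens: list[str]) -> float:
    """Average UTF-8 bytes covered by each token of a segmentation of `text`."""
    total = sum(len(t.encode("utf-8")) for t in tokens)
    # The segmentation must reproduce the text byte-for-byte.
    assert total == len(text.encode("utf-8")), "segmentation must cover the text exactly"
    return total / len(tokens)


def bytes_seen(context_len: int, ratio: float) -> float:
    """Bytes of text covered by a window of `context_len` positions."""
    return context_len * ratio


text = "the cat sat on the mat"
# Toy whitespace-prefixed segmentation; a real BPE vocabulary would differ.
subwords = ["the", " cat", " sat", " on", " the", " mat"]
byte_level = list(text)  # one position per byte (the text is pure ASCII)

r_sub = bytes_per_token(text, subwords)
r_byte = bytes_per_token(text, byte_level)
multiplier = bytes_seen(4096, r_sub) / bytes_seen(4096, r_byte)
print(r_sub, r_byte, multiplier)
```

With this toy segmentation, each subword covers about 3.7 bytes, so the same context spans about 3.7 times as much text as the byte-level model sees. Real tokenizers and corpora give different ratios.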

Subword boundaries are the second meaningful effect. Adding end-of-subword markers as input embeddings produces a large gain throughout training (H3): end-boundaries leak future bytes (whitespace always follows an end-boundary, for example) and simplify the next-byte prediction task.
Start-of-subword boundaries cannot leak the future, and they also help. When start-boundaries are provided only during the first 50k training steps and removed thereafter for both training and validation, the improvement persists; end-boundaries do not survive the same intervention. One reading is that start-boundaries supply a morphological inductive bias (H4), while end-boundaries supply a near-term prior the model becomes dependent on.
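The boundary interventions can be sketched as per-byte flags derived from a subword segmentation. This is a minimal illustration, not the paper's implementation. It assumes a hand-made segmentation and a hypothetical `model_inputs` helper that pairs each byte with its start flag; how a flag becomes an input embedding is left out. The code does two things. First, it checks why end-boundaries leak: the end flag at byte i equals the start flag at byte i+1, so it tells the model whether the byte it is about to predict opens a new subword. Second, it switches start flags off after 50k steps, mirroring the schedule described above.

```python
# Hypothetical sketch: per-byte subword-boundary flags, the end-boundary
# leak, and a step-gated schedule for start flags.

def boundary_flags(tokens: list[str]) -> tuple[list[int], list[int]]:
    """Return (start, end) flags, one entry per UTF-8 byte of the joined text."""
    start: list[int] = []
    end: list[int] = []
    for tok in tokens:
        n = len(tok.encode("utf-8"))
        for i in range(n):
            start.append(1 if i == 0 else 0)    # byte opens a subword
            end.append(1 if i == n - 1 else 0)  # byte closes a subword
    return start, end


def model_inputs(data: bytes, start: list[int], step: int,
                 start_cutoff: int = 50_000) -> list[tuple[int, int]]:
    """Pair each byte with its start flag; zero the flag from `start_cutoff` on."""
    keep = step < start_cutoff
    return [(b, s if keep else 0) for b, s in zip(data, start)]


tokens = ["the", " cat", " sat"]
data = "".join(tokens).encode("utf-8")
start, end = boundary_flags(tokens)

# The leak: the end flag at byte i says exactly whether byte i+1 starts a new
# subword, i.e. it carries information about the next-byte target.
assert all(end[i] == start[i + 1] for i in range(len(data) - 1))

print(model_inputs(data, start, step=10)[:4])      # flags present early on
print(model_inputs(data, start, step=60_000)[:4])  # flags removed later
```

In this toy segmentation every end flag is followed by a space, which matches the whitespace example in the thread. Start flags, by contrast, only describe the byte already at the current position.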


Today we release a study on decoupling the benefits of subword tokenization for language model training, by simulating each suspected benefit one at a time inside a 1.7B byte-level pretraining pipeline.
We formulate seven hypotheses for why subword LLMs outperform byte-level LLMs (covering computational efficiency, structural priors over subword boundaries and positions, and the optimization objective) and implement each as a controlled intervention against a byte-level baseline. Three of the seven measurably improve the validation loss at this scale; the rest either have a negligible effect or hurt.
Validated at 1.7B parameters on fineweb-edu with a LLaMA-3 architecture, with 68M-parameter replications in the appendix.
The work was led by Théo Gigant, Bowen Peng, and Jeffrey Quesnelle.
Paper: arxiv.org/abs/2604.27263










