Nous Research

2.7K posts

@NousResearch

World-class open source AI https://t.co/vrD0aDJeto

USA · Joined October 2020
25 Following · 187.8K Followers
Pinned Tweet
Nous Research @NousResearch ·
Hermes Agent v0.14.0 - “The Foundation Release”. Changelog below.
236
458
4.6K
640.9K
Nous Research @NousResearch ·
Join the team on Wednesday for another Hermes Agent Jam!
34
35
514
17.8K
Nous Research retweeted
Hermes Agent Tips @HermesAgentTips ·
So many people don't know this, but Hermes creator @NousResearch has its own subscription program. The FREE account is HUGE because it gives you access to all of their FREE models. There are paid subs too, and if you are interested, even the $20 sub is SUPER worth it because it gives you access to 300+ models. If you run hermes model, you can click the Nous sub option to log in once you have created an account.
30
20
462
41.7K
garnix @Samsara_of_eth ·
The Nous Research team is a bunch of absolute legends. I had some issues, posted in the Discord, and in less than 5 hours (!) they shipped an update. How do they do it? (Obviously AI-assisted, but still crazy.)
9
3
110
16.4K
Nous Research retweeted
David Ondrej @DavidOndrej1 ·
> learn how to use tmux
> trust me
269
123
3K
352.6K
NetworkChuck @NetworkChuck ·
Doing a massive Hermes deployment today. Full team, isolated VMs, @honchodotdev shared memory for the team. This stuff is so fun. @NousResearch
80
28
1.1K
61.5K
Nous Research retweeted
isaac @isaaccyn ·
optimise for fun when it comes to building for and with agents
setting up a hermes agent is fun because it feels like you’re hatching and raising an incredibly smart digital pet that can do and learn almost anything you want it to
one magic moment after another gets you hooked
7
4
67
12.9K
Kelano @kelanoo ·
when you send your setup to your friend, and he asks his Hermes if it's good
14
0
81
4.7K
Nous Research @NousResearch ·
What we find most useful about this study is the decomposition. For byte-level pretraining, the ordering is throughput first and boundary signal second; the boundary benefit can be recovered without a static tokenizer by treating boundaries as a prior or a target (cf. Bolmo). For subword pretraining, the vocabulary-capacity argument seems weaker than common framings suggest at 1.7B, and the compression-driven throughput multiplier is the load-bearing source of gain (cf. our work on Token Superposition Training).
Paper: arxiv.org/abs/2604.27263
HF: huggingface.co/papers/2604.27…
7
3
48
9.6K
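To make the "compression-driven throughput multiplier" in the post above concrete, here is a minimal Python sketch. It assumes a toy, hand-written segmentation; the names `bytes_per_token` and `bytes_covered` and all numbers are hypothetical illustrations, not taken from the paper.

```python
from typing import List


def bytes_per_token(pieces: List[bytes]) -> float:
    """Average number of bytes packed into one subword token (compression ratio)."""
    if not pieces:
        raise ValueError("need at least one piece")
    total_bytes = sum(len(p) for p in pieces)
    return total_bytes / len(pieces)


def bytes_covered(positions: int, ratio: float) -> int:
    """Text bytes covered by a fixed budget of sequence positions."""
    return int(positions * ratio)


# Toy segmentation (an assumption, not a real tokenizer's output).
pieces = [b"The", b" cat", b" sat", b"."]
ratio = bytes_per_token(pieces)

# Same budget of positions: a byte-level model sees one byte per position,
# a subword model sees `ratio` bytes per position.
byte_level = bytes_covered(1_000_000, 1.0)
subword = bytes_covered(1_000_000, ratio)
print(ratio, subword / byte_level)
```

Under these toy assumptions, the subword model covers `ratio` times more text for the same number of positions, which is the sense in which compression acts as a throughput multiplier.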
Nous Research @NousResearch ·
Subword boundaries are the second meaningful effect. Adding end-of-subword markers as input embeddings produces a large gain throughout training (H3): end-boundaries leak future bytes (whitespace always follows an end-boundary, for example) and simplify the next-byte prediction task. Start-of-subword boundaries cannot leak the future, and they also help. When start-boundaries are provided only during the first 50k training steps and removed thereafter for both training and validation, the improvement persists; end-boundaries do not survive the same intervention. One reading is that start-boundaries supply a morphological inductive bias (H4), while end-boundaries supply a near-term prior the model becomes dependent on.
4
2
43
11.3K
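The start- and end-boundary markers described in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segmentation is a hand-written toy list with leading-space pieces, and `boundary_flags` and `markers_active` are hypothetical names, not the study's code. In a real pipeline the flags would index an extra embedding added to each byte's input embedding.

```python
from typing import List, Tuple


def boundary_flags(pieces: List[bytes]) -> Tuple[bytes, List[int], List[int]]:
    """Flatten subword pieces to bytes and flag the first and last byte of each piece."""
    data = bytearray()
    start: List[int] = []
    end: List[int] = []
    for piece in pieces:
        for i, b in enumerate(piece):
            data.append(b)
            start.append(1 if i == 0 else 0)
            end.append(1 if i == len(piece) - 1 else 0)
    return bytes(data), start, end


def markers_active(step: int, cutoff: int = 50_000) -> bool:
    """Hypothetical schedule: provide markers only for the first `cutoff` steps (0-indexed)."""
    return step < cutoff


# Toy segmentation (an assumption, not a real tokenizer's output).
pieces = [b"The", b" cat", b" sat"]
data, start, end = boundary_flags(pieces)

# Bytes that immediately follow an end-boundary: in this toy case they are
# all spaces, which is how an end marker can leak the next byte.
after_end = [data[i + 1:i + 2] for i, e in enumerate(end) if e and i + 1 < len(data)]
print(data, start, end, after_end)
```

A start flag only describes the byte it sits on, so it cannot reveal anything about bytes the model has yet to predict; an end flag, by contrast, tells the model that the next byte begins a new piece.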
Nous Research @NousResearch ·
Today we release a study on decoupling the benefits of subword tokenization for language model training, by simulating each suspected benefit one at a time inside a 1.7B byte-level pretraining pipeline. We formulate seven hypotheses for why subword LLMs outperform byte-level LLMs (covering computational efficiency, structural priors over subword boundaries and positions, and the optimization objective) and implement each as a controlled intervention against a byte-level baseline. Three of the seven move the validation loss at this scale; the rest either have negligible effect or hurt. Validated at 1.7B parameters on fineweb-edu with a LLaMA-3 architecture, with 68M-parameter replications in the appendix. The work was led by Théo Gigant, Bowen Peng, and Jeffrey Quesnelle.
Paper: arxiv.org/abs/2604.27263
41
117
982
62.7K