James Peterson

146 posts

@hellofromjames

AI at @FathomDotVideo

San Francisco · Joined September 2022
1.5K Following · 120 Followers
James Peterson@hellofromjames·
@michpokrass The journaling helps. Yet it's so lossy. And every astonishment, every achievement, every ... them-becoming-them moment is so big. So valuable. I'd trade so much just to be able to slow down time even a little more.
Replies: 0 · Reposts: 0 · Likes: 3 · Views: 320
Michelle Pokrass@michpokrass·
parenthood is hard. not for the usual reasons but because it feels impossible to adequately savor it. i know i'm in the days i will achingly miss one day but try as i might i still can't commit every moment to memory. that determined look on her face when she is carefully inspecting a new toy, will i remember it in ten years? the way she giggles during peekaboo, am i going to be able to call that sound to mind? her babyhood is slipping away as i desperately grasp for it, all of it at once, catching only the slivers that i record in journals and pictures and the collective memories of those who love her. what a gift to catch any of it and still i yearn for it all
Replies: 29 · Reposts: 41 · Likes: 677 · Views: 33.8K
Tilde@tilderesearch·
Distillation (especially on-policy) has become a pivotal component of the post-training stack. ☕ To dramatically accelerate distillation at scale, we open-source Nitrobrew, a communication-efficient, fused strategy for logit distillation. It’s built for both on- and off-policy distillation with: 100x faster loss computation, 50% peak memory savings, 3x faster on-policy distillation, and more! A 🧵 (1/8)
[image]
Replies: 6 · Reposts: 41 · Likes: 285 · Views: 27.3K
James Peterson@hellofromjames·
@DillonUzar The most interesting to me would be to track progress from open models vs the frontier on using long context. Both for large (eg GLM5.1) and smaller (eg Qwen3.5-27B) models.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 43
Dillon Uzar@DillonUzar·
Below are *some* (not all) of the models actively running for tomorrow (not all will be ready by EOD tomorrow, so whatever finishes will be posted). Will have more models over the next couple of weeks. And YES - Opus/Sonnet will be tested to 1M. And YES - Grok 4.20 will be tested to 2M (might be out a little later this week tho due to time taken). No limit to the context length tests now ;) Massive thanks to @pingToven and @OpenRouter for sponsoring the credits needed for the Anthropic (especially Opus) results! Some of the credits left over went to Grok 4.20 too. I'll have more to share about this new benchmark run throughout this week. As always, I'm eager to hear which models everyone is most interested in seeing run!
[image]
Replies: 2 · Reposts: 0 · Likes: 11 · Views: 414
Dillon Uzar@DillonUzar·
Heads up - Will be releasing some new benchmark results this week starting tomorrow. This will also be on a new variant of MRCR.
Replies: 3 · Reposts: 2 · Likes: 36 · Views: 5.3K
James Peterson@hellofromjames·
You might find the cliff to self-hosting these models is surprisingly small, and that the rewards are immense. If you DM me, I'm happy to trade notes about our journey into that experience. I'm not sure I quite understand the differentiating value of serverless inference for open models.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 30
search founder@n0riskn0r3ward·
Yes, and open source in general. But I actually feel like I'm seeing the exact same trend from pure inference providers (deepinfra, fireworks, together etc.). I don't see any of the qwen 3.5 models listed by deepinfra. I only see Qwen3.5 397B A17B listed by fireworks... Like I'm still interested in "managed services"/dedicated model endpoints available via api in this category. OpenRouter's list of endpoints for the qwen 3.5 models - esp. ones that don't log my prompts - is much shorter than it was for previous generations.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 125
search founder@n0riskn0r3ward·
A little weird seeing the 5.4-mini and 5.4-nano blog post focus so heavily on code gen and related workloads as someone who has a lot of use cases for capable small models in the <=$0.1 per M tokens input price category. OpenAI/Google are apparently uninterested in that market?
Replies: 4 · Reposts: 0 · Likes: 6 · Views: 560
James Peterson@hellofromjames·
@itsandrewgao Oh if you think 0.1% for 3w holding is free money I have a CD to sell you
Replies: 0 · Reposts: 0 · Likes: 4 · Views: 1.3K
andrew gao@itsandrewgao·
is this just free money or is there something i'm missing
[image]
Replies: 12 · Reposts: 2 · Likes: 60 · Views: 28.6K
James Peterson@hellofromjames·
@kimmonismus More breadth too: quite a useful vision model and was trained with a native 256K context. It was only 2.5 years ago that GPT-4*V* was announced (blast from the past).
Replies: 0 · Reposts: 0 · Likes: 4 · Views: 1.1K
Chubby♨️@kimmonismus·
Two years difference, same model size. Absolutely insane.
[image]
Replies: 19 · Reposts: 54 · Likes: 778 · Views: 47K
DanielW@dddanielwang·
@teortaxesTex official? thought they haven't released models yet
Replies: 2 · Reposts: 0 · Likes: 1 · Views: 392
Alexander Doria@Dorialexander·
@hellofromjames just vanilla trl. you need latest transformers. the one annoying thing is hybrid attention architecture (haven't managed to compile causal-conv1d on public cluster yet). it's running but falling back to torch and much slower.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 215
Alexander Doria@Dorialexander·
and first qwen-3.5 finetune launched.
Replies: 4 · Reposts: 0 · Likes: 69 · Views: 8.9K
Rick Ross@RickRossTN·
@baseten Actually, I just tested again, and you HAVE NOT added proper support for images yet. C'mon! It is a multimodal model, and your model page emphasizes this! Why don't you give it proper support?
Replies: 3 · Reposts: 0 · Likes: 5 · Views: 451
Baseten@baseten·
Introducing Kimi K2.5 on Baseten’s Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an alarmingly large number of tool calls. Get the good stuff here: baseten.co/library/kimi-k…
[image]
Replies: 11 · Reposts: 8 · Likes: 98 · Views: 15.2K
James Peterson@hellofromjames·
Claude: "but the savings are only 11% ($1.2K), so it may not be worth doing". Also Claude, one LOC (line of claude) later: "done". Yeesh. Claude really values its time.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 69