James Peterson

146 posts

@hellofromjames

AI at @FathomDotVideo

San Francisco · Joined September 2022

1.5K Following · 120 Followers

James Peterson@hellofromjames·7 May

@michpokrass The journaling helps. Yet it's so lossy. And every astonishment, every achievement, every ... them-becoming-them moment is so big. So valuable. I'd trade so much just to be able to slow down time even a little more.

320

Michelle Pokrass@michpokrass·7 May

parenthood is hard. not for the usual reasons but because it feels impossible to adequately savor it. i know i'm in the days i will achingly miss one day but try as i might i still can't commit every moment to memory. that determined look on her face when she is carefully inspecting a new toy, will i remember it in ten years? the way she giggles during peekaboo, am i going to be able to call that sound to mind? her babyhood is slipping away as i desperately grasp for it, all of it at once, catching only the slivers that i record in journals and pictures and the collective memories of those who love her. what a gift to catch any of it and still i yearn for it all

677

33.8K

James Peterson@hellofromjames·29 Apr

@tilderesearch Nice post! FYI I'm getting a 404 for your blog post link

Tilde@tilderesearch·28 Apr

~8/8~ Read the full post here: tilderesearch.com/blog/nitrobrew. We're hiring - if you like finding simple tricks hiding in plain sight in ML systems, come work with us: tilderesearch.com/join

1.5K

Tilde@tilderesearch·28 Apr

Distillation (especially on-policy) has become a pivotal component of the post-training stack. ☕ To dramatically accelerate distillation at scale, we open-source Nitrobrew, a communication-efficient, fused strategy for logit distillation. It’s built for both on- and off-policy distillation with: 100x faster loss computation, 50% peak memory savings, 3x faster on-policy distillation, and more! A 🧵 (1/8)

285

27.3K

James Peterson@hellofromjames·20 Apr

@DillonUzar The most interesting to me would be to track progress from open models vs the frontier on using long context. Both for large (eg GLM5.1) and smaller (eg Qwen3.5-27B) models.

Dillon Uzar@DillonUzar·20 Apr

Below are *some* (not all) of the models actively running for tomorrow (not all will be ready by EOD tomorrow, so whatever finishes will be posted). Will have more models over the next couple of weeks. And YES - Opus/Sonnet will be tested to 1M. And YES - Grok 4.20 will be tested to 2M (might be out a little later this week tho due to time taken). No limit to the context length tests now ;) Massive thanks to @pingToven and @OpenRouter for sponsoring the credits needed for the Anthropic (especially Opus) results! Some of the credits left over went to Grok 4.20 too. I'll have more to share about this new benchmark run throughout this week. As always, I'm eager to hear which models everyone is most interested in seeing run!

414

Dillon Uzar@DillonUzar·20 Apr

Heads up - Will be releasing some new benchmark results this week starting tomorrow. This will also be on a new variant of MRCR.

5.3K

James Peterson@hellofromjames·18 Mar

👀 Ask HN today

James Peterson@hellofromjames·17 Mar

You might find the cliff to self-hosting these models is surprisingly small, and that the rewards are immense. If you DM I'm happy to trade notes about our journey into that experience. I'm not sure I quite understand the differentiating value of serverless inference for open models.

search founder@n0riskn0r3ward·17 Mar

Yes, and open source in general. But I actually feel like I'm seeing the exact same trend from pure inference providers (deepinfra, fireworks, together etc.). I don't see any of the qwen 3.5 models listed by deepinfra. I only see Qwen3.5 397B A17B listed by fireworks... Like I'm still interested in "managed services"/dedicated model endpoints available via api in this category. OpenRouter's list of endpoints for the qwen 3.5 models - esp. ones that don't log my prompts - is much shorter than it was for previous generations.

125

search founder@n0riskn0r3ward·17 Mar

A little weird seeing the 5.4-mini and 5.4-nano blog post focus so heavily on code gen and related workloads as someone who has a lot of use cases for capable small models in the <=$0.1 per M tokens input price category. OpenAI/Google are apparently uninterested in that market?

560

James Peterson@hellofromjames·12 Mar

@itsandrewgao Oh if you think 0.1% for 3w holding is free money I have a CD to sell you

1.3K

andrew gao@itsandrewgao·12 Mar

is this just free money or is there something i'm missing

28.6K

James Peterson@hellofromjames·6 Mar

@kimmonismus More breadth too: quite a useful vision model and was trained with a native 256K context. It was only 2.5 years ago that GPT-4*V* was announced (blast from the past).

1.1K

Chubby♨️@kimmonismus·6 Mar

Two years difference, same model size. Absolutely insane.

778

47K

James Peterson@hellofromjames·3 Mar

@giffmana This img works on a couple of levels, well done 👏

1.9K

Lucas Beyer (bl16)@giffmana·3 Mar

This is such a classic Google move.

Logan Kilpatrick@OfficialLoganK

PSA: we are turning down Gemini 3 Pro next Monday March 9th. You can upgrade to 3.1 Pro Preview which improves on lots of the things folks gave feedback about on the first Gemini 3 rev. Please keep the feedback coming : )

220

5.9K

296K

James Peterson@hellofromjames·2 Mar

@dddanielwang @teortaxesTex Just now

DanielW@dddanielwang·2 Mar

@teortaxesTex official? thought they haven't released models yet

392

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·2 Mar

Pretty cool that 4B (with cheaper attention too) is ≥ 30B-A3B of the previous generation. It wasn't even a weak model on its own terms. I am interested in Q3.5's tech report.

Lisan al Gaib@scaling01

Qwen3.5 9B and 4B benchmarks

153

8.1K

James Peterson@hellofromjames·1 Mar

@Dorialexander Thanks Alexander!

Alexander Doria@Dorialexander·1 Mar

@hellofromjames just vanilla trl. you need latest transformers. the one annoying thing is hybrid attention architecture (haven't managed to compile causal-conv1d on public cluster yet). it's running but falling to torch and much slower.

215

Alexander Doria@Dorialexander·1 Mar

and first qwen-3.5 finetune launched.

8.9K

James Peterson@hellofromjames·1 Mar

@Dorialexander @boyuan_chen I think this was a bot my friend

Alexander Doria@Dorialexander·1 Mar

@boyuan_chen 27B (don't think we have anything smaller yet?)

194

James Peterson@hellofromjames·13 Feb

This bodes well for the "become superhuman at compute efficiency" prediction.

James Peterson@hellofromjames

Four codegen predictions for 2026, models will: ace managing their own context (docs), ace prodding deployments to experiment and learn, become superhuman at compute efficiency, and exceed (as much as / only) 90% of SFBA-based designers.

196

James Peterson@hellofromjames·13 Feb

Deep Think is a neat window into the future. Gemini 3 Flash's benchmarks are similar to Gemini 2.5 Deep Think ... So within ~7 months Gemini Flash will be this good. And the then-Deep Think will be _?

Simon Willison@simonw

Genuinely very impressed by the SVG of a pelican riding a bicycle I just got out of Google's new Gemini 3 Deep Think model

223

James Peterson@hellofromjames·10 Feb

@RickRossTN @baseten Rick. Mate. That’s a bit of an unhelpfully aggressive way to phrase a feature request no?

Rick Ross@RickRossTN·10 Feb

@baseten Actually, I just tested again, and you HAVE NOT added proper support for images yet. C'mon! It is a multimodal model, and your model page emphasizes this! Why don't you give it proper support?

451

Baseten@baseten·10 Feb

Introducing Kimi K2.5 on Baseten’s Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an alarmingly large number of tool calls. Get the good stuff here: baseten.co/library/kimi-k…

15.2K

James Peterson@hellofromjames·10 Feb

Claude: "but the savings are only 11% ($1.2K), so it may not be worth doing". Also Claude, one LOC (line of claude) later: "done". Yeesh. Claude really values its time.
