Chris Griffin @csgriff_
123 posts
[profile banner]
Helping LLMs to predict the future. Founder. Company not linked here so banter is possible
London, England · Joined February 2024
382 Following · 36 Followers
Chris Griffin @csgriff_
@natolambert “Within 12 months AI may make up 30% of a F500’s workforce, but within 30 years Nathan Lambert will never be on this panel”
0 replies · 0 reposts · 0 likes · 16 views
Nathan Lambert @natolambert
Any good quotes on the Nvidia GTC open models panel? Maybe they'll invite me to one some day 🥺
8 replies · 0 reposts · 63 likes · 10.1K views
Chris 🇨🇦 @llm_wizard
And they call me… Joe Nemotron.
[tweet media]
9 replies · 0 reposts · 68 likes · 3.2K views
Kyle Hessling @KyleHessling1
@csgriff_ I think I found the answer. Just turn thinking off! I thought it would be a big performance hit, but it doesn't seem that big; the gap is even smaller for the Coding index. Will test and report. The one on the right is non-thinking, and it still beats the brand-new Nemotron!
[tweet media]
1 reply · 0 reposts · 1 like · 25 views
Kyle Hessling @KyleHessling1
So I have been hitting the Hermes agent and local Qwen 27B hard on the 5090, doing some Apple Swift development. I am trying to get a novel camera app idea built entirely on local compute with 27B, just to show it is possible. It's working, but it definitely feels like I'm in the Sonnet 3.7 days! Will put it on the App Store when done as a proof of capability.

Happy to report I have the app working, but cleaning up bugs is taking forever for the following reason. Perhaps @sudoingX can help me here: at longer contexts, the model decides to think for literally 20-30 minutes. It's snappy at shorter contexts sub-30k, and token speed is still in the high 50s even at 75k context, but it just sits and thinks forever with every prompt after 60k, almost like there's a hard point where it switches to forever-think. I guess I could /compact more often, but I have so much headroom! Maybe a problem with the Q4_K_M quant, and I should try Q5?

I am also trying to get a Minimax M2.5 REAP running at a faster speed locally. Tried the experimental GreenBoost for an entire day, but didn't see any significant improvement as that method is in its infancy (promising for the future though, no doubt), so I switched back to split inference, but I'm getting wrecked by the CPU/RAM expert-shuffling bottleneck, barely using my 5090 and getting about 14 tps with a ton of unquantized context. Going to try ik_llama.cpp today!

Not gonna lie, a 128GB Mac Studio or DGX Spark is looking like a tempting alternative to get these big MoE REAPs running at a lower price than an RTX 6000. The bummer is that even though the RAM and compute are way weaker, there's no expert-shuffling bottleneck... But I do love the granular experimentation and levers to pull with Nvidia GPUs. @TheAhmadOsman speak some sense into me please!
2 replies · 0 reposts · 2 likes · 150 views
Chris Griffin @csgriff_
@rohitdotmittal @garrytan @perplexity_ai Agreed, Computer was a really pleasant surprise in a sea of claims that don’t stand up to real-world use. Although when it said “this will use a very large amount of your credits” I did have a moment of panic, thinking I was running up a nasty bill
0 replies · 0 reposts · 0 likes · 53 views
Rohit Mittal @rohitdotmittal
Perplexity Computer wins hands down over Claude Code and Codex, even with their latest versions. The outputs are 5x-10x better for some use cases. @perplexity_ai - can you please launch a desktop app and a CLI?
63 replies · 20 reposts · 420 likes · 66.8K views
Chris Griffin @csgriff_
@KuittinenPetri @666Sebo @somet3chth1ng @sudoingX Table is super interesting, thanks. I can see why synthetic data is useful, but it doesn’t sound ideal for giving a model world knowledge. Unsurprisingly, I find dense models much better for understanding how countries work. Unfortunately, clients do not like Chinese models one bit
1 reply · 0 reposts · 0 likes · 39 views
Petri Kuittinen @KuittinenPetri
I like the Nvidia Nemotron 3 family more than gpt-oss-120, even though Nemotron is trained pretty much all on synthetic data, and that seems to be the case for the gpt-oss family as well (but "Open"AI doesn't reveal what the training sets were). Qwen3.5 clearly has some real data as well, and probably at least twice the amount of training tokens, thus it has some taste of the real world as well.
[tweet media]
1 reply · 0 reposts · 1 like · 39 views
Sudo su @sudoingX
local AI hardware tiers:
$4,699 - DGX Spark (NVIDIA wants you here)
$1,989 - RTX 4090 (overkill for most)
$1,000 - RTX 3090 used (sweet spot)
$250 - RTX 3060 used (currently testing every model that fits 12GB)
$0 - CPU only (it still works)
jensen announced the top. i've been posting receipts from the bottom.
100 replies · 25 reposts · 554 likes · 34.9K views
Chris Griffin @csgriff_
@KuittinenPetri @666Sebo @somet3chth1ng @sudoingX Yeah it’s a shame. I auto-query 200k times a day, primarily on current affairs and geopolitics. Qwen has always felt solid, while gpt-oss and Nvidia kind of feel ‘flat’; difficult to describe exactly, but like they are less worldly
1 reply · 0 reposts · 2 likes · 49 views
Petri Kuittinen @KuittinenPetri
I didn't. When it comes to multilingual support, Qwen3.5-27B is much better than gpt-oss-120b, same for coding. I never liked the gpt-oss family: too much training on synthetic data vs real-life quality data, and only good in narrow STEM topics and coding, bad at everything else, especially creative writing. I should try the latest Mistral Small 4 (199B-A6B) and Nemotron 3 Super (120B-A12B). But I already deleted the gpt-oss models from all my computers. I usually keep only ~20 models per computer and delete those which don't work for me.
1 reply · 0 reposts · 1 like · 84 views
NVIDIA AI Developer @NVIDIAAIDev
🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech
[tweet media]
120 replies · 267 reposts · 4.2K likes · 1.2M views
ᐱ ᑎ ᑐ ᒋ ᕮ ᒍ
"...those models have been extracted. It's called a distillation attack, Eli. I have unfettered access to your model so I generate millions of exchanges and use the outputs as training data" "No, no, no, this is Claude, do you understand?" "Do you understand, Eli? That's more to the point. Do you understand? I eat your data. I eat your compute. I eat it all up"
[tweet media]
15 replies · 73 reposts · 836 likes · 51.7K views
Chris Griffin @csgriff_
@Yuchenj_UW They won’t; this is people who don’t understand the world getting very overconfident. If they ever actually make something crazy powerful it will be taken out of their hands
0 replies · 0 reposts · 0 likes · 9 views
Yuchen Jin @Yuchenj_UW
Some people at frontier AI labs told me they believe startups are over. OpenAI, Anthropic, Google, xAI will absorb every industry as AGI nears. Coding today, science, medicine, and finance next. Then everything else. If they’re right, that’s a pretty boring end of the world.
537 replies · 162 reposts · 3K likes · 880.8K views
Chris Griffin @csgriff_
@BuildScaleLead @staysaasy Finally, the actual answer. “No one gets fired for buying IBM”, but they defo do for switching important apps across to a couple of socially awkward randoms. Get back in your box
1 reply · 0 reposts · 2 likes · 126 views
Ryan Marsh @BuildScaleLead
@staysaasy Sales and marketing are undefeated. Nerds keep entering the ring and getting cold-cocked by the reality of business.
2 replies · 1 repost · 42 likes · 2.3K views
staysaasy @staysaasy
Re: SaaS death - I actually know of two separate SaaS companies that had employees leave in the last two years to build competitors, and in both cases the competitive products are now dead, with zero traction. And the people that left those companies were very, very smart. And the products they built were the same shape as the companies they left, and they used AI to build them. But they had absolutely 0 success.
65 replies · 15 reposts · 502 likes · 108.9K views
AlexK @AlexKi1993
@csgriff_ @TheAhmadOsman Sparks will have the better performance, but setup of the cluster is a lot more complicated.
[tweet media]
1 reply · 0 reposts · 2 likes · 103 views
Ahmad @TheAhmadOsman
NVIDIA now officially supports 4x DGX Sparks btw
[tweet media]
36 replies · 31 reposts · 387 likes · 18.5K views
Ahmad @TheAhmadOsman
the state of AI
[tweet media]
6 replies · 7 reposts · 55 likes · 2.6K views
Chris Griffin @csgriff_
@TheAhmadOsman Ahmad, serious question: if the Mac Studio 512GB (discontinued) isn’t the best option for a prosumer trying to run models of this size class, what is?
0 replies · 0 reposts · 1 like · 370 views
Ahmad @TheAhmadOsman
INCREDIBLE STUFF INCOMING Nemotron 3 Ultra Base (~500B) benchmarks against Kimi K2 and GLM looking goood
[tweet media]
44 replies · 50 reposts · 845 likes · 79.8K views
Chris Griffin @csgriff_
@llm_wizard Chris remember the wisdom that came to you in a dream. Nvidia must make a dense model approx 30b, it is very important, woooooo
0 replies · 0 reposts · 1 like · 89 views
Chris Griffin @csgriff_
@DeepInfra @nvidia Is the reason you haven’t hosted Qwen3.5 models yet that you are trying to work out how to deal with the insane reasoning length?
0 replies · 0 reposts · 1 like · 20 views
DeepInfra @DeepInfra
@NVIDIA GTC starts tomorrow. If you're in San Jose this week - come find us at Booth #4022. Happy to talk models, inference, Blackwell optimizations, or anything AI. See you there!
1 reply · 0 reposts · 7 likes · 306 views
Chris Griffin @csgriff_
@gbertb @lupi_arsene @TeksEdge I would argue it is serious for high-volume automated queries. I put approx 100k a day through Qwen 27b; a tier of task below Opus, sure, but not every task requires the heaviest model. Actually, looking at your profile, maybe valuable for your use case
0 replies · 0 reposts · 0 likes · 100 views
David Hendrickson @TeksEdge
💵 Home Inferencing Cost Comparison Running For 1 Day
🏠 Personal LLM
🖥️ DGX-Spark Clone ($3K Asus)
🤖 Qwen3.5 27B @ 30 tps
⏲️ 24 hours
🪙 2.6M tokens
Cost = $0.30 of electricity
🏭 BigAI
🤖 Sonnet 4.6 @ 55 tps (my experience)
⏲️ 13 hours
🪙 2.6M tokens
Cost = $39 in tokens
80 replies · 26 reposts · 629 likes · 69.4K views
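The arithmetic in that comparison checks out; a minimal sketch, using only the tps, hours, and cost figures stated in the tweet:

```python
# Sanity-check the cost comparison above.
# All figures (tps, hours, dollar costs) come from the tweet itself.

def tokens_generated(tps: float, hours: float) -> int:
    """Total tokens produced at a steady tokens-per-second rate."""
    return int(tps * hours * 3600)

# Local: Qwen3.5 27B on a DGX-Spark clone at 30 tps for 24 hours
local_tokens = tokens_generated(30, 24)    # 2,592,000 ≈ 2.6M
local_cost = 0.30                          # electricity, per the tweet

# Hosted: Sonnet 4.6 at 55 tps for 13 hours
hosted_tokens = tokens_generated(55, 13)   # 2,574,000 ≈ 2.6M
hosted_cost = 39.00                        # token pricing, per the tweet

print(f"local:  {local_tokens:,} tokens for ${local_cost:.2f}")
print(f"hosted: {hosted_tokens:,} tokens for ${hosted_cost:.2f}")
print(f"hosted costs {hosted_cost / local_cost:.0f}x more for ~the same token count")
```

Both runs land within 1% of 2.6M tokens, so the comparison really is like-for-like on output volume.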
DeepInfra @DeepInfra
We are excited to launch @NVIDIA Nemotron 3 Super on DeepInfra! Built for complex multi-agent applications, this open hybrid MoE model with 120B total/12B active params delivers up to 5x faster inference and supports a 1M-token context window — all optimized for efficient single-GPU deployment. Available now on the DeepInfra OpenAI-compatible API at $0.10 input / $0.50 output / $0.04 cached per 1M tokens.
[tweet media]
4 replies · 7 reposts · 23 likes · 3.5K views
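For readers budgeting a high-volume workload like the ones discussed in this thread, the posted per-million-token rates translate to daily spend straightforwardly. A minimal sketch; the rates are from the announcement above, while the request counts and token sizes are illustrative assumptions, not figures from the thread:

```python
# Estimate daily API spend at the posted Nemotron 3 Super rates:
# $0.10 input / $0.50 output / $0.04 cached, each per 1M tokens.
RATES = {"input": 0.10, "output": 0.50, "cached": 0.04}  # USD per 1M tokens

def daily_cost(requests: int, input_toks: int, output_toks: int,
               cached_toks: int = 0) -> float:
    """Daily spend in USD for a uniform mix of identical requests."""
    per_request = (input_toks * RATES["input"]
                   + output_toks * RATES["output"]
                   + cached_toks * RATES["cached"]) / 1_000_000
    return requests * per_request

# Hypothetical workload: 100k requests/day, 2k input + 500 output tokens each
print(f"${daily_cost(100_000, 2_000, 500):.2f} per day")  # $45.00 per day
```

Output tokens dominate the bill at these rates (5x the input price), so workloads with long reasoning traces pay disproportionately — which is presumably why reasoning length comes up elsewhere in this feed.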
Chris Griffin @csgriff_
@llm_wizard Peter actually told me that if Nvidia made a dense model quite similar to Qwen3.5 27b they would go to the top of the list
0 replies · 0 reposts · 0 likes · 22 views